Running with gunicorn does not call my function (metrics.info)

flaker commented 3 years ago

Hi,

I want to add a metric that represents the age of the files my application is working with. For that, I just stat the known file and subtract its modification time from time.time() (code is below).

Run as a plain Flask application, I see no problem: on each call the metric "model_age" increases, as expected. As soon as I move to gunicorn, the metric still shows up, but the first value it gets seems to be the value it will always show. E.g.:

# HELP model_age Multiprocess metric
# TYPE model_age gauge
model_age{pid="30"} 0.0
model_age{pid="6"} 1.0
model_age{pid="32"} 0.0

For completeness, my code is below. I might be completely wrong in my approach, but even in that case I think the Flask and gunicorn setups should behave the same way (?).

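import os
import time

from flask import Flask
from prometheus_flask_exporter.multiprocess import GunicornPrometheusMetrics

# MODEL_STORAGE_DIR, MODEL_SYMLINK and HISTORY_FILE_NAME are defined elsewhere (not shown)
app = Flask(__name__)
metrics = GunicornPrometheusMetrics(app)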

def _get_data_age():
    return_value = 0.0
    try:
        full_path = os.path.join(MODEL_STORAGE_DIR, MODEL_SYMLINK, HISTORY_FILE_NAME)
        stat = os.stat(full_path)
        return_value = (time.time() - stat.st_mtime) / 60
    except FileNotFoundError:   # might happen during start up
        pass
    return return_value

age = metrics.info('model_age', 'Age of the running model in minutes.')
age.set_function(_get_data_age)

Thank you!!

rycus86 commented 3 years ago

Hi @flaker

Do you know if the prometheus_client library handles this properly for Gauge (which is what backs our info here)? I see that it supports different multiprocess modes, and looking at the README in https://github.com/prometheus/client_python it sounds like there may be some extra setup needed to get this working properly:

Metrics tuning (Gauge):

When Gauge metrics are used, additional tuning needs to be performed. Gauges have several modes they can run in, which can be selected with the multiprocess_mode parameter.

'all': Default. Return a timeseries per process alive or dead.
'liveall': Return a timeseries per process that is still alive.
'livesum': Return a single timeseries that is the sum of the values of alive processes.
'max': Return a single timeseries that is the maximum of the values of all processes, alive or dead.
'min': Return a single timeseries that is the minimum of the values of all processes, alive or dead.
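To make the modes quoted above concrete, here is a minimal, stdlib-only sketch of how each one would combine per-process values. It is only a model of the README wording, not the prometheus_client implementation; the `aggregate` helper and the set of live PIDs are hypothetical, and the sample values are the ones from the output earlier in this thread.

```python
from typing import Dict, Set


def aggregate(values: Dict[int, float], live_pids: Set[int], mode: str = "all") -> Dict[str, float]:
    """Hypothetical model of multiprocess_mode: maps a label string to a value."""
    # Values written by processes that are still alive.
    live = {pid: v for pid, v in values.items() if pid in live_pids}
    if mode == "all":      # one timeseries per process, alive or dead
        return {f'pid="{pid}"': v for pid, v in values.items()}
    if mode == "liveall":  # one timeseries per process that is still alive
        return {f'pid="{pid}"': v for pid, v in live.items()}
    if mode == "livesum":  # single timeseries: sum over alive processes
        return {"": sum(live.values())}
    if mode == "max":      # single timeseries: maximum over all processes
        return {"": max(values.values())}
    if mode == "min":      # single timeseries: minimum over all processes
        return {"": min(values.values())}
    raise ValueError(f"unknown multiprocess_mode: {mode!r}")


# Values reported in this issue; which PIDs are alive is an assumption.
samples = {30: 0.0, 6: 1.0, 32: 0.0}
alive = {30, 32}

for mode in ("all", "liveall", "livesum", "max", "min"):
    print(mode, aggregate(samples, alive, mode))
```

Note that choosing a mode only changes how the stored per-process values are combined. It would not by itself explain values that never change.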

flaker commented 3 years ago

Ah, I think my description of the issue was not clear. The problem is that

# HELP model_age Multiprocess metric
# TYPE model_age gauge
model_age{pid="30"} 0.0
model_age{pid="6"} 1.0
model_age{pid="32"} 0.0

stays like that forever. Either the callback is never triggered, or the value is never updated.

rycus86 commented 3 years ago

Yeah, I thought so, but I don't know if it's a problem with this library or with the underlying Gauge in multiprocessing mode. Any chance you could try creating a Gauge directly and check whether the function is getting called as expected?
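A minimal sketch of that experiment, using prometheus_client directly, might look like the following. The metric name `model_age_probe`, the `_probe` callback and the `calls` list are made up for illustration. As written it runs in a single process with its own registry; to reproduce the gunicorn case, the same Gauge would need to be created inside the worker with the multiprocess directory environment variable set (PROMETHEUS_MULTIPROC_DIR, or prometheus_multiproc_dir in older client versions), where the result may well differ.

```python
from prometheus_client import CollectorRegistry, Gauge, generate_latest

calls = []  # records every invocation of the callback


def _probe() -> float:
    """Hypothetical stand-in for _get_data_age that also counts its calls."""
    calls.append(1)
    return 42.0


# A private registry keeps the experiment isolated from the default one.
registry = CollectorRegistry()
probe = Gauge("model_age_probe", "Probe gauge for the callback experiment", registry=registry)
probe.set_function(_probe)

# Each scrape of the registry should invoke the callback again.
first_scrape = generate_latest(registry).decode()
calls_after_first = len(calls)
second_scrape = generate_latest(registry).decode()
calls_after_second = len(calls)

print(first_scrape)
print("callback calls:", calls_after_first, "->", calls_after_second)
```

If the call count grows on every scrape here but stays flat under gunicorn, that would point at the multiprocess Gauge handling rather than at this library.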

flaker commented 3 years ago

sure. I will try and get back to you.

rycus86 / prometheus_flask_exporter

Running with gunicorn does not call my function (metrics.info) #102