prometheus / client_python

Prometheus instrumentation library for Python applications
Apache License 2.0

Histogram without label creates histogram_*.db even without data #1022

Open JinLisek opened 7 months ago

JinLisek commented 7 months ago

I tried this with versions: 0.19.0 and 0.20.0, both seem to have this bug.

I have a script prom_example.py:

from prometheus_client import Histogram

Example = Histogram("example", "Example", ["lol"])

When I run it:

prometheus_multiproc_dir=/tmp/metrics python prom_example.py

I see nothing in /tmp/metrics (as expected).

But when I edit the script and remove the label from histogram:

from prometheus_client import Histogram

Example = Histogram("example", "Example")

A database is created in /tmp/metrics, named for example histogram_37373.db. I would expect no database to be created, since no data has been pushed into the metric.

Edit: the behaviour seems to be the same for Counter and Gauge.
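The two cases above can be folded into one script for easier comparison. This is only a sketch: the temporary directory, the private CollectorRegistry, and the metric name example_labelled are assumptions added to keep the run isolated, and the upper-case PROMETHEUS_MULTIPROC_DIR spelling is used instead of the lower-case one from the command above.

```python
import os
import tempfile

# Multiprocess mode is selected when prometheus_client is first imported,
# so the environment variable has to be set before the import.
multiproc_dir = tempfile.mkdtemp(prefix="prom_example_")
os.environ["PROMETHEUS_MULTIPROC_DIR"] = multiproc_dir

from prometheus_client import CollectorRegistry, Histogram

# Private registry (an assumption of this sketch) so reruns do not collide
# with metrics already registered on the default registry.
registry = CollectorRegistry()

# Case 1: histogram with a label; no child exists until .labels() is called.
Histogram("example_labelled", "Example", ["lol"], registry=registry)
files_after_labelled = sorted(os.listdir(multiproc_dir))

# Case 2: histogram without labels; it is initialized right away.
Histogram("example", "Example", registry=registry)
files_after_unlabelled = sorted(os.listdir(multiproc_dir))

print("after labelled histogram:  ", files_after_labelled)
print("after unlabelled histogram:", files_after_unlabelled)
```

If the behaviour matches the report, the first list is empty and the second contains a single histogram_<pid>.db file.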

csmarchbanks commented 7 months ago

Thank you for opening this discussion!

I believe the current behavior is correct: the file should contain a histogram with 0 values for all of the buckets, the sum, and the count. This lets a user initialize the histogram at service startup, before any requests are received, so that graphs show continuous 0 values instead of missing data. See https://prometheus.io/docs/practices/instrumentation/#avoid-missing-metrics for more information.
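As a rough illustration of those initial zeros, the sketch below creates an unlabelled histogram in the default in-memory mode and reads its samples back before any observe() call. The metric name request_latency_seconds and the private registry are assumptions made for the example.

```python
from prometheus_client import CollectorRegistry, Histogram

# Private registry, used here only to keep the sketch self-contained.
registry = CollectorRegistry()

# Hypothetical metric name; no labels, so it is initialized on creation.
latency = Histogram(
    "request_latency_seconds", "Request latency", registry=registry
)

# No observe() call has happened yet, but the series already exist.
count = registry.get_sample_value("request_latency_seconds_count")
total = registry.get_sample_value("request_latency_seconds_sum")
inf_bucket = registry.get_sample_value(
    "request_latency_seconds_bucket", {"le": "+Inf"}
)

print(count, total, inf_bucket)
```

All three values should be 0.0, which is what allows a dashboard to draw a flat line at zero rather than a gap.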

JinLisek commented 6 months ago

Hmm... It still seems odd to me that the behaviour differs between metrics with labels and metrics without.

My main issue is this: a cron job runs every minute and (indirectly) imports the metric without using it. The disk is then flooded with empty databases... After adding a label, the problem disappears.

But I can imagine someone in the future, unaware of this tricky behaviour, creating a new metric without a label and bringing the problem back. That makes it hard to prevent.

I don't see a good solution though. :(

csmarchbanks commented 6 months ago

When no labels are specified, the client already knows which series to create, without anyone needing to call .labels() on the metric. It therefore initializes the metric to zero automatically, to avoid the missing-metrics issue.
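The difference for labelled metrics can be seen in a small sketch, again in the default in-memory mode. The label value "a" and the private registry are assumptions chosen for illustration.

```python
from prometheus_client import CollectorRegistry, Histogram

# Private registry, used here only to keep the sketch self-contained.
registry = CollectorRegistry()

# With a label, the client cannot know the label values in advance,
# so nothing is initialized when the metric is created.
example = Histogram("example", "Example", ["lol"], registry=registry)
before = registry.get_sample_value("example_count", {"lol": "a"})

# Calling .labels() creates the child series, starting at zero.
example.labels(lol="a")
after = registry.get_sample_value("example_count", {"lol": "a"})

print(before, after)
```

Here before should be None, since no series exists yet, and after should be 0.0 once the child has been created.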

From a cron job, do you need to export metrics via multiprocess mode? It might make more sense to use something like the Pushgateway and keep the metrics in memory only.
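A minimal sketch of that approach might look like the following. It assumes a Pushgateway reachable at localhost:9091 and uses hypothetical names (cron_job_last_success_unixtime, example_cron, build_registry, push_cron_metrics). It is meant to run without PROMETHEUS_MULTIPROC_DIR set, so values stay in memory and no .db files are written.

```python
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway


def build_registry():
    """Collect the cron job's metrics in a fresh in-memory registry."""
    registry = CollectorRegistry()
    last_success = Gauge(
        "cron_job_last_success_unixtime",  # hypothetical metric name
        "Last time the cron job finished successfully",
        registry=registry,
    )
    last_success.set_to_current_time()
    return registry


def push_cron_metrics(gateway="localhost:9091", job="example_cron"):
    """Push the metrics once, at the end of a cron run.

    The gateway address and job name are placeholders for this sketch.
    """
    push_to_gateway(gateway, job=job, registry=build_registry())
```

The cron script would call push_cron_metrics() as its last step. Because the registry is created per run and pushed explicitly, importing a metric without using it no longer leaves files behind.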