rycus86 / prometheus_flask_exporter

Prometheus exporter for Flask applications
https://pypi.python.org/pypi/prometheus-flask-exporter
MIT License
645 stars 161 forks source link

With `group_by='url_rule'` all unhandled URLs become `None` in metrics #156

Closed glebignatieff closed 1 year ago

glebignatieff commented 1 year ago

Summary

When instantiating PrometheusMetrics with group_by='url_rule' it gives url_rule="None" in metrics when requesting unhandled URLs.

Steps to reproduce

  1. Slighly modify examples/flask-run-135/app.py:
    - metrics = PrometheusMetrics(app)
    + metrics = PrometheusMetrics(app, group_by='url_rule') 
  2. Prepare envs:
    export DEBUG_METRICS=True
  3. Run the app (assuming that dependencies are installed):
    python3 app.py
  4. Request an unhandled URL:
    curl localhost:5000/
  5. Get the metrics:
    curl localhost:5000/metrics

Expected result

flask_http_request_duration_seconds_bucket{le="+Inf",method="GET",status="404",url_rule="/"} 1.0

Actual result

flask_http_request_duration_seconds_bucket{le="+Inf",method="GET",status="404",url_rule="None"} 1.0
rycus86 commented 1 year ago

Hey @glebignatieff Is that the expected behavior? Since the url didn't match any registered rules, the None there makes sense to me. We probably also don't want a new timeseries for every single URL someone (or a bot) tries to blow out the metric cardinality for those 404 responses.

glebignatieff commented 1 year ago

@rycus86

Thanks for the quick response! Can I filter out such entries all at once with excluded_paths? What's the point of storing the metrics of "None" URLs in the first place?

rycus86 commented 1 year ago

I'll have to check the code but I think that parameter only filters paths, not url rules. I think those metrics need to go somewhere anyway, so in my opinion None is as good a value as any there.

rycus86 commented 1 year ago

OK, checked it now:

            if self.excluded_paths:
                if any(pattern.match(request.path) for pattern in self.excluded_paths):
                    return response

So we always match those (regular expressions) against the request paths, I'd expect if you'd filter out ^/$ there then it would not show up on the metrics, but it could quickly turn into a whack-a-mole when someone requests /notfound1, /notfound2, etc.

glebignatieff commented 1 year ago

We probably also don't want a new timeseries for every single URL someone (or a bot) tries to blow out the metric cardinality for those 404 responses.

I marked this as an expected behavior, only based on my experience of working with Micrometer, which does exactly that – it gives a bunch of entries with 404 URLs.

I guess your solution has a point, so I'll just try to filter such entries on the upper level. Thanks for the explanation!