korfuri / django-prometheus

Export Django monitoring metrics for Prometheus.io
Apache License 2.0
1.44k stars 244 forks source link

Fix use multi process for load metrics #429

Open saeedAdelpour opened 1 month ago

saeedAdelpour commented 1 month ago

Hi,

I opened a pull request (https://github.com/korfuri/django-prometheus/pull/407) that was closed. I think I need to explain the problem in more detail.

The first problem is with using the PROMETHEUS_METRICS_EXPORT_PORT and PROMETHEUS_METRICS_EXPORT_ADDRESS environment variables. Here is the command that starts my app:

gunicorn APP.wsgi:application --name APP --workers 4 --bind=0.0.0.0:8000 --log-level=INFO --log-file=- 

The first error we encounter is Address already in use because in exports.py, line 45, the function prometheus_client.start_http_server(port, addr=addr) is called without checking if the port is already in use. This should be handled with a try-except block.

The second problem is when we use the PROMETHEUS_MULTIPROC_DIR environment variable. The function SetupPrometheusEndpointOnPort does not check PROMETHEUS_MULTIPROC_DIR and only runs this:

prometheus_client.start_http_server(port, addr=addr)

The implementation of this part must be similar to ExportToDjangoView. I mistakenly forced the app to use PROMETHEUS_MULTIPROC_DIR, but I have fixed it.

Thank you for your attention. Please tell me where I am wrong about it.

asherf commented 1 month ago

TBH, I don't think this PR is a good idea. the function you modified is (IMHO, since I didn't originally write this code) is more of a reference code. If needs be, you can have this kind of code and customization on the app side. I think adding complexity to this library is the wrong choice.

swallowing exception in this way, is really bad idea.

saeedAdelpour commented 1 month ago

Hi,

Thank you for your response.

I agree with handling the exception this way, so I removed the try-except block to ensure only one process runs this part of the code.

However, I'm sure this code:

prometheus_client.start_http_server(port, addr=addr)

won't work with PROMETHEUS_MULTIPROC_DIR because when we try to run an HTTP server, we need to pass:

registry = prometheus_client.CollectorRegistry()

to start_http_server, and the code should be like this:

prometheus_client.start_http_server(port, addr=addr, registry=registry)

When we set PROMETHEUS_MULTIPROC_DIR in our app, all our metrics are stored in that directory, and the current code does not use that directory.

Here are the steps to reproduce the bug:

  1. Set the PROMETHEUS_MULTIPROC_DIR environment variable.
  2. Run a Django app with settings that do not include PROMETHEUS_METRICS_EXPORT_PORT and PROMETHEUS_METRICS_EXPORT_ADDRESS. The command is:
    gunicorn --bind=0.0.0.0:8001 proj.wsgi:application -w 4 --reload

    Note that we use 4 processes to run our main app.

  3. Create another settings file (/path/to/settings_prometheus.py) and add PROMETHEUS_METRICS_EXPORT_PORT and PROMETHEUS_METRICS_EXPORT_ADDRESS. The new settings file code is:

    from .settings import *
    
    PROMETHEUS_METRICS_EXPORT_PORT = 8002
    PROMETHEUS_METRICS_EXPORT_ADDRESS = "localhost"

    The command is:

    DJANGO_SETTINGS_MODULE=/path/to/settings_prometheus.py gunicorn --bind=0.0.0.0:8003 proj.wsgi:application -w 1 --reload

    Note that we use 1 process because the code:

    prometheus_client.start_http_server(port, addr=addr, registry=registry)

    must be run by only one process.

  4. We saw different results at http://localhost:8000/prometheus/metrics/ and http://localhost:8001.

Thank you for your attention. Please tell me if I am wrong about anything.

saeedAdelpour commented 1 month ago

Hi,

I hope you’re doing well. I added a pull request 3 weeks ago. Could you spare some moment to review it? I’d really appreciate it. @asherf