korfuri / django-prometheus

Export Django monitoring metrics for Prometheus.io
Apache License 2.0

Celery Worker Questions #339

Open Routhinator opened 1 year ago

Routhinator commented 1 year ago

This module tracks DB and cache metrics for my main multi-worker process well. However, I do a lot of work asynchronously on my Celery workers, so I'm missing a lot of visibility into those interactions.

While there are several exporters that export Celery task metrics, none of them tie into this module or make this module's metrics accessible from their exporters.

I get some visibility into the primary init thread when Celery loads Django, since django-prometheus opens its default port of 8001. But once Celery forks into the worker state, those metrics stop being updated, and no other ports are opened for the workers.

My main question, then, is how can I make this better? I'd like to avoid bringing another exporter into the codebase to get app metrics out of the workers, and I'd also like to see Celery metrics for task successes, failures, and durations.

Is this just a missing feature, or is documentation missing on ways to approach these problems?

andrew-cybsafe commented 1 year ago

The metrics are being collected in each worker process, but the default server started by django-prometheus isn't set up to collect from multiple processes. Instead, you can start it yourself with something like the following:

import os
from pathlib import Path

from celery import signals
from prometheus_client import CollectorRegistry, multiprocess, start_http_server


@signals.worker_ready.connect()
def setup_prometheus(**kwargs):
    # Called when the Celery worker is ready to accept tasks.
    multiproc_folder_path = _setup_multiproc_folder()
    registry = CollectorRegistry()
    # Aggregate the metric files written by every worker process.
    multiprocess.MultiProcessCollector(registry, path=multiproc_folder_path)
    start_http_server(8000, registry=registry)


def _setup_multiproc_folder():
    # PROMETHEUS_MULTIPROC_DIR must already be set in the worker's environment.
    coordination_dir = Path(os.environ["PROMETHEUS_MULTIPROC_DIR"])
    coordination_dir.mkdir(parents=True, exist_ok=True)
    # Remove stale metric files left over from a previous run.
    for filepath in coordination_dir.glob("*.db"):
        filepath.unlink()
    return coordination_dir
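
To make the idea behind the snippet above a bit more concrete, here is a rough, stdlib-only sketch of the general pattern: each process writes its counters to its own file in a shared directory, and whoever serves the scrape sums all the files. This is only an illustration of the concept, not how prometheus_client actually stores its data (it uses its own file format). The names `record`, `collect`, and `counters_<pid>.json` are made up for this example.

```python
import json
import os
import tempfile
from collections import Counter
from pathlib import Path
from typing import Optional


def record(shared_dir: Path, name: str, amount: int = 1, pid: Optional[int] = None) -> None:
    """Add `amount` to counter `name` in the calling process's own file."""
    pid = os.getpid() if pid is None else pid
    path = shared_dir / f"counters_{pid}.json"  # hypothetical file naming
    counts = json.loads(path.read_text()) if path.exists() else {}
    counts[name] = counts.get(name, 0) + amount
    path.write_text(json.dumps(counts))


def collect(shared_dir: Path) -> dict:
    """Sum the counters from every per-process file, as a scrape handler might."""
    totals = Counter()
    for path in sorted(shared_dir.glob("counters_*.json")):
        totals.update(json.loads(path.read_text()))
    return dict(totals)


if __name__ == "__main__":
    shared = Path(tempfile.mkdtemp())
    # Simulate two worker processes by passing explicit (fake) pids.
    record(shared, "tasks_succeeded", pid=101)
    record(shared, "tasks_succeeded", 2, pid=102)
    record(shared, "tasks_failed", pid=102)
    print(collect(shared))
```

The point is that no single process holds the full picture in memory after a fork, so the HTTP endpoint has to read from shared storage rather than from its own in-process registry.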