trallnag / prometheus-fastapi-instrumentator

Instrument your FastAPI with Prometheus metrics.
ISC License

Metrics disappear when setting PROMETHEUS_MULTIPROC_DIR #282

Open lucasalvarezlacasa opened 7 months ago

lucasalvarezlacasa commented 7 months ago

I'm serving my FastAPI application with more than one worker. To do this, I had to set PROMETHEUS_MULTIPROC_DIR and point it to a valid directory so that all workers can read and write metrics there.

However, I noticed that the resulting /metrics endpoint exposes far fewer metrics than it does without this setting. For instance, metrics for the garbage collector, process information (CPU time, memory, open file descriptors), the Python version, etc., are no longer exposed. All I see now are my custom counter and the metrics related to HTTP requests and responses.

Any ideas why? Am I doing something wrong?

These are the metrics I get when running with more than one worker:

# HELP http_request_size_bytes Content length of incoming requests by handler. Only value of header is respected. Otherwise ignored. No percentile calculated. 
# TYPE http_request_size_bytes summary
http_request_size_bytes_count{handler="/metrics"} 59.0
http_request_size_bytes_sum{handler="/metrics"} 404.0
http_request_size_bytes_count{handler="/status"} 7.0
http_request_size_bytes_sum{handler="/status"} 707.0
# HELP http_response_size_bytes Content length of outgoing responses by handler. Only value of header is respected. Otherwise ignored. No percentile calculated. 
# TYPE http_response_size_bytes summary
http_response_size_bytes_count{handler="/metrics"} 59.0
http_response_size_bytes_sum{handler="/metrics"} 223520.0
http_response_size_bytes_count{handler="/status"} 7.0
http_response_size_bytes_sum{handler="/status"} 14.0
# HELP http_request_duration_highr_seconds Latency with many buckets but no API specific labels. Made for more accurate percentile calculations. 
# TYPE http_request_duration_highr_seconds histogram
http_request_duration_highr_seconds_sum 0.35216000000000003
http_request_duration_highr_seconds_bucket{le="0.01"} 54.0
http_request_duration_highr_seconds_bucket{le="0.025"} 66.0
http_request_duration_highr_seconds_bucket{le="0.05"} 66.0
http_request_duration_highr_seconds_bucket{le="0.075"} 66.0
http_request_duration_highr_seconds_bucket{le="0.1"} 66.0
http_request_duration_highr_seconds_bucket{le="0.25"} 66.0
http_request_duration_highr_seconds_bucket{le="0.5"} 66.0
http_request_duration_highr_seconds_bucket{le="0.75"} 66.0
http_request_duration_highr_seconds_bucket{le="1.0"} 66.0
http_request_duration_highr_seconds_bucket{le="1.5"} 66.0
http_request_duration_highr_seconds_bucket{le="2.0"} 66.0
http_request_duration_highr_seconds_bucket{le="2.5"} 66.0
http_request_duration_highr_seconds_bucket{le="3.0"} 66.0
http_request_duration_highr_seconds_bucket{le="3.5"} 66.0
http_request_duration_highr_seconds_bucket{le="4.0"} 66.0
http_request_duration_highr_seconds_bucket{le="4.5"} 66.0
http_request_duration_highr_seconds_bucket{le="5.0"} 66.0
http_request_duration_highr_seconds_bucket{le="7.5"} 66.0
http_request_duration_highr_seconds_bucket{le="10.0"} 66.0
http_request_duration_highr_seconds_bucket{le="30.0"} 66.0
http_request_duration_highr_seconds_bucket{le="60.0"} 66.0
http_request_duration_highr_seconds_bucket{le="+Inf"} 66.0
http_request_duration_highr_seconds_count 66.0
# HELP http_request_duration_seconds Latency with only few buckets by handler. Made to be only used if aggregation by handler is important. 
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_sum{handler="/metrics",method="GET"} 0.34132999999999997
http_request_duration_seconds_sum{handler="/status",method="GET"} 0.01083
http_request_duration_seconds_bucket{handler="/metrics",le="0.1",method="GET"} 59.0
http_request_duration_seconds_bucket{handler="/metrics",le="0.5",method="GET"} 59.0
http_request_duration_seconds_bucket{handler="/metrics",le="1.0",method="GET"} 59.0
http_request_duration_seconds_bucket{handler="/metrics",le="+Inf",method="GET"} 59.0
http_request_duration_seconds_count{handler="/metrics",method="GET"} 59.0
http_request_duration_seconds_bucket{handler="/status",le="0.1",method="GET"} 7.0
http_request_duration_seconds_bucket{handler="/status",le="0.5",method="GET"} 7.0
http_request_duration_seconds_bucket{handler="/status",le="1.0",method="GET"} 7.0
http_request_duration_seconds_bucket{handler="/status",le="+Inf",method="GET"} 7.0
http_request_duration_seconds_count{handler="/status",method="GET"} 7.0
# HELP nlu_call_times_total Number of times the NLU has been called
# TYPE nlu_call_times_total counter
nlu_call_times_total 0.0
# HELP http_requests_total Total number of requests by method, status and handler.
# TYPE http_requests_total counter
http_requests_total{handler="/metrics",method="GET",status="2xx"} 59.0
http_requests_total{handler="/status",method="GET",status="2xx"} 7.0

These are the ones I get when running only with one worker (and thus, not using PROMETHEUS_MULTIPROC_DIR):

# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 29056.0
python_gc_objects_collected_total{generation="1"} 20925.0
python_gc_objects_collected_total{generation="2"} 3496.0
# HELP python_gc_objects_uncollectable_total Uncollectable objects found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 460.0
python_gc_collections_total{generation="1"} 41.0
python_gc_collections_total{generation="2"} 3.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="11",patchlevel="0",version="3.11.0"} 1.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 2.09629184e+08
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 1.06852352e+08
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.70540120624e+09
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 1.4400000000000002
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 17.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP nlu_call_times_total Number of times the NLU has been called
# TYPE nlu_call_times_total counter
nlu_call_times_total 0.0
# HELP nlu_call_times_created Number of times the NLU has been called
# TYPE nlu_call_times_created gauge
nlu_call_times_created 1.7054012078114848e+09
# HELP http_requests_total Total number of requests by method, status and handler.
# TYPE http_requests_total counter
http_requests_total{handler="/metrics",method="GET",status="2xx"} 4.0
http_requests_total{handler="/status",method="GET",status="2xx"} 3.0
# HELP http_requests_created Total number of requests by method, status and handler.
# TYPE http_requests_created gauge
http_requests_created{handler="/metrics",method="GET",status="2xx"} 1.7054012132842155e+09
http_requests_created{handler="/status",method="GET",status="2xx"} 1.7054012154598083e+09
# HELP http_request_size_bytes Content length of incoming requests by handler. Only value of header is respected. Otherwise ignored. No percentile calculated. 
# TYPE http_request_size_bytes summary
http_request_size_bytes_count{handler="/metrics"} 4.0
http_request_size_bytes_sum{handler="/metrics"} 303.0
http_request_size_bytes_count{handler="/status"} 3.0
http_request_size_bytes_sum{handler="/status"} 303.0
# HELP http_request_size_bytes_created Content length of incoming requests by handler. Only value of header is respected. Otherwise ignored. No percentile calculated. 
# TYPE http_request_size_bytes_created gauge
http_request_size_bytes_created{handler="/metrics"} 1.7054012132842422e+09
http_request_size_bytes_created{handler="/status"} 1.7054012154598227e+09
# HELP http_response_size_bytes Content length of outgoing responses by handler. Only value of header is respected. Otherwise ignored. No percentile calculated. 
# TYPE http_response_size_bytes summary
http_response_size_bytes_count{handler="/metrics"} 4.0
http_response_size_bytes_sum{handler="/metrics"} 26713.0
http_response_size_bytes_count{handler="/status"} 3.0
http_response_size_bytes_sum{handler="/status"} 6.0
# HELP http_response_size_bytes_created Content length of outgoing responses by handler. Only value of header is respected. Otherwise ignored. No percentile calculated. 
# TYPE http_response_size_bytes_created gauge
http_response_size_bytes_created{handler="/metrics"} 1.70540121328427e+09
http_response_size_bytes_created{handler="/status"} 1.7054012154598377e+09
# HELP http_request_duration_highr_seconds Latency with many buckets but no API specific labels. Made for more accurate percentile calculations. 
# TYPE http_request_duration_highr_seconds histogram
http_request_duration_highr_seconds_bucket{le="0.01"} 7.0
http_request_duration_highr_seconds_bucket{le="0.025"} 7.0
http_request_duration_highr_seconds_bucket{le="0.05"} 7.0
http_request_duration_highr_seconds_bucket{le="0.075"} 7.0
http_request_duration_highr_seconds_bucket{le="0.1"} 7.0
http_request_duration_highr_seconds_bucket{le="0.25"} 7.0
http_request_duration_highr_seconds_bucket{le="0.5"} 7.0
http_request_duration_highr_seconds_bucket{le="0.75"} 7.0
http_request_duration_highr_seconds_bucket{le="1.0"} 7.0
http_request_duration_highr_seconds_bucket{le="1.5"} 7.0
http_request_duration_highr_seconds_bucket{le="2.0"} 7.0
http_request_duration_highr_seconds_bucket{le="2.5"} 7.0
http_request_duration_highr_seconds_bucket{le="3.0"} 7.0
http_request_duration_highr_seconds_bucket{le="3.5"} 7.0
http_request_duration_highr_seconds_bucket{le="4.0"} 7.0
http_request_duration_highr_seconds_bucket{le="4.5"} 7.0
http_request_duration_highr_seconds_bucket{le="5.0"} 7.0
http_request_duration_highr_seconds_bucket{le="7.5"} 7.0
http_request_duration_highr_seconds_bucket{le="10.0"} 7.0
http_request_duration_highr_seconds_bucket{le="30.0"} 7.0
http_request_duration_highr_seconds_bucket{le="60.0"} 7.0
http_request_duration_highr_seconds_bucket{le="+Inf"} 7.0
http_request_duration_highr_seconds_count 7.0
http_request_duration_highr_seconds_sum 0.023
# HELP http_request_duration_highr_seconds_created Latency with many buckets but no API specific labels. Made for more accurate percentile calculations. 
# TYPE http_request_duration_highr_seconds_created gauge
http_request_duration_highr_seconds_created 1.7054012078115673e+09
# HELP http_request_duration_seconds Latency with only few buckets by handler. Made to be only used if aggregation by handler is important. 
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{handler="/metrics",le="0.1",method="GET"} 4.0
http_request_duration_seconds_bucket{handler="/metrics",le="0.5",method="GET"} 4.0
http_request_duration_seconds_bucket{handler="/metrics",le="1.0",method="GET"} 4.0
http_request_duration_seconds_bucket{handler="/metrics",le="+Inf",method="GET"} 4.0
http_request_duration_seconds_count{handler="/metrics",method="GET"} 4.0
http_request_duration_seconds_sum{handler="/metrics",method="GET"} 0.01729
http_request_duration_seconds_bucket{handler="/status",le="0.1",method="GET"} 3.0
http_request_duration_seconds_bucket{handler="/status",le="0.5",method="GET"} 3.0
http_request_duration_seconds_bucket{handler="/status",le="1.0",method="GET"} 3.0
http_request_duration_seconds_bucket{handler="/status",le="+Inf",method="GET"} 3.0
http_request_duration_seconds_count{handler="/status",method="GET"} 3.0
http_request_duration_seconds_sum{handler="/status",method="GET"} 0.00571
# HELP http_request_duration_seconds_created Latency with only few buckets by handler. Made to be only used if aggregation by handler is important. 
# TYPE http_request_duration_seconds_created gauge
http_request_duration_seconds_created{handler="/metrics",method="GET"} 1.7054012132843099e+09
http_request_duration_seconds_created{handler="/status",method="GET"} 1.7054012154598625e+09
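
To make the difference between the two outputs explicit, the metric families can be compared by name. The following is a minimal sketch using only the standard library; the helper name `metric_families` and the two excerpt strings are illustrative (the excerpts are shortened from the dumps above), and it assumes each family is announced with a `# TYPE` line.

```python
def metric_families(exposition_text: str) -> set:
    """Collect metric family names from the '# TYPE <name> <type>' lines."""
    families = set()
    for line in exposition_text.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[0] == "#" and parts[1] == "TYPE":
            families.add(parts[2])
    return families


# Shortened excerpts of the two /metrics outputs shown above.
multi_worker = """\
# TYPE http_requests_total counter
http_requests_total{handler="/status",method="GET",status="2xx"} 7.0
# TYPE nlu_call_times_total counter
nlu_call_times_total 0.0
"""

single_worker = """\
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="11",patchlevel="0",version="3.11.0"} 1.0
# TYPE process_open_fds gauge
process_open_fds 17.0
# TYPE nlu_call_times_total counter
nlu_call_times_total 0.0
# TYPE http_requests_total counter
http_requests_total{handler="/status",method="GET",status="2xx"} 3.0
# TYPE http_requests_created gauge
http_requests_created{handler="/status",method="GET",status="2xx"} 1.7054012154598083e+09
"""

# Families present with one worker but absent with PROMETHEUS_MULTIPROC_DIR set.
missing = sorted(metric_families(single_worker) - metric_families(multi_worker))
print(missing)
```

Run against the full outputs, the same comparison lists the `python_gc_*`, `python_info`, `process_*`, and `*_created` families as the ones missing in the multi-worker case.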

This is the code I'm using to register the instrumentator:

from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator, metrics

# Settings and get_settings come from the application's own configuration module (not shown).


def register_instrumentator(app: FastAPI) -> None:
    """Registers the instrumentator with the application."""
    settings: Settings = get_settings()
    instrumentator: Instrumentator = Instrumentator(
        should_round_latency_decimals=settings.METRICS_SHOULD_ROUND_LATENCY_DECIMALS,
        round_latency_decimals=settings.METRICS_LATENCY_DECIMALS,
        excluded_handlers=settings.METRICS_EXCLUDE_HANDLERS,
        should_respect_env_var=True,
    )
    instrumentator.add(metrics.default())  # this is needed to have all default + custom metrics
    instrumentator.instrument(app=app).expose(app=app, endpoint=settings.METRICS_ENDPOINT)

angel18megha commented 5 months ago

Any updates? I am facing the same issue as well.

Zwujun commented 1 week ago

How did you set the environment variable? If you set it in code using os.environ[key] = val, the assignment needs to come before importing prometheus_fastapi_instrumentator.
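
A minimal sketch of that ordering, assuming the variable is set from Python rather than from the shell or the process manager. The temporary directory is only a placeholder for a real directory that all workers can write to, and the import is guarded here solely so the snippet also runs where the package is not installed.

```python
import os
import tempfile

# Placeholder directory for this sketch; in a real deployment this would be
# a directory shared by all workers and cleared between application runs.
multiproc_dir = tempfile.mkdtemp(prefix="prometheus_multiproc_")

# Set the variable first ...
os.environ["PROMETHEUS_MULTIPROC_DIR"] = multiproc_dir

# ... and only then import the instrumentator (and, through it, prometheus_client),
# so that the variable is already visible when those modules are loaded.
try:
    from prometheus_fastapi_instrumentator import Instrumentator
except ImportError:
    # Guard for environments without the package; not needed in the real application.
    Instrumentator = None
```

Setting the variable in the environment that launches the server (for example in the shell or the container definition) avoids the ordering question entirely, since it is then defined before any Python code runs.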