Open immahi79 opened 2 years ago
Hi Mahesh,
I'm not familiar with the AWS Prometheus workspace, so I'm not sure whether it has some configuration that needs tweaking. The duplicate series error is a good lead if you can figure out what key it uses to compare samples: is it the metric name plus all tags?
You shouldn't need to reset counters. In Prometheus they keep their values regardless of scraping, so it makes sense that they return the same values even without additional requests to the endpoints.
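One way to test that lead is to check whether a single /metrics response already contains the same series twice. The sketch below assumes the comparison key is the metric name plus the full label set, which is how Prometheus identifies a series. `series_key` and `find_duplicate_series` are hypothetical helper names, the sample text is made up for illustration, and the parsing only covers simple exposition lines.

```python
import re
from collections import Counter

# Matches "name{labels} value" or "name value" in the text exposition format.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# Matches one label pair such as method="GET".
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def series_key(line):
    """Return (metric name, sorted label pairs) for a sample line, or None."""
    match = SAMPLE_RE.match(line)
    if match is None:
        return None
    name = match.group(1)
    labels = match.group(2) or ""
    # Sorting makes the key independent of the order labels are written in.
    return (name, tuple(sorted(LABEL_RE.findall(labels))))


def find_duplicate_series(exposition_text):
    """List every series key that occurs more than once in one scrape body."""
    counts = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        key = series_key(line)
        if key is not None:
            counts[key] += 1
    return [key for key, n in counts.items() if n > 1]


# Made-up scrape body: the first two samples are the same series,
# written with the labels in a different order.
sample = """
# TYPE flask_http_request_duration_seconds_count counter
flask_http_request_duration_seconds_count{method="GET",path="/api/test",status="201"} 3.0
flask_http_request_duration_seconds_count{status="201",path="/api/test",method="GET"} 3.0
flask_http_request_duration_seconds_count{method="GET",path="/api/other",status="200"} 1.0
"""
duplicates = find_duplicate_series(sample)
for name, labels in duplicates:
    print(name, dict(labels))
```

If this reports nothing for each target, the duplication is more likely introduced after the scrape, for example by labels added or dropped before remote write, but that is a guess on my side and not something the error message confirms.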
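To illustrate why no reset is needed, here is a minimal sketch of how a rate is derived from cumulative counters. `simple_rate` and `average_duration` are hypothetical helpers, the scrape values and the 15 second interval are invented, and the arithmetic is a simplification of what PromQL `rate()` does (the real function also extrapolates across the query window).

```python
def simple_rate(prev_value, curr_value, interval_seconds):
    """Approximate per-second increase of a counter between two scrapes."""
    if curr_value < prev_value:
        # The counter went down, so the process restarted: count from zero.
        increase = curr_value
    else:
        increase = curr_value - prev_value
    return increase / interval_seconds


def average_duration(prev_sum, curr_sum, prev_count, curr_count, interval_seconds):
    """Average request duration in the interval, or None if nothing was served."""
    count_rate = simple_rate(prev_count, curr_count, interval_seconds)
    if count_rate == 0:
        return None  # no requests in this interval, so no average to report
    sum_rate = simple_rate(prev_sum, curr_sum, interval_seconds)
    return sum_rate / count_rate


# Invented values of the _sum and _count series over four scrapes.
# The endpoint is idle for the first three scrapes, then serves one request.
scraped_sum = [1.0, 1.0, 1.0, 1.5]
scraped_count = [3, 3, 3, 4]
interval = 15  # assumed scrape interval in seconds

averages = [
    average_duration(scraped_sum[i - 1], scraped_sum[i],
                     scraped_count[i - 1], scraped_count[i], interval)
    for i in range(1, len(scraped_sum))
]
print(averages)
```

The repeated values across idle scrapes are expected; the rate over those intervals is zero, and only the interval with a new request yields an average duration.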
Hi
Thank you for the Prometheus Flask exporter and the Grafana dashboard example, which were very helpful for ingesting metrics from microservices running in an ECS cluster. I followed the article https://aws.amazon.com/blogs/opensource/metrics-collection-from-amazon-ecs-using-amazon-managed-service-for-prometheus/ to deploy and scrape the metrics from each Flask target and push them into the Amazon Prometheus workspace.
I got the error below when pushing the data collected from the Prometheus Flask exporter.
ts=2022-05-18T19:54:33.370Z caller=dedupe.go:112 component=remote level=error remote_name=3f2135 url=http://localhost:8080/workspaces/xxxxx/api/v1/remote_write msg="non-recoverable error" count=500 err="server returned HTTP status 400 Bad Request: user=xxxxx: err: duplicate sample for timestamp. timestamp=2022-05-18T19:54:33.251Z, series={name=\"flask_http_request_duration_seconds_bucket\", cluster=\"test\", instance=\"10.0.x.xxx\", job=\"ecs_services\", le=\"0.01\", method=\"GET\", path=\"/api/test\", service=\"test-api\", status=\"201\", taskid=\"b71155a6004445f2900fb294ca382eec\"}"
I tried changing the scrape interval in various ways, but that did not help.
I also observed that the average response duration is shown incorrectly. The same sample appears again at later timestamps even though the API was not called. I expected rate to reset when the endpoint is not called, but /metrics appears to return the same sample for the series flask_http_request_duration_seconds_sum and flask_http_request_duration_seconds_count every time it is scraped. As a result, I get the "duplicate sample for timestamp" error and the average response duration is incorrect.
Can you please help me understand the timestamp issue with the Prometheus Flask exporter? Do I need to reset the counter explicitly after every /metrics call?
Thanks & regards Mahesh