prometheus / pushgateway

Push acceptor for ephemeral and batch jobs.
Apache License 2.0

Allow multiple groups in one PUT #686

Closed sebastianw closed 1 month ago

sebastianw commented 2 months ago

Feature request

Use case. Why is this important?

We're currently pushing a large list of groups to the Pushgateway via Ansible. Each grouping requires its own PUT request, which costs time in our automation pipeline.

Bug Report

What did you do?

Push multiple groups of metrics to the pushgateway.

What did you expect to see?

A way to push them in one HTTP PUT request.

What did you see instead? Under which circumstances?

We need to make a lot of HTTP PUT requests.
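For context, each grouping is addressed by its own URL path, so every push is a separate PUT. A minimal sketch of how one such request is assembled (host, job, and label names here are made up, not from our actual setup):

```python
from urllib.parse import quote

def build_push_request(base_url, job, grouping_labels, metrics):
    """Build the (url, body) pair for one Pushgateway PUT.

    grouping_labels: dict of label name -> value, encoded into the URL path.
    metrics: dict of metric name -> numeric value, serialized in the plain
    text exposition format the Pushgateway accepts.
    """
    path = f"/metrics/job/{quote(job, safe='')}"
    for name, value in sorted(grouping_labels.items()):
        path += f"/{name}/{quote(str(value), safe='')}"
    body = "".join(f"{name} {value}\n" for name, value in metrics.items())
    return base_url + path, body

url, body = build_push_request(
    "http://pushgateway:9091",  # hypothetical host
    "backup", {"instance": "db1"}, {"backup_records_total": 1234}
)
# The actual push would then be, e.g.:
#   requests.put(url, data=body)   # one PUT per grouping
```

With 249 groupings this construction runs 249 times, one PUT each.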


beorn7 commented 2 months ago

The API is (broadly) RESTful, so the group is encoded in the URL. Pushing multiple groups is not possible that way, so we would need a completely new API to fulfill this request.

My main point against doing that is that the Pushgateway itself is not designed to handle large amounts of pushed metrics. Adding a whole new API to support a high-throughput use case for which the backend isn't designed seems like a misfit.

A minor point is that offering two ways of doing the same thing always adds cognitive overhead for the user and maintenance overhead for the maintainer, so there would need to be a strong benefit to justify it.

sebastianw commented 1 month ago

@beorn7 Can you elaborate on what a "large amount" would be? We're currently pushing 597 metrics in 249 groups. Do you feel these are too many for the gateway? Most of the time right now is spent by Ansible doing 249 loop iterations. If this is too much effort to do on the Pushgateway side, we'll look into writing a custom Ansible module for it.

beorn7 commented 1 month ago

You mean you will have 597 * 249 = 148653 metrics on the PGW at the same time?

That definitely sounds on the high side (although it's probably something that a PGW running on a large enough machine can still handle with reasonable performance).

The prime use case is really "I am a daily backup job, I'm done and would like to report the number of backed-up records (maybe partitioned by success and failure)." So a handful of metrics, pushed once a day, and then there are maybe dozens of those jobs in an organization.

With 100k metrics pushed, I would also raise the question of whether this is too much information to keep on a single-node application like the PGW, which has no HA story whatsoever and no way of persisting its state beyond the local disk.

Have you considered creating a dedicated exporter for your use case? Or using remote-write to directly send the data to a Prometheus server or (even better) a distributed metrics backend supporting the Prometheus remote-write protocol?
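A dedicated exporter, as suggested above, would serve the metrics for Prometheus to scrape instead of pushing them. A minimal stdlib-only sketch of what that could look like (the metric names and the in-memory dict are illustrative, not a real collector):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# In a real exporter these values would be collected on each scrape;
# a module-level dict stands in for that here (illustrative only).
METRICS = {
    "ansible_run_duration_seconds": 42.0,
    "ansible_run_success": 1,
}

def render_exposition(metrics):
    """Render metrics in the Prometheus text exposition format."""
    return "".join(f"{name} {value}\n" for name, value in sorted(metrics.items()))

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve /metrics (and everything else) as plain-text exposition.
        body = render_exposition(METRICS).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To run the exporter (port is arbitrary):
#   HTTPServer(("", 9100), MetricsHandler).serve_forever()
```

Prometheus would then scrape this endpoint directly, removing the Pushgateway from the path entirely.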

sebastianw commented 1 month ago

No sorry, we have 597 metrics in total, organized in 249 groups. So we make 249 PUT requests containing 597 metrics combined.

beorn7 commented 1 month ago

OK, that sounds much better.

But now I'm a bit surprised that this takes so long. 249 PUT requests shouldn't be such a big deal and just take a few seconds. Maybe there is some optimization potential per request (which could very well be in the PGW itself).

sebastianw commented 1 month ago

The slowness in our case comes entirely from Ansible, where loops are really slow when you have a lot of variables. It would have been easiest if we could combine everything into one PUT request, but that seems like a lot of work for a special use case. I think we'll look at creating a custom Ansible module in pure Python, which will probably be much faster.
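Such a pure-Python module could also issue the 249 PUTs concurrently rather than serially, which should cut the wall-clock time considerably. A stdlib-only sketch (the `send` parameter is a hypothetical hook added for testability, not part of any real module):

```python
from concurrent.futures import ThreadPoolExecutor
from urllib import request

def push_group(url, body):
    """Issue one Pushgateway PUT with a text-exposition-format body."""
    req = request.Request(url, data=body.encode(), method="PUT")
    with request.urlopen(req) as resp:
        return resp.status

def push_all(groups, send=push_group, max_workers=16):
    """Push many (url, body) pairs concurrently instead of looping serially.

    groups: iterable of (url, body) tuples, one per metric grouping.
    send:   the function used per push; injectable so the fan-out logic
            can be tested without a running Pushgateway.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda g: send(*g), groups))
```

Injecting `send` keeps the concurrency logic separate from the HTTP call, so the fan-out can be exercised with a stub while real runs use `push_group` unchanged.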

beorn7 commented 1 month ago

OK, thanks for your understanding. I'll close this issue then.