jacksontj / promxy

An aggregating proxy to enable HA prometheus
MIT License

Configurations with LB using promxy #56

Closed jakirpatel closed 6 years ago

jakirpatel commented 6 years ago

Hello,

I am trying to use promxy. I currently have 3 Prometheus servers with the same configuration, so that the data is replicated.

On top of that I have configured Nginx as a load balancer. How can I use promxy with this setup? Which configuration (endpoints) do I need to modify in Grafana?

jacksontj commented 6 years ago

How exactly you want to set up the hosts (for LB etc.) is up to you. These sorts of topology questions can go on forever, so I'll attempt to be brief, but my apologies in advance if I ramble a bit (hopefully it's useful rambling).

The 3 nodes you have would be considered a ServerGroup. So you'd need to configure a promxy server (or servers) with a servergroup consisting of those 3 nodes (promxy supports all the same discovery mechanisms as prometheus -- here is an example of static). Once that is configured you need grafana's requests to go to promxy.
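
As a rough illustration of the static case, a server group for three Prometheus nodes might look like the fragment below. The hostnames (`prom-1` to `prom-3`), the port 9090, and the `sg` label value are placeholders I made up for this sketch, and the `anti_affinity` value is only an example; check the commented config file shipped with promxy for the authoritative list of options.

```yaml
promxy:
  server_groups:
    # One server group = one set of Prometheus hosts scraping the same targets.
    - static_configs:
        - targets:
            - prom-1.example.com:9090   # hypothetical hostnames
            - prom-2.example.com:9090
            - prom-3.example.com:9090
      # Label added to series from this group (value is a placeholder).
      labels:
        sg: replicated_prometheus
      # Example value only; controls when gaps are filled from another host.
      anti_affinity: 10s
```

With something like this in place, Grafana's Prometheus data source URL would point at promxy instead of at an individual Prometheus node.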

So an example setup could be:

NGINX -> promxy (xN of them) -> 3 prometheus (through the servergroup configuration).

The NGINX in this setup isn't actually providing any real value to the prom API, so unless you are doing something else there (TLS termination, etc.) then you could simplify further and have promxy be what grafana talks to (read: no NGINX required).

The way I have set this up in the past is:

ELB -> pool-of-promxy -> N ServerGroups.

Hopefully that clarifies it-- if not, or if there are other questions, let me know! :smile:

jakirpatel commented 6 years ago

@jacksontj

Thank you for your answer. Actually, I was not clear about the promxy endpoint. Does it run on a specific port?

It would be really helpful if these details were in the documentation. I just ran promxy on one of the servers, but I don't know what to do next.

jacksontj commented 6 years ago

Ah, promxy defaults to port 8082, but it can be overridden using the --bind-addr flag from the CLI. The majority of options are in the config file (which has comments and examples); the rest can be found in the CLI help page:

[jacksontj@localhost promxy]$ ./promxy -h
Usage:
  promxy [OPTIONS]

Application Options:
      --bind-addr=                                address for promxy to listen
                                                  on (default: :8082)
      --config=                                   path to the config file
      --log-level=                                Log level (default: info)
      --web.external-url=                         The URL under which
                                                  Prometheus is externally
                                                  reachable (for example, if
                                                  Prometheus is served via a
                                                  reverse proxy). Used for
                                                  generating relative and
                                                  absolute links back to
                                                  Prometheus itself. If the URL
                                                  has a path portion, it will
                                                  be used to prefix all HTTP
                                                  endpoints served by
                                                  Prometheus. If omitted,
                                                  relevant URL components will
                                                  be derived automatically.
      --query.timeout=                            Maximum time a query may take
                                                  before being aborted.
                                                  (default: 2m)
      --query.max-concurrency=                    Maximum number of queries
                                                  executed concurrently.
                                                  (default: 1000)
      --alertmanager.notification-queue-capacity= The capacity of the queue for
                                                  pending alert manager
                                                  notifications. (default:
                                                  10000)

Help Options:
  -h, --help                                      Show this help message
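
To check that a running promxy answers on its default port, one option is to send it an instant query the same way a Prometheus client would. The sketch below assumes promxy is reachable at `http://localhost:8082` and serves the Prometheus-style `/api/v1/query` endpoint; the function names are made up for this example.

```python
import json
import urllib.parse
import urllib.request


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the instant-query URL for a Prometheus-compatible API."""
    params = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def instant_query(base_url: str, promql: str, timeout: float = 10.0) -> dict:
    """Run an instant query and return the decoded JSON response."""
    url = instant_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `instant_query("http://localhost:8082", "up")` would ask promxy for the `up` series across the configured server groups. If promxy was started with a different `--bind-addr`, adjust the base URL accordingly.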

jakirpatel commented 6 years ago

@jacksontj

Thanks for your reply.

As the documentation says, promxy merges the data by filling the gaps. Let's say I have two Prometheus servers with the same configuration and both are scraping every 15s. There should be differences in data points with respect to time (as both servers started at different times).

If you merge the data from these servers, won't the result contain duplicated data? How does promxy deal with duplicates whose timestamps are offset from each other?

Could you explain a little more about the promxy architecture and how it ensures the correctness of the data? Also, how do the queries get executed? Do they go through promxy? How can I ensure the best performance with the promxy architecture?

jacksontj commented 6 years ago

This is done using the anti_affinity configuration for the server group. This defines the size a gap must be for promxy to "fill" it, otherwise it'll simply return one of the series. This way if there is a gap it will be filled, otherwise promxy will simply return the series un-modified.

> There should be differences in data points with respect to time (as both servers started at different times).

One thing that's interesting: there are actually more problems than just that! Prometheus stores the time when the scrape completed, so depending on the latency of the target you are scraping, the time can drift even more!

TL;DR: the anti_affinity feature resolves this issue.
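
To make the gap-filling idea concrete, here is a simplified sketch. It is not promxy's actual code, only one reading of the rule described above: keep one host's series as-is and take a sample from the other host only when no sample in the first series lies within `anti_affinity` seconds of it. The function name and the data layout are assumptions made for the example.

```python
from bisect import bisect_left
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)


def fill_gaps(primary: List[Sample], secondary: List[Sample],
              anti_affinity: float) -> List[Sample]:
    """Return primary plus the secondary samples that land in its gaps.

    Both inputs must be sorted by timestamp. A secondary sample is used only
    when no primary sample lies within anti_affinity seconds of it.
    """
    times = [ts for ts, _ in primary]
    fillers = []
    for ts, value in secondary:
        i = bisect_left(times, ts)
        # Distances to the nearest primary sample on each side, if any.
        distances = []
        if i < len(times):
            distances.append(times[i] - ts)
        if i > 0:
            distances.append(ts - times[i - 1])
        if not distances or min(distances) > anti_affinity:
            fillers.append((ts, value))
    return sorted(primary + fillers)
```

With two hosts scraping every 15s at a 5s offset and an `anti_affinity` of 10, the offset samples from the second host are ignored while the first host has data, so nothing is duplicated. If the first host misses a few scrapes, samples from the second host that are more than 10s away from any sample of the first are used to fill the hole.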

jacksontj commented 6 years ago

@jakirpatel Wanted to check in to make sure your question was answered. If so we can close out the issue, if not I can hopefully clarify any further questions.

jacksontj commented 6 years ago

Closing this issue out for now as it seems that the questions are answered. If they aren't, feel free to re-open the issue.