lcrownover / prometheus-slurm-exporter

Prometheus exporter for the SLURM scheduler
GNU General Public License v3.0

Caching Proxy for Slurm REST API #22

Open chrisdaaz opened 1 day ago

chrisdaaz commented 1 day ago

Thank you for developing this! Since the Slurm REST API is required, do you recommend that users set up a caching proxy for it? The Slurm REST API documentation states:

Sites are strongly encouraged to setup a caching proxy between slurmrestd and clients to avoid having clients repeatedly call queries, causing usage to be higher than needed (and causing lock contention) on the controller.

We haven't used the Slurm REST API in our cluster before, so any setup advice from your experience would be appreciated!

lcrownover commented 1 day ago

Hey @chrisdaaz, thanks for pointing that out! If you expect users to hit slurmrestd directly, you probably want to set up a caching proxy, because queries to some endpoints (jobs in particular) seem to be pretty expensive, and you can't control what those crazy folks are doing 😉
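To make the idea concrete, here is a minimal sketch of what a caching layer does: it serves a stored response until that response is older than a time-to-live, and only then asks the backend again. This is not a real proxy and not part of slurmrestd; `TTLCache`, `fetch`, and the `"/jobs"` path are hypothetical names for illustration, and in practice you would more likely configure this in an off-the-shelf reverse proxy.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Serve a stored response until it is older than ttl_seconds."""

    def __init__(
        self,
        fetch: Callable[[str], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch      # hypothetical function that queries the backend
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the behaviour can be tested
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, path: str) -> Any:
        now = self._clock()
        entry = self._entries.get(path)
        # Reuse the stored response while it is still fresh.
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        # Otherwise hit the backend once and remember the result.
        value = self._fetch(path)
        self._entries[path] = (now, value)
        return value
```

With a 30-second TTL, any number of clients asking for the same path inside that window cause a single backend query, which is the load reduction the Slurm docs are after.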

This exporter, however, shouldn't require a proxy because you're in control of the scrape interval, and an interval of 30 seconds should be no sweat for Slurm. In addition, this exporter has an internal cache, so it only makes one call to each of the required endpoints during a gather operation.
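Roughly, the "one call per endpoint per gather" behaviour looks like the sketch below. It is an illustration in Python, not the exporter's actual code; `GatherCache`, `gather`, the collector functions, and the endpoint names are all made up for the example.

```python
from typing import Any, Callable, Dict, List


class GatherCache:
    """Remember each endpoint's response for the duration of one gather."""

    def __init__(self, fetch: Callable[[str], Any]) -> None:
        self._fetch = fetch  # hypothetical function that calls the REST API
        self._responses: Dict[str, Any] = {}

    def get(self, endpoint: str) -> Any:
        # Only the first request for an endpoint reaches the API.
        if endpoint not in self._responses:
            self._responses[endpoint] = self._fetch(endpoint)
        return self._responses[endpoint]


def gather(
    fetch: Callable[[str], Any],
    collectors: List[Callable[[GatherCache], Dict[str, float]]],
) -> Dict[str, float]:
    """Run every collector against one fresh cache, i.e. one scrape."""
    cache = GatherCache(fetch)  # new cache per scrape, so data is never stale
    metrics: Dict[str, float] = {}
    for collect in collectors:
        metrics.update(collect(cache))
    return metrics
```

Several collectors can read the same endpoint's data during a scrape, but the API only sees one request for it, and the next scrape starts from an empty cache.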

Feel free to ask if you have any other questions!

Edit: Thinking about this a little more, I suppose I should write something in the README about securing slurmrestd, as the SchedMD docs are great but maybe a little too unprescriptive.

If you want to stop users from using slurmrestd altogether, it looks like you can prevent them from creating tokens by setting AuthAltParameters=disable_token_creation in your slurm.conf. I assume administrators could still generate tokens, which you would use for authenticating the exporter.

Another option might be a local firewall on the head nodes that only allows access to port 6820 (or whichever port you configured, or the reverse proxy's port if you're putting slurmrestd behind one) from the system running the exporter.

Either way, I'm interested in what path you choose! I'd love to know how it goes 😊