abhinavDhulipala opened this issue 2 years ago.
We definitely have less time than in the past to contribute and to integrate the latest submitted PRs.
Since it may be useful as a reference for future requests, I will highlight the major issues here:
The exporter relies on parsing the plain-text output of Slurm commands (`squeue`, `sinfo`, etc.). Even with regular expressions, this is not the first time we have been forced to amend the code and introduce corrections to deal with it. As more people ask for more features, I am not so sure this approach will work best.
I am planning to do one more round of PR integration in the coming weeks, but I expect that sooner or later some of the forks will be further ahead of us.
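For contrast with regex-based text parsing, here is a minimal Go sketch of the JSON-based alternative that later comments in this thread move toward: decode a node list and count nodes per state. The `nodes`/`name`/`state` layout is a simplified, hypothetical stand-in for what Slurm's `--json` output or slurmrestd return; the real schema differs between Slurm versions, so treat the struct fields as assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// nodeInfo is a simplified, HYPOTHETICAL view of one node entry.
// Real Slurm JSON field names and types vary by Slurm version.
type nodeInfo struct {
	Name  string   `json:"name"`
	State []string `json:"state"` // assumed: base state plus flags, e.g. ["IDLE", "DRAIN"]
}

type nodesResponse struct {
	Nodes []nodeInfo `json:"nodes"`
}

// countNodeStates decodes the payload and counts nodes per combined state
// string (flags joined with "+"). Nodes with no state are counted as UNKNOWN.
func countNodeStates(payload []byte) (map[string]int, error) {
	var resp nodesResponse
	if err := json.Unmarshal(payload, &resp); err != nil {
		return nil, fmt.Errorf("decoding node JSON: %w", err)
	}
	counts := make(map[string]int)
	for _, n := range resp.Nodes {
		state := "UNKNOWN"
		if len(n.State) > 0 {
			state = strings.Join(n.State, "+")
		}
		counts[state]++
	}
	return counts, nil
}

func main() {
	sample := []byte(`{"nodes": [
		{"name": "node01", "state": ["IDLE"]},
		{"name": "node02", "state": ["ALLOCATED"]},
		{"name": "node03", "state": ["IDLE", "DRAIN"]}
	]}`)
	counts, err := countNodeStates(sample)
	if err != nil {
		panic(err)
	}
	// Print states in a stable order.
	states := make([]string, 0, len(counts))
	for s := range counts {
		states = append(states, s)
	}
	sort.Strings(states)
	for _, s := range states {
		fmt.Printf("%s %d\n", s, counts[s])
	}
}
```

`encoding/json` ignores fields it doesn't know about, so additions to the output don't break the decoder, unlike a regex tied to a particular column layout.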
I see. That explains a lot. We recently had to modify the exporter for our purposes and found it more cumbersome than we would've liked, especially since Slurm offers both a C API and a REST API. I have a couple of thoughts on how the code could be restructured to make adding and removing a subset of features far more modular. We came to the same conclusion about tests as well.
Thank you for your contributions. Maintaining an open-source project is never easy, and I appreciate all the hard work. I completely understand that no one can maintain a library forever and that priorities change.
In the meantime, if anyone has a fork that has already incorporated the changes above, I'd love to take a look. If we end up maintaining our own version we will pin the fork here.
I hope to take on the challenge of converting this exporter to use the REST API early/mid next year. If someone else gets to it first, I'm happy to use their implementation; otherwise I'll follow up here when it's in a workable state. I don't plan to add backwards compatibility, as I will be writing it against the newest Slurm version.
Any updates on this? We have just added Slurm to our arsenal and would love the overview that these dashboards would let us build in Grafana.
I wrote an exporter that covers most of the features of this one. It has no GPU or scheduler stats, as our company has no use for them, but we implement pretty much everything else. We plan to open-source it next week.
sounds interesting!
Hi guys, we are actively maintaining a JSON-based and, hopefully, more maintainable, tested, and testable fork here: rivosinc/prometheus-slurm-exporter. It's a complete rewrite, forked only to show history. Feel free to contribute. Our next steps are adding JSON-based licensing support and implementing some interfaces for slurmrestd support, since the same OpenAPI plugin is used for both the CLI and slurmrestd. We will publish a Grafana template soon.
It comes with some extra goodies, such as client-side throttling and job tracing, but doesn't yet implement things like GPU support, fairshare, or daemon stats.
The repository is empty ATM.
Yeah, sorry about that. We have to go through an OSS review process, so I had to take it down briefly. It should be back up shortly. I apologize.
Howdy guys, the exporter has been cleared and is back up. I will release a default template dashboard soon as well. It's my first Go project written from scratch, so feel free to open an issue if you think things could be written better, nitpicks included. I would love any feedback.
Hey folks,
I wanted to let you know that I've released the first version of the new prometheus-slurm-exporter, which uses `slurmrestd` to gather data rather than parsing text output from `sinfo`.
This new project is actively maintained by the Research Advanced Computing Services team at the University of Oregon.
Our project aims to be a drop-in replacement for this project, and it plugs right into the existing SLURM Dashboard. Future development will maintain that backwards compatibility for the foreseeable future. With each new version of this project, I aim to support the three most recent SLURM versions (currently only 23.11 and 24.05 are supported).
As I just cut the first real release today and only have access to a SLURM 23.11 cluster (future work will include end-to-end testing on multiple clusters via Docker), it has only been fully tested on a cluster running 23.11. Support for 24.05 is implemented and all my unit tests pass against example 24.05 data, but please raise issues if you run into problems with 24.05.
Please feel free to open issues if you find any bugs or want to request features.
I see that the last commit to main was in March 2022, and there are a lot of outstanding PRs. Does this mean the repo is no longer maintained? Is there a dependable fork?