prometheus-community / node-exporter-textfile-collector-scripts

Scripts for node-exporter's textfile collector
Apache License 2.0
488 stars 185 forks source link

NVMe metrics collector based on parallel #204

Closed dobbi84 closed 6 months ago

dobbi84 commented 7 months ago

The collector based on parallel allows to collect the metrics quicker when the node has several NVMe drives. Included a Grafana dashboard that visualizes the metrics and for each panel reports a description coming from the NVMe express specifications.

Because there are now four files NVMe related everything has been moved to a common directory.

dswarbrick commented 6 months ago

Thanks for your contribution. However, when nvme_metrics.py was introduced (#155), it was with the intention that it would supersede and eventually replace the shell script variant. Keeping multiple variants of collectors is not a path we want to go down, as it is confusing for end users, and increases the burden on maintainers.

Besides, these collectors are typically only run at fairly infrequent intervals. For example, the Debian package of this repo ships with systemd timers than execute some collectors with a 15 minute interval. With such infrequent execution, is the utmost speed and parallelism that important?

dobbi84 commented 6 months ago

I agree with you, i fell as well on the mistake of using the bash version. I run my timers at 1m interval or less and parallel seemed a painless way of collecting fast from multiple drives.

I noticed your effort in the go-nvme and i think could be a good solution to have an exporter that does not parse the nvme-cli output and possibly aligned with the nvme spec release cycle.

I can close this the PR, if you agree.

dswarbrick commented 6 months ago

I noticed your effort in the go-nvme and i think could be a good solution to have an exporter that does not parse the nvme-cli output and possibly aligned with the nvme spec release cycle.

I split go-nvme out of another pet project of mine, https://github.com/dswarbrick/smart, but I don't really have the time to actively work on either. These projects were really just to satisfy my curiosity when I started them way back in 2017. Nowadays, I think that Rust would be a more appropriate language in which to write low-level libraries which interact with the kernel.

Implementing comprehensive support for the NVMe standard is a mammoth task - just look at https://github.com/linux-nvme/libnvme to get an idea. So, go-nvme exists really more as just a proof of concept.

If an NVMe exporter were to be developed (in any language), it would need to run as root to be able to send the NVMe admin commands to devices. Most experienced sysadmins still have nightmares about running HTTP services as root - probably due to some decades-old RCE exploit in Apache 1.3, before dropping privileges became de facto. This is why IMHO for now, parsing the (preferably JSON) output of nvme-cli, smartctl etc. is the preferable solution, since it allows for privilege separation.

dobbi84 commented 6 months ago

understood thanks for the direction

igaw commented 6 months ago

If an NVMe exporter were to be developed (in any language), it would need to run as root to be able to send the NVMe admin commands to devices.

The need for root to run nvme-cli has changed slightly recently. At least nvme list, nvme list-subsys, etc don't need root anymore. These commands are using the information from sysfs. Though the nvme smart-log is till issuing a request and this depends on root permission. But given this is a real use case we can think about adding the smart log information to sysfs as well. So I am interested to get these use case working without root, though that's just me. I can bring this topic up at the upcoming LSFMM conference. In this case it would be good to know which information needs to be available for 'non root' users.

dobbi84 commented 6 months ago

In this case it would be good to know which information needs to be available for 'non root' users.

i think a good start would be:

jmhands commented 6 months ago

adding the standard NVMe SMART log to sysfs would be great. The temperature commands are already converted to degrees C, just make sure we export the Data Units Read and Data Units Written in bytes instead of the standard 1000 512B units.

dswarbrick commented 6 months ago

This discussion is moving off-topic for this PR. I recommend moving the discussion to the libnvme project, since this PR will not be merged.