nsfcac / MonSter

Monitoring Tool for HPC metrics from batch scheduler and BMC resources
MIT License

Unable to dump slurm metrics since parse.py is returning empty dictionary #3

Closed blesson-james closed 2 years ago

blesson-james commented 2 years ago

I get the error below when running mslurm.py: image

The error occurs because the parse_node_metrics method in parse.py returns an empty dictionary. See the screenshot below: image

The dictionary is empty because of an if condition in parse.py that compares the hostname from the response with the hostname stored in the nodes table. The nodes table gets its hostname from the iDRAC system information, which idrac.py retrieves through the Redfish API, as shown in the screenshots below: image image

However, our iDRAC system hostname and OS hostname are usually different, so the scripts fail every time.
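
To make the failure mode concrete, here is a minimal sketch of how a strict hostname comparison can end up with an empty dictionary. This is not the actual code from parse.py: the function name filter_node_metrics, the record fields, and the hostnames are all hypothetical and only illustrate the mismatch.

```python
# Hypothetical reconstruction of the failure mode; NOT MonSter's parse.py.
def filter_node_metrics(slurm_records, known_hostnames):
    """Keep only records whose hostname is present in the nodes table."""
    metrics = {}
    for record in slurm_records:
        hostname = record["hostname"]
        # Strict string match: any difference in naming drops the record.
        if hostname in known_hostnames:
            metrics[hostname] = {"cpu_load": record["cpu_load"]}
    return metrics


# The scheduler reports the OS hostname (made-up values).
slurm_records = [{"hostname": "compute-01", "cpu_load": 12}]
# The nodes table holds the name reported by iDRAC (made-up value).
nodes_table_hostnames = {"idrac-node-01"}

result = filter_node_metrics(slurm_records, nodes_table_hostnames)
print(result)  # prints {}
```

Because no name matches, every record is skipped and nothing is left to dump.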

Artlands commented 2 years ago

Thanks for reporting this issue. The hostname in the "nodes" table is obtained via iDRAC (/redfish/v1/Systems/System.Embedded.1). iDRAC originally pulls this value from the node OS, so it is supposed to match the OS hostname. This hostname is the key we use to correlate the Slurm info with the iDRAC metrics.
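
As a rough illustration of where that value comes from, the sketch below reads the HostName property out of a Redfish ComputerSystem payload. The helper name extract_idrac_hostname and the sample payload are hypothetical; the real idrac.py may fetch and parse the resource differently.

```python
# Hypothetical helper; the sample payload is made up for illustration.
def extract_idrac_hostname(system_payload):
    """Return the HostName from a Redfish ComputerSystem payload, or None."""
    hostname = system_payload.get("HostName")
    # iDRAC can report an empty string when the OS has not supplied a name.
    return hostname.strip() if hostname else None


# Trimmed-down stand-in for the JSON body of
# /redfish/v1/Systems/System.Embedded.1
sample_payload = {"Id": "System.Embedded.1", "HostName": "idrac-node-01"}

print(extract_idrac_hostname(sample_payload))
print(extract_idrac_hostname({"Id": "System.Embedded.1", "HostName": ""}))
```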

For cases where they differ, I will add options to the config file so that the user can manually define a mapping from iDRAC hostnames to OS hostnames.
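
One possible shape for such a mapping is sketched below. The names HOSTNAME_MAP, to_os_hostname, and translate_nodes are assumptions made for illustration, not options that exist in MonSter's config file today, and the hostnames are made up.

```python
# Hypothetical config section: iDRAC-reported hostname -> OS hostname.
HOSTNAME_MAP = {
    "idrac-node-01": "compute-01",
    "idrac-node-02": "compute-02",
}


def to_os_hostname(idrac_hostname, hostname_map):
    """Translate an iDRAC hostname; fall back to the name itself if unmapped."""
    return hostname_map.get(idrac_hostname, idrac_hostname)


def translate_nodes(idrac_hostnames, hostname_map):
    """Return the set of OS hostnames to compare against Slurm records."""
    return {to_os_hostname(name, hostname_map) for name in idrac_hostnames}


# "compute-03" has no entry, so it is assumed to already match the OS name.
nodes = ["idrac-node-01", "idrac-node-02", "compute-03"]
os_names = translate_nodes(nodes, HOSTNAME_MAP)
print(sorted(os_names))  # prints ['compute-01', 'compute-02', 'compute-03']
```

Falling back to the original name when no entry exists would keep existing deployments working, since nodes whose iDRAC and OS hostnames already match need no configuration.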