guilbaults / infiniband-exporter

Prometheus exporter for an Infiniband Fabric
Apache License 2.0

Must a NODE_NAME_MAP be specified? #14

Closed kmanalo closed 3 years ago

kmanalo commented 3 years ago

While our site plans on using NODE_NAME_MAP, I noticed that, by default, the service doesn't work if one isn't supplied.

As a workaround, we do something like:

/usr/bin/infiniband-exporter --node-name-map /dev/null

Can the code act similarly if NODE_NAME_MAP is not set?

Thanks

guilbaults commented 3 years ago

Hi, could you test the latest commit in this branch: https://github.com/guilbaults/infiniband-exporter/tree/node_name_map

I added a bit more logging so it will be easier to find the issue.

On my system, I can run it manually without setting NODE_NAME_MAP:

# python infiniband-exporter.py --verbose --port 8080
2021-04-08 19:53:36,917 - DEBUG - No node-name-map was provided
2021-04-08 19:53:36,917 - DEBUG - Counters will not reset automatically
2021-04-08 19:53:46,925 - DEBUG - Start of collection cycle
2021-04-08 19:53:50,495 - DEBUG - End of collection cycle
infiniband_speed{local_guid="0x506b4b0300605de0",local_name="SwitchIB Mellanox Technologies",local_port="36",remote_guid="0x0002c903000c4bc9",remote_name="lustre01-mds2 mlx4_0",remote_port="1"} 10.0

It also seems to detect correctly when the env var is set:

# NODE_NAME_MAP=/etc/node-name-map python infiniband-exporter.py --verbose --port 8080
2021-04-08 19:57:14,917 - DEBUG - Using NODE_NAME_MAP provided in env vars: /etc/node-name-map
2021-04-08 19:57:14,917 - DEBUG - Counters will not reset automatically
2021-04-08 19:57:19,896 - DEBUG - Start of collection cycle
2021-04-08 19:57:23,484 - DEBUG - End of collection cycle
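
For readers following along, the behaviour shown in the two runs above can be sketched roughly as below. This is only an illustration, not the exporter's actual code: the helper names `resolve_node_name_map` and `build_command`, the precedence of the command-line flag over the env var, and the wording of the first debug message are all assumptions. `ibqueryerrors` is used purely as a placeholder for whatever fabric query command is being run.

```python
import logging
import os


def resolve_node_name_map(cli_value=None, environ=None):
    """Return the node-name-map path to use, or None when none was given.

    Hypothetical helper. Assumed precedence: command-line flag first,
    then the NODE_NAME_MAP environment variable.
    """
    environ = os.environ if environ is None else environ
    if cli_value:
        # Assumed message; the thread does not show what is logged here.
        logging.debug('Using node-name-map provided in args: %s', cli_value)
        return cli_value
    env_value = environ.get('NODE_NAME_MAP')
    if env_value:
        logging.debug('Using NODE_NAME_MAP provided in env vars: %s', env_value)
        return env_value
    logging.debug('No node-name-map was provided')
    return None


def build_command(base_cmd, node_name_map):
    """Append --node-name-map only when a path is actually available."""
    cmd = list(base_cmd)
    if node_name_map:
        cmd += ['--node-name-map', node_name_map]
    return cmd


if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG)
    path = resolve_node_name_map()
    print(build_command(['ibqueryerrors'], path))
```

The point of the sketch is that a missing map simply leaves the option off the command line, so a workaround such as pointing it at /dev/null should not be needed.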

And it uses the content of the node name map file to set the correct names when one is specified:

infiniband_speed{local_guid="0x506b4b0300605de0",local_name="1L07-IB-1:SB7890",local_port="36",remote_guid="0x0002c903000c4bc9",remote_name="lustre01-mds2 mlx4_0",remote_port="1"} 10.0
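
To illustrate the renaming seen in the metric above, here is a small sketch of how a node name map could be read and applied. It assumes the usual node-name-map layout of one GUID followed by a quoted name per line, with `#` starting a comment. The functions `parse_node_name_map` and `display_name` are hypothetical and are not taken from the exporter, which may well leave this lookup to the underlying InfiniBand tools.

```python
import re

# One entry per line: a hex GUID followed by a quoted name.
# Comment lines (starting with '#') and blank lines do not match.
_ENTRY = re.compile(r'^\s*(0x[0-9a-fA-F]+)\s+"([^"]*)"')


def parse_node_name_map(text):
    """Build a {guid_as_int: name} dict from node-name-map text."""
    names = {}
    for line in text.splitlines():
        match = _ENTRY.match(line)
        if match:
            # Store the GUID as an integer so case and leading zeros
            # do not affect later lookups.
            names[int(match.group(1), 16)] = match.group(2)
    return names


def display_name(guid, default, names):
    """Return the mapped name for a GUID, or the default when unmapped."""
    return names.get(int(guid, 16), default)


if __name__ == '__main__':
    sample = '# switches\n0x506b4b0300605de0 "1L07-IB-1:SB7890"\n'
    names = parse_node_name_map(sample)
    print(display_name('0x506b4b0300605de0',
                       'SwitchIB Mellanox Technologies', names))
    print(display_name('0x0002c903000c4bc9',
                       'lustre01-mds2 mlx4_0', names))
```

With the sample entry, the switch GUID resolves to the mapped name while the unmapped host keeps its default description, which matches the two names in the metric line above.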

And here is the content of the sysconfig file used for the production daemon:

cat /etc/sysconfig/infiniband-exporter.conf
NODE_NAME_MAP=/etc/node-name-map
CAN_RESET_COUNTER=TRUE

kmanalo commented 3 years ago

@guilbaults thanks, it appears to work fine here. I appreciate the fix and the easy logging.

Also thanks for pointing out how we can adjust the options with the sysconfig config.

Will you be marking an updated release as well? We're relying on the RPM creation.

guilbaults commented 3 years ago

I will make a new release soon; I need to modify pull #16 before doing the next release.

guilbaults commented 3 years ago

The new RPM is ready: https://github.com/guilbaults/infiniband-exporter/releases/tag/v0.0.3