stackhpc / ansible-slurm-appliance

A Slurm-based HPC workload management environment, driven by Ansible.

alaska: IB (RoCE) stats not shown by default in monitoring #122

Open sjpb opened 3 years ago

sjpb commented 3 years ago

Had to modify the IB dashboards, dropping the device matcher from the queries:

# default - doesn't work
irate(node_infiniband_port_packets_transmitted_total{job=~"$job",instance=~"$instance",device=~"$device"}[60s])

# works:
irate(node_infiniband_port_packets_transmitted_total{job=~"$job",instance=~"$instance"}[60s])

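For anyone needing to apply the same change across several panels, here is a rough sketch of the edit above as a text transformation. The helper name `drop_label_matcher` is hypothetical and not part of the appliance or of Grafana. It works on the query string with a regex rather than a real PromQL parser, so it assumes the label name does not also appear inside a quoted value.

```python
import re


def drop_label_matcher(expr: str, label: str) -> str:
    """Remove every matcher on `label` from a PromQL expression string.

    Hypothetical helper: a text-level sketch, not a PromQL parser.
    """
    # Optional leading comma, the label name, one of the four PromQL
    # match operators, then a double-quoted value (escapes allowed).
    pattern = re.compile(
        r',?\s*\b' + re.escape(label) + r'\s*(?:=~|!~|!=|=)\s*"(?:[^"\\]|\\.)*"'
    )
    result = pattern.sub("", expr)
    # If the removed matcher came first inside the braces, a comma is
    # left behind right after "{"; tidy that up.
    return re.sub(r'\{\s*,\s*', '{', result)


default_query = (
    'irate(node_infiniband_port_packets_transmitted_total'
    '{job=~"$job",instance=~"$instance",device=~"$device"}[60s])'
)
print(drop_label_matcher(default_query, "device"))
```

Run against the default query, this should print the working form shown above. Whether dropping the matcher is the right long-term fix, as opposed to fixing the `$device` dashboard variable, is not something this sketch addresses.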
sjpb commented 2 years ago

Fixed by #188, not merged to alaska yet