usegalaxy-eu / infrastructure-playbook

Ansible playbook for managing UseGalaxy.eu infrastructure.
MIT License

Parallelize CVMFS monitoring to bring back the CVMFS Grafana dashboard #1280

Closed by kysrpex 1 month ago

kysrpex commented 1 month ago

The script /usr/bin/check_cvmfs_repos, installed by the CVMFS monitoring role hxr.monitor-cvmfs, takes longer than 2 minutes to run (the Telegraf timeout for this script) because some CVMFS servers misbehave and the repositories are checked one after another. As a result, no measurements are registered.

Jul 25 13:52:00 cvmfs1-ufr0.internal.galaxyproject.eu telegraf[2616631]: 2024-07-25T11:52:00Z E! [inputs.exec] Error in plugin: exec: command timed out for command "/usr/bin/check_cvmfs_repos": /usr/bin/check_cvmfs_repos: line 9: [: : integer expression expected...
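A rough model of why serial execution breaks the 2-minute (120 s) budget, assuming n monitored repositories, a run time t_i for the check of repository i, and a per-call curl timeout τ. These symbols are illustrative and are not taken from the script:

```math
T_{\text{serial}} = \sum_{i=1}^{n} t_i \le n\,\tau,
\qquad
T_{\text{parallel}} \approx \max_{1 \le i \le n} t_i \le \tau
```

With a hypothetical τ = 10 s, twelve unresponsive servers checked serially would already use up the entire 120 s budget, while a parallel run would still end after roughly 10 s regardless of n. Without any curl timeout, the script itself places no bound on t_i for a hung server.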

Add a timeout to the curl calls in the check_cvmfs_repos script from the CVMFS monitoring role hxr.monitor-cvmfs, and parallelize all check_repo calls, so that the script is guaranteed to finish before Telegraf's timeout expires.
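The real check_cvmfs_repos is a shell script; the sketch below models the same idea in Python so it can be tested in isolation. It assumes each repository check is an external command (for example a curl call using curl's --max-time option), starts all checks at once in a thread pool, and caps each one with the timeout argument of subprocess.run, so wall-clock time is about the slowest single check rather than the sum. The helper names curl_cmd, run_one and run_parallel are hypothetical and not code from hxr.monitor-cvmfs.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def curl_cmd(url, max_time):
    """Build a curl argv whose total transfer time is capped by --max-time."""
    return ["curl", "--silent", "--fail", "--max-time", str(max_time), url]


def run_one(cmd, timeout):
    """Run one check; return (exit code, stdout), or (None, "") on timeout."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        return done.returncode, done.stdout
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and waited for the child here.
        return None, ""


def run_parallel(commands, timeout):
    """Run all commands concurrently, each capped at `timeout` seconds.

    With one worker per command, wall-clock time is roughly the slowest
    single check (at most `timeout`), not the sum of all checks.
    Results are returned in the same order as `commands`.
    """
    if not commands:
        return []  # ThreadPoolExecutor rejects max_workers=0
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        return list(pool.map(lambda cmd: run_one(cmd, timeout), commands))
```

Usage might look like run_parallel([curl_cmd(url, 10) for url in urls], timeout=15), where urls is a hypothetical list such as http://<stratum1>/cvmfs/<repo>/.cvmfspublished for each monitored repository. Giving curl its own --max-time plus a slightly larger outer cap is belt-and-braces: curl normally gives up first, and the outer cap catches anything else that hangs. In the shell script itself, the usual equivalent is to start each check_repo call in the background with & and collect them with wait.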

kysrpex commented 1 month ago

[image]

The result is a working dashboard again.