prometheus-community / node-exporter-textfile-collector-scripts

Scripts for node-exporter's textfile collector
Apache License 2.0

apt_info.py can hang for hours #179

Closed by anarcat 1 year ago

anarcat commented 1 year ago

Numerous machines here are seeing the apt_info.py script hang for hours. This has been reported in Debian as bug https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1028212 and is being tracked in our internal tracker as https://gitlab.torproject.org/tpo/tpa/team/-/issues/41355

This is possibly an upstream issue in python-apt and requires further investigation. The current workaround is to set TimeoutStartSec=30s in the [Service] block of the systemd unit.

anarcat commented 1 year ago

In https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1028212#62, @julian-klode makes the excellent point that apt_info.py should not trigger apt update on its own. Out-of-band mechanisms (e.g. unattended-upgrades or apt itself; I have asked jak for clarification on that) are supposed to take care of this. apt_info.py's job is simply to collect the numbers and dump them in a format Prometheus can parse. It should do as little as possible to get there.

For out of date mirrors or data, it should report the cache age (#180), not try to pre-emptively update it.
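As a rough illustration of reporting the cache age instead of refreshing the cache, here is a minimal sketch. It is not the code from #180: the stamp file path is an assumption (it exists only on some Debian/Ubuntu setups), and the metric name `apt_cache_age_seconds` and both helper functions are hypothetical.

```python
import os
import time

# Assumed stamp file; it may not exist on every system, so adjust as needed.
DEFAULT_STAMP = "/var/lib/apt/periodic/update-success-stamp"


def cache_age_seconds(path, now=None):
    """Return seconds since `path` was last modified, or None if it is missing."""
    try:
        mtime = os.stat(path).st_mtime
    except OSError:
        return None
    if now is None:
        now = time.time()
    # Clamp to zero in case of clock skew.
    return max(0.0, now - mtime)


def render_metric(age):
    """Format the age in the Prometheus text exposition format.

    The metric name is hypothetical. When the age is unknown, only the
    HELP/TYPE lines are emitted and no sample is written.
    """
    lines = [
        "# HELP apt_cache_age_seconds Seconds since the APT cache was last refreshed.",
        "# TYPE apt_cache_age_seconds gauge",
    ]
    if age is not None:
        lines.append(f"apt_cache_age_seconds {age:.0f}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Read-only: this never runs an update, it only inspects a timestamp.
    print(render_metric(cache_age_seconds(DEFAULT_STAMP)), end="")
```

The point of the design is that the script only reads a timestamp, so it cannot block on network access; a stale mirror then shows up as a growing gauge that an alert rule can watch.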

So I'll make a PR to fix that shortly.