clalancette closed this issue 4 years ago
We talked about this a bit offline and I don't have any leads. If it's observed again, we should try to do some forensics on the node before it gets culled. All of the instances @clalancette reported were on Debian Stretch, which would have been a correlation, but @mikaelarguedas's example (now added to the issue body) was for a doc job running on Bionic.
I saw another one of these today and I think they're the result of a "graceful" death when a node is scaled in. In the past we've seen big, nasty connection-failure stack traces, but those were usually due to unplanned node losses, as opposed to a node being intentionally shut down by an over-eager scale-in metric.
It's been a long time since we have seen this, and the links are for the previous generation of the Jenkins server, so I'm going to close this. We've also made our scale-in policy more conservative.
A handful of jobs on build.ros.org have started failing in the last day. Investigation into the console logs shows no apparent cause; examples are:
All seem to have failed while installing packages through apt, but there are no (apparent) failures listed in the apt logs. @nuclearsandwich FYI