timja opened this issue 2 years ago (status: Open)
Possibly related to JENKINS-68319. There is something nasty going on with queue persistence: it loses entries in a very small percentage of test runs. I cannot reproduce it on my system, but it occasionally crops up on automated builds. Whatever is eating the queue entries there is probably eating your queue entries, too. Unfortunately, I am not actively working on this, but if you want to try your hand at a fix, you could run the test case from JENKINS-68319 in a loop and, if you can get it to reproduce easily, try to figure out the cause.
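For anyone who wants to try the loop approach, here is a minimal sketch of a generic "rerun until it fails" helper. It is not tied to Jenkins: `run_until_failure` is a hypothetical name, and the command passed to it is an assumption, since the exact test invocation for JENKINS-68319 is not given in this thread.

```python
import subprocess


def run_until_failure(cmd, max_runs=100):
    """Run `cmd` repeatedly.

    Returns the 1-based number of the first run that exits non-zero,
    or None if all `max_runs` runs succeed.
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # Show the tail of the output so the failing run can be inspected.
            print(f"run {attempt} failed with exit code {result.returncode}")
            print(result.stdout[-2000:])
            print(result.stderr[-2000:])
            return attempt
    return None


# Hypothetical usage; substitute the real test command for your checkout:
# run_until_failure(["mvn", "test", "-Dtest=SomeQueueTest"], max_runs=500)
```

Because the failure is reported as rare, a large `max_runs` is probably needed before concluding that the test does not reproduce on a given machine.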
We're using Jenkins as something like a central scheduler. On a busy night, it might queue and execute around 800 jobs per hour. During maintenance, we put Jenkins into quiet-down and let the jobs queue up while we complete the maintenance.
Unfortunately, it seems like we're starting to miss dozens of jobs during those quiet-down periods. Some jobs get queued up and placed in queue.xml, but an unknown number of other jobs never make it into the queue and are missed entirely. The jenkins.log is littered with Young GC entries and messages like this:
This started happening after only 40 minutes or so in quiet-down, with about 600 jobs in the queue.
I wish I had more logging for you on this, but it's the lack of any message in this situation that is the tricky part – we simply miss cron timers and never receive job results.
Any idea on what might be going on here?
Originally reported by esnewmanium, imported from: Cron Triggers Being Missed During Quiet-down Mode