web-platform-tests / results-collection


Safari builds interrupted due to unresponsive daemon process #650

Open jugglinmike opened 5 years ago

Buildbot workers are implemented as Python daemon processes run by twistd, the Twisted application runner. On 2019-02-12, the process running on this project's Mac Mini system became unresponsive. All testing halted, and the worker remained in this state for the next 48 hours. Bad timing for me to take a day off!

I connected to the machine manually in hopes of finding an explanation. Unfortunately, I couldn't see anything amiss. CPU, memory, and disk consumption were all within normal operating bounds. The process was not listed as defunct or in any other anomalous state. In fact, it had been running for close to two months:

$ ps aux | grep 'twist[d]'
kazooie           382   0.0  0.2  4317752  12988 s001  S+   21Dec18 204:01.16 /usr/bin/python /usr/local/bin/twistd --nodaemon --python=buildbot.tac --logfile=buildbot.log --prefix=worker
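
For anyone repeating this check, here is a rough sketch of how a line like the one above could be read programmatically. It is only an illustration: `parse_ps_aux_line` and `looks_anomalous` are hypothetical helpers (not part of Buildbot, Twisted, or this project), and the column layout is assumed to match the `ps aux` output shown here, where the start-time field contains no spaces.

```python
# Hypothetical helpers for reading one line of `ps aux` output.
# Assumed column order: USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND

# First letter of the STAT column: Z = zombie (defunct), T = stopped.
ANOMALOUS_STATES = {"Z", "T"}


def parse_ps_aux_line(line):
    """Split a `ps aux` line into the fields relevant to this report."""
    # The command may contain spaces, so split at most 10 times and keep
    # everything after the TIME column as a single string.
    fields = line.split(None, 10)
    return {
        "user": fields[0],
        "pid": int(fields[1]),
        "stat": fields[7],
        "started": fields[8],
        "cpu_time": fields[9],
        "command": fields[10],
    }


def looks_anomalous(stat):
    """Return True if the process state is defunct or stopped."""
    return stat[0] in ANOMALOUS_STATES


if __name__ == "__main__":
    sample = (
        "kazooie           382   0.0  0.2  4317752  12988 s001  S+   21Dec18 "
        "204:01.16 /usr/bin/python /usr/local/bin/twistd --nodaemon "
        "--python=buildbot.tac --logfile=buildbot.log --prefix=worker"
    )
    info = parse_ps_aux_line(sample)
    print(info["pid"], info["stat"], looks_anomalous(info["stat"]))
```

Applied to the line above, this reports an ordinary sleeping state (`S+`), which matches what I saw by hand: the process table gave no hint that the worker had stopped responding.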

I forcibly killed the process, and the system automatically restarted it and resumed testing. Due to our limited resources, we will not be able to "catch up," so wpt.fyi will not list results for the revisions selected for the past two days.