dumbbell closed this issue 8 years ago.
All Java tests should have a timeout, which may be a good reason to upgrade to JUnit 4 soon.
I haven't seen this in a while. Is this still relevant? Now that we are on JUnit 4, we should have more options for enforcing timeouts.
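As a rough illustration of what enforcing a per-test timeout involves, here is a minimal sketch using only the JDK, built around a hypothetical helper class `TimeoutRunner` (it is not part of the Java client or its test suite). It runs a test body on a separate daemon thread, waits at most the given number of milliseconds, and interrupts the body if it overruns. This is similar in spirit to JUnit 4's per-test timeout but is not its implementation.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical helper: runs a test body with an upper bound on its duration.
 */
final class TimeoutRunner {
    private TimeoutRunner() {}

    /**
     * Returns true if the body finished within timeoutMillis, false if it timed out.
     * An exception thrown by the body is rethrown wrapped in an ExecutionException.
     */
    static boolean runWithTimeout(Runnable body, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "test-body");
            // Daemon thread: a body that ignores interruption cannot keep the JVM alive.
            t.setDaemon(true);
            return t;
        });
        try {
            Future<?> future = executor.submit(body);
            try {
                future.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                // Interrupt the stuck body; it stops only if it honours interruption.
                future.cancel(true);
                return false;
            }
        } finally {
            executor.shutdownNow();
        }
    }
}
```

In JUnit 4 itself, the equivalents are `@Test(timeout = 5000)` on a single test method (the value is in milliseconds) or a `@Rule public Timeout globalTimeout = Timeout.seconds(30);` field that applies one limit to every test in a class (`Timeout.seconds` requires JUnit 4.12 or later). Neither approach kills external processes started by a test, such as broker nodes, so those still need separate cleanup.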
Our test suite can still leave running nodes behind after a failure. However, Jenkins no longer uses the looping wrapper script, except for the broker's stable branch. So the root cause, leaving running nodes behind, is still relevant, even if the segfaults are rare now.
Chances are this was https://github.com/rabbitmq/rabbitmq-server/issues/465, so I'm closing this until we discover something specific to the Java test suites.
Note: I am filing this issue here because the Java client is involved in the stuck test. I don't yet know what's going on, and I don't have time to investigate right now.
The culprit is a timed-out or aborted Jenkins build: Jenkins is unable to kill all the processes involved (and doesn't notice the problem). It then starts new builds, which try to "lock" the node, fail to do so, and retry forever, eventually exhausting their stack and segfaulting.
An example is this aborted build: http://rabbit-ci.lon.pivotallabs.com:8080/job/RabbitMQ%20Server/3734/
It was followed by this build, which segfaults: http://rabbit-ci.lon.pivotallabs.com:8080/job/xref%20%28plugins%20individually%29/4019/
Here are the stuck processes still running on the Jenkins slave: