Open timja opened 10 years ago
By 'Take slave offline' you mean 'Mark this node temporarily offline', right? What kind of slave is this, JNLP, SSH, ...?
Also, what retention strategy did you select? 'Keep online as much as possible'?
We have mostly Swarm slaves, and I usually use the jenkins-cli offline-node command, although I might have used the "Mark this node temporarily offline" button once in a while. I am not sure if it also happens on SSH slaves.
Both are the same feature, marking a node offline rather than disconnecting. This is not supposed to cut any connections. Is this reproducible on a pristine Jenkins instance, no plugins etc.?
I didn't try that, but I just looked at the source code, and found this:
public Result run(@Nonnull BuildListener listener) throws Exception {
    ....
    Computer c = node.toComputer();
    if (c==null || c.isOffline()) {
        // As can be seen in HUDSON-5073, when a build fails because of the slave connectivity problem,
        // error message doesn't point users to the slave. So let's do it here.
        listener.hyperlink("/computer/"+builtOn+"/log",
            "Looks like the node went offline during the build. Check the slave log for the details.");
    ...
And the isOffline() method also checks for the temporarilyOffline status:
https://github.com/jenkinsci/jenkins/blob/master/core/src/main/java/hudson/model/Computer.java#L507
public boolean isOffline() { return temporarilyOffline || getChannel()==null; }
So, you are correct: the connection is not broken, but the check in this case is wrong. It should only check the channel, not the temporarilyOffline status.
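To make the distinction concrete, here is a minimal, self-contained sketch (these are stand-in classes, not Jenkins' actual ones) of the current check versus a channel-only check. The class and method names other than isOffline() are hypothetical:

```java
public class OfflineCheck {
    // Stand-in for hudson.model.Computer, reduced to the two fields the check uses.
    static class Computer {
        boolean temporarilyOffline;
        Object channel; // non-null means the connection to the slave is alive

        // Mirrors the isOffline() shown above: true when merely marked offline.
        boolean isOffline() {
            return temporarilyOffline || channel == null;
        }

        // What the post-build check arguably should use instead:
        // report "node went offline" only when the connection is really gone.
        boolean isConnectionLost() {
            return channel == null;
        }
    }

    public static void main(String[] args) {
        Computer c = new Computer();
        c.channel = new Object();    // connection still alive
        c.temporarilyOffline = true; // admin marked the node offline

        System.out.println(c.isOffline());        // triggers the bogus message
        System.out.println(c.isConnectionLost()); // connection is actually fine
    }
}
```

With a live channel and temporarilyOffline set, isOffline() returns true while isConnectionLost() returns false, which is exactly the scenario where running builds get the misleading "node went offline" message.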
I am also experiencing this issue on some of our Windows Server 2008 nodes. I manually mark the node as offline, and some of them don't like it (even though the configuration of the nodes should be identical). Will update this issue if I find more detailed reproduction steps.
My stacktrace is slightly different, instead of null I have a valid computername, like so:
Looks like the node went offline during the build. Check the slave log for the details.
FATAL: /var/lib/jenkins/logs/slaves/EC2-W8S-01/slave.log (No such file or directory)
EC2-W8S-01 being a node name.
wdjonsson: Could you please specify what issue you're experiencing? Bogus messages in the log, or build failures because the node went offline?
Hi,
We are experiencing this issue on Jenkins 1.584.
The scenario is the following:
Do you know if a workaround or solution has been found for this problem?
Thanks,
Damien.
It seems the code mentioned above no longer exists. Might it have been fixed as part of another ticket?
After analyzing the stack trace and the 1.584 code, the failure happens after the run is complete, when the AbstractBuildRunner attempts to write the annotated log.
[Originally related to: JENKINS-24123]
I keep getting failures on random jobs (see example below), when I take a slave offline.
I thought the purpose of "take slave offline" (versus disconnecting a slave) is that running jobs can continue to run, but no new jobs are started, and I can then disconnect the slave when all jobs are finished (we have a small script which does exactly that, to take a slave out of the cluster).
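A drain script of the kind described could look roughly like this. This is a hypothetical sketch: the node name, Jenkins URL, and polling interval are placeholders, and it assumes anonymous read access to the computer API (real setups would add credentials):

```shell
#!/bin/sh
# Drain a slave: mark it offline, wait until it is idle, then disconnect it.
NODE="my-slave"                          # placeholder node name
JENKINS_URL="http://jenkins.example.com" # placeholder Jenkins URL

# Stop new builds from being scheduled on the node.
java -jar jenkins-cli.jar -s "$JENKINS_URL" offline-node "$NODE" -m "draining for shutdown"

# Poll the computer API until no executors are busy ("idle" is part of the computer API JSON).
while ! curl -sf "$JENKINS_URL/computer/$NODE/api/json" | grep -q '"idle":true'; do
    sleep 30
done

# All builds finished; now it is safe to cut the connection.
java -jar jenkins-cli.jar -s "$JENKINS_URL" disconnect-node "$NODE" -m "maintenance"
```

The point of the two-step sequence is exactly the expectation stated here: offline-node should only stop scheduling, and disconnect-node is deferred until the node reports idle.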
With the current behaviour, it is impossible to cleanly shut down a slave.
Expected: Taking a slave offline should NEVER have any impact on any of the jobs running on that slave. They should not even be aware of the fact.
Originally reported by marc_guenther, imported from: Jobs fail due to "node went offline during the build"