Closed tfoote closed 11 years ago
I created a test build which prints the hash sum mismatch string http://jenkins.willowgarage.com:8080/job/test_string%20matching/
The issue is that build step 1 is execute_shell and build step 2 is the groovy script to detect the errors. However if build step 1 fails, build step 2 is never getting run. And then the post build step does not have the extra information necessary.
Building on master in workspace /var/lib/jenkins/jobs/test_string matching/workspace
[workspace] $ /bin/sh -xe /tmp/hudson6940644219730500921.sh
+ echo W: Failed to fetch something Hash Sum mismatch
W: Failed to fetch something Hash Sum mismatch
+ exit -1
/tmp/hudson6940644219730500921.sh: 4: exit: Illegal number: -1
Build step 'Execute shell' marked build as failure
Build step 'Groovy Postbuild' marked build as failure
Finished: FAILURE
For the job link you posted above the post build script clearly detected the hashsum mismatch and marked the job no. 17 with a badge. I suppose you need to provide more information about the "many seen failures" to clarify what had really happened in these cases.
It's supposed to be something like this:
Check for 'Hash Sum mismatch'
Aborting build due to 'hash sum mismatch'
Immediately rescheduling new build...
That is only the fact for the build action. A post build script can never write any information to the log. But obviously it was successful for the referenced job since it marked the build with the badge.
Other instances from this weekend which were emailed to me.
The post build is working fine. There's also a 2nd build step which used to catch these errors and abort the job and retrigger them once. Since this is an intermittent race condition on the database being updated while apt-get updating.
All these jobs have the badge saying they weren't retriggered because they already were rescheduled jobs which is incorrect. Why do we have both a build step and a post build step?
The build step can do things which the post build can not:
The issue here is that the post build script considers the build as being already a rebuild and therefore does not trigger "another" rebuild. I don't see why that is the case since the previous build has no badges but obviously that is whats happening.
The latest Git plugin attaches badges to builds. The detection logic has been updated to check for more specific badges.
I've seen many of these failures. They are supposed to be caught, aborted, and retirggered.
http://jenkins.willowgarage.com:8080/job/ros-hydro-stereo-image-proc_binarydeb_raring_i386/17/