hash sum mismatches are not being caught anymore

tfoote commented 11 years ago

I've seen many of these failures. They are supposed to be caught, aborted, and retriggered.

http://jenkins.willowgarage.com:8080/job/ros-hydro-stereo-image-proc_binarydeb_raring_i386/17/

Hit http://us.archive.ubuntu.com raring-updates/multiverse Translation-en
Hit http://us.archive.ubuntu.com raring-updates/restricted Translation-en
Hit http://us.archive.ubuntu.com raring-updates/universe Translation-en
Fetched 679 kB in 2s (271 kB/s)
W: Failed to fetch gzip:/var/lib/apt/lists/partial/50.28.27.175_repos_building_dists_raring_main_source_Sources  Hash Sum mismatch

E: Some index files failed to download. They have been ignored, or old ones used instead.
I: unmounting /var/cache/pbuilder/ccache filesystem
I: unmounting dev/pts filesystem
I: unmounting proc filesystem
I: cleaning the build env 
I: removing directory /var/cache/pbuilder/build//26265 and its subdirectories
Process leaked file descriptors. See http://wiki.jenkins-ci.org/display/JENKINS/Spawning+processes+from+build for more information
Build step 'Execute shell' marked build as failure
Build step 'Groovy Postbuild' marked build as failure
Build step 'Groovy Postbuild' marked build as failure
Description set: 
Sending e-mails to: ros-buildfarm-release@googlegroups.com vincent.rabaud@gmail.com
Finished: FAILURE

tfoote commented 11 years ago

I created a test build which prints the hash sum mismatch string http://jenkins.willowgarage.com:8080/job/test_string%20matching/

The issue is that build step 1 is execute_shell and build step 2 is the groovy script that detects the errors. However, if build step 1 fails, build step 2 never runs, and the post build step then does not have the extra information it needs.

Building on master in workspace /var/lib/jenkins/jobs/test_string matching/workspace
[workspace] $ /bin/sh -xe /tmp/hudson6940644219730500921.sh
+ echo W: Failed to fetch something Hash Sum mismatch
W: Failed to fetch something Hash Sum mismatch
+ exit -1
/tmp/hudson6940644219730500921.sh: 4: exit: Illegal number: -1
Build step 'Execute shell' marked build as failure
Build step 'Groovy Postbuild' marked build as failure
Finished: FAILURE
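
The behavior described above can be modeled in a few lines. This is only a minimal Python sketch, not Jenkins code: `run_job` and the step names are hypothetical, and it assumes nothing beyond what the test job shows, namely that a failed build step skips the remaining build steps while the post build step still runs.

```python
from typing import Callable, List, Tuple

# (name, function returning True on success); hypothetical model of a job step
Step = Tuple[str, Callable[[], bool]]


def run_job(build_steps: List[Step], post_build: Step) -> List[str]:
    """Run build steps in order, stop at the first failure, then run post build."""
    executed = []
    for name, step in build_steps:
        executed.append(name)
        if not step():
            break  # remaining build steps are skipped
    # The post build step runs regardless of the build result.
    executed.append(post_build[0])
    post_build[1]()
    return executed


executed = run_job(
    build_steps=[
        ("execute_shell", lambda: False),  # step 1 fails, as in the test job
        ("groovy_detect", lambda: True),   # step 2 is never reached
    ],
    post_build=("groovy_postbuild", lambda: True),
)
print(executed)
```

In this model the detection step is missing from the executed list, which matches the log above: only 'Execute shell' and 'Groovy Postbuild' report a result.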

dirk-thomas commented 11 years ago

For the job you linked above, the post build script clearly detected the hash sum mismatch and marked build no. 17 with a badge. I suppose you need to provide more information about the "many" failures you have seen to clarify what really happened in those cases.

tfoote commented 11 years ago

It's supposed to be something like this:

Check for 'Hash Sum mismatch'

Aborting build due to 'hash sum mismatch'
Immediately rescheduling new build...
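
A check that would produce console output like this could look roughly as follows. This is a hedged Python sketch rather than the actual groovy build step: `check_log` is a hypothetical name, the messages are copied from the expected output above, and the case-insensitive match is an assumption.

```python
MARKER = "Hash Sum mismatch"


def check_log(log_text: str) -> list:
    """Return the console messages the check would print for this log."""
    messages = [f"Check for '{MARKER}'"]
    # Assumption: match case-insensitively anywhere in the log.
    if MARKER.lower() in log_text.lower():
        messages.append("Aborting build due to 'hash sum mismatch'")
        messages.append("Immediately rescheduling new build...")
    return messages


sample_log = (
    "Fetched 679 kB in 2s (271 kB/s)\n"
    "W: Failed to fetch something  Hash Sum mismatch\n"
    "E: Some index files failed to download.\n"
)
for line in check_log(sample_log):
    print(line)
```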

dirk-thomas commented 11 years ago

That is only the case for the build step. A post build script can never write any information to the log. But it was evidently successful for the referenced job, since it marked the build with the badge.

tfoote commented 11 years ago

Other instances from this weekend which were emailed to me.

The post build is working fine. There's also a 2nd build step which used to catch these errors, abort the job, and retrigger it once, since this is an intermittent race condition on the database being updated while apt-get is updating.

tfoote commented 11 years ago

All these jobs have the badge saying they weren't retriggered because they were already rescheduled jobs, which is incorrect. Why do we have both a build step and a post build step?

dirk-thomas commented 11 years ago

The build step can do things which the post build can not:

provide feedback in the console
mark the build as "aborted"

The post build, on the other hand, can catch cases where one of the build steps fails the build (skipping any further build actions).

The issue here is that the post build script considers the build to already be a rebuild and therefore does not trigger "another" rebuild. I don't see why that is the case, since the previous build has no badges, but obviously that is what's happening.

dirk-thomas commented 11 years ago

The latest Git plugin attaches badges to builds. The detection logic has been updated to check for more specific badges.
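
To illustrate why a more specific check helps, here is a small Python sketch. It is an assumption about the shape of the problem, not the actual detection script: the badge texts, the `Rescheduled` prefix, and both function names are hypothetical. The point is that a check which treats any badge as "already rescheduled" is tripped by a badge added by an unrelated plugin.

```python
RESCHEDULE_BADGE_PREFIX = "Rescheduled"  # hypothetical badge text


def was_rescheduled_loose(badges):
    """Assumed loose check: any badge at all counts as 'already rescheduled'."""
    return len(badges) > 0


def was_rescheduled_strict(badges):
    """More specific check: only a badge with the reschedule text counts."""
    return any(b.startswith(RESCHEDULE_BADGE_PREFIX) for b in badges)


# Hypothetical badge attached to the previous build by another plugin
previous_build_badges = ["git: built revision abc123"]

print(was_rescheduled_loose(previous_build_badges))
print(was_rescheduled_strict(previous_build_badges))
```

With the strict variant, the previous build is only treated as a rebuild when it carries the reschedule badge itself, so a first-time hash sum mismatch can still be retriggered once.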

ros-infrastructure / buildfarm

hash sum mismatches are not being caught anymore #133