openbmc / openbmc-build-scripts

Apache License 2.0

copy at end of build occasionally hangs #17

Closed: geissonator closed this issue 5 years ago

geissonator commented 6 years ago

We see this 1-2 times a day in our CI jobs. At the end of build-setup.sh, where we copy the files from the Docker container to the mounted filesystem, the cp just hangs. You have to kill the job manually, or it holds up all the other jobs.

Here's an example:

09:12:41 + cp -r /tmp/openbmc/abi_version /tmp/openbmc/buildstats /tmp/openbmc/cache /tmp/openbmc/deploy /tmp/openbmc/hosttools /tmp/openbmc/log /tmp/openbmc/pkgdata /tmp/openbmc/saved_tmpdir /tmp/openbmc/sstate-control /tmp/openbmc/stamps /tmp/openbmc/sysroots /tmp/openbmc/sysroots-components /tmp/openbmc/work /tmp/openbmc/work-shared /var/lib/jenkins-slave/workspace/openbmc-build-gerrit-trigger-meta/distro/ubuntu/label/builder/target/zaius/openbmc/build/tmp
11:23:39 Set build name.
11:23:39 New build name is '#70-Joseph Reynolds'
11:23:39 Build was aborted

We could definitely optimize this to copy only the artifacts we actually need. Maybe provide an option to copy everything, but default to just the image artifacts.
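A minimal sketch of that proposal, assuming a hypothetical COPY_ALL_ARTIFACTS switch and a hypothetical copy_artifacts helper (neither exists in build-setup.sh, and this is not the change that was eventually merged). By default it copies only deploy/images out of the bitbake TMPDIR (/tmp/openbmc in the logs), and it copies the whole tree only when explicitly asked to:

```shell
#!/bin/bash
# Sketch only: COPY_ALL_ARTIFACTS and copy_artifacts are hypothetical names,
# not part of build-setup.sh.
set -euo pipefail

# copy_artifacts SRC DEST
#   SRC  - bitbake TMPDIR inside the container (e.g. /tmp/openbmc)
#   DEST - build/tmp directory on the mounted workspace
copy_artifacts() {
    local src="$1" dest="$2"
    mkdir -p "${dest}"
    if [[ "${COPY_ALL_ARTIFACTS:-false}" == "true" ]]; then
        # Old behaviour: copy the entire TMPDIR (work/, sysroots/, stamps/, ...)
        cp -r "${src}"/. "${dest}"
    else
        # Default: copy only the image artifacts the Jenkins job archives
        mkdir -p "${dest}/deploy"
        cp -r "${src}/deploy/images" "${dest}/deploy/"
    fi
}

# Usage example with throwaway directories standing in for the real paths
src=$(mktemp -d)
dest=$(mktemp -d)
trap 'rm -rf "${src}" "${dest}"' EXIT
mkdir -p "${src}/deploy/images/zaius" "${src}/work/foo"
touch "${src}/deploy/images/zaius/image.pnor" "${src}/work/foo/bar.o"

copy_artifacts "${src}" "${dest}/tmp"                           # images only
COPY_ALL_ARTIFACTS=true copy_artifacts "${src}" "${dest}/all"   # everything
```

Skipping work/ and sysroots/ means far fewer files to copy, which shrinks the window in which the copy can stall, but it does not by itself guarantee the cp cannot hang.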

geissonator commented 6 years ago

Hung our CI last night for 10 hours :(

01:30:00 NOTE: Tasks Summary: Attempted 5391 tasks of which 4177 didn't need to be rerun and all succeeded.
01:30:00 + cp -r /tmp/openbmc/abi_version /tmp/openbmc/buildstats /tmp/openbmc/cache /tmp/openbmc/deploy /tmp/openbmc/hosttools /tmp/openbmc/log /tmp/openbmc/pkgdata /tmp/openbmc/saved_tmpdir /tmp/openbmc/sstate-control /tmp/openbmc/stamps /tmp/openbmc/sysroots /tmp/openbmc/sysroots-components /tmp/openbmc/work /tmp/openbmc/work-shared /var/lib/jenkins-slave/workspace/openbmc-build-gerrit-trigger-meta/distro/ubuntu/label/builder/target/zaius/openbmc/build/tmp
11:21:53 /var/lib/jenkins-slave/workspace/openbmc-build-gerrit-trigger-meta/distro/ubuntu/label/builder/target/zaius/openbmc-build-scripts/build-setup.sh: line 353: 26738 Terminated              docker run --cap-add=sys_admin --net=host --rm=true -e WORKSPACE=${WORKSPACE} -w "${HOME}" -v "${HOME}":"${HOME}" ${mount_obmc_dir} ${mount_ssc_dir} --cpus="$num_cpu" -t ${img_name} ${WORKSPACE}/build.sh
11:21:53 Set build name.
11:21:53 New build name is '#125-Brad Bishop-openbmc/meta-inventec'
11:21:53 Build was aborted
11:21:53 Archiving artifacts
11:21:53 [WS-CLEANUP] Deleting project workspace...[WS-CLEANUP] done
11:21:53 Finished: ABORTED

geissonator commented 6 years ago

A quick fix is to prefix the command with "timeout 300 cp ...." to enforce a maximum of 5 minutes. We should also look into copying off only what the Jenkins job needs by default (i.e. the deploy/images/** files).
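A sketch of the timeout idea, assuming GNU coreutils `timeout`. The timed_copy helper and the COPY_TIMEOUT variable are hypothetical names for illustration. `timeout` exits with status 124 when it has to send SIGTERM. With `--kill-after`, it follows up with SIGKILL, and in that case the status is 128+9 = 137:

```shell
#!/bin/bash
# Sketch only: timed_copy and COPY_TIMEOUT are hypothetical names, not part
# of build-setup.sh. Requires GNU coreutils `timeout`.
set -euo pipefail

COPY_TIMEOUT="${COPY_TIMEOUT:-300}"   # 5 minutes, as suggested in this issue

# timed_copy SRC DEST: run `cp -r SRC DEST`, but give up after COPY_TIMEOUT
# seconds so a hung copy cannot block the CI executor indefinitely.
timed_copy() {
    local src="$1" dest="$2" rc=0
    # SIGTERM at the limit; SIGKILL 10 s later if cp is still running
    timeout --kill-after=10 "${COPY_TIMEOUT}" cp -r "${src}" "${dest}" || rc=$?
    if (( rc == 124 || rc == 137 )); then
        # 124: terminated by timeout; 137: had to be killed with SIGKILL
        echo "ERROR: copy of ${src} timed out after ${COPY_TIMEOUT}s" >&2
    fi
    return "${rc}"
}

# Usage example with throwaway directories
src=$(mktemp -d)
dest=$(mktemp -d)
trap 'rm -rf "${src}" "${dest}"' EXIT
touch "${src}/image.pnor"
timed_copy "${src}" "${dest}/out" && echo "copy finished"
```

This turns a 10-hour hang into a failed build that must be retriggered. It does not address why cp stalls. A process blocked in uninterruptible I/O may also survive SIGKILL until the I/O completes.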

charleshofer commented 6 years ago

Both of those sound like pretty easy changes. I can work on that.

charleshofer commented 6 years ago

https://gerrit.openbmc-project.xyz/#/c/openbmc/openbmc-build-scripts/+/12480/

charleshofer commented 5 years ago

https://gerrit.openbmc-project.xyz/#/c/openbmc/openbmc-build-scripts/+/12749/

geissonator commented 5 years ago

Charlie's fixes above have made this issue go away.