ros-infrastructure / rosdoc_lite

A light-weight version of rosdoc that does not rely on ROS infrastructure for crawling packages.

rosdoc jobs are hanging and blocking the build farm #8

Closed (tfoote closed 11 years ago)

tfoote commented 11 years ago

Roughly 20-30 rosdoc jobs were running for several hours, which prevented the build farm from turning over. There needs to be a timeout mechanism to prevent this, since most of these jobs usually take only 3-5 minutes.
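For illustration, a timeout of the kind described here could be sketched in Python roughly as follows. This is only a minimal sketch under stated assumptions: `run_with_timeout` is a hypothetical helper, not part of rosdoc_lite or jenkins_scripts, and it assumes a POSIX system where the job runs as a child process that can be killed along with its process group.

```python
import os
import signal
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and kill its whole process group if it exceeds timeout_s seconds.

    Hypothetical helper for illustration. Returns (returncode, timed_out).
    """
    # start_new_session=True makes the child a process-group leader, so any
    # grandchildren it spawns can be killed together with it.
    # stdin is redirected from /dev/null so the child cannot wait on input.
    proc = subprocess.Popen(cmd, stdin=subprocess.DEVNULL, start_new_session=True)
    try:
        proc.wait(timeout=timeout_s)
        return proc.returncode, False
    except subprocess.TimeoutExpired:
        try:
            # The child's pid equals its process-group id (it is the leader).
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group exited just after the timeout fired
        proc.wait()  # reap the child so it does not linger as a zombie
        return proc.returncode, True


if __name__ == "__main__":
    # Stand-in for a hung job: sleeps far longer than the 1-second limit.
    hung_job = [sys.executable, "-c", "import time; time.sleep(60)"]
    code, timed_out = run_with_timeout(hung_job, timeout_s=1)
    print("timed out" if timed_out else "exit code %d" % code)
```

Given the 3-5 minute typical runtime mentioned above, a limit well above that would leave headroom for slow but healthy builds; the exact value is a judgment call.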

tfoote commented 11 years ago

FYI: @ablasdel @vrabaud

eitanme commented 11 years ago

I'm not exactly sure how to go about putting that timeout in with the jenkins_scripts, but @wmeeusse says it's on his list of things to look into. Regardless of having the timeout in place, I'd really like to track down why the jobs hang.

When I've seen them freeze in the past, it's typically during an apt-get install where the load on the machine is high for whatever reason. Did you have a chance to look at a job or two before you killed things to see what might have been going on? Was there high load on the servers? Anything else that might give us a clue as to what happened?

vrabaud commented 11 years ago

The easiest thing is to look at all the grey jobs here: http://jenkins.willowgarage.com:8080/view/Fdoc/

It seems that questions are being asked: "No test report files were found. Configuration error?" and "Test reports were found but none of them are new. Did tests run?"

I'm not sure whether those have to be answered with a y/n (and are therefore blocking).
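If an interactive prompt were the suspect, one way to rule it out would be to make sure apt-get can never wait for input. The sketch below is an assumption about how that might look, not how the build farm scripts actually call apt-get: `apt_get_noninteractive` and `run_apt_get` are hypothetical helpers. It relies on apt-get's `-y` flag, the `DEBIAN_FRONTEND=noninteractive` environment variable, and a stdin redirected from /dev/null.

```python
import os
import subprocess


def apt_get_noninteractive(args, base_env=None):
    """Build the argv and environment for an apt-get call that cannot prompt.

    Hypothetical helper for illustration. Returns (cmd, env).
    """
    env = dict(os.environ if base_env is None else base_env)
    # Tell debconf not to ask configuration questions.
    env["DEBIAN_FRONTEND"] = "noninteractive"
    # -y answers "yes" to apt-get's own confirmation prompts.
    cmd = ["apt-get", "-y"] + list(args)
    return cmd, env


def run_apt_get(args):
    """Run apt-get with stdin closed, so a stray prompt fails instead of blocking."""
    cmd, env = apt_get_noninteractive(args)
    return subprocess.call(cmd, env=env, stdin=subprocess.DEVNULL)
```

A call such as `run_apt_get(["update"])` would then either complete or fail, but it would not sit waiting for a y/n answer.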

eitanme commented 11 years ago

From looking at:

http://jenkins.willowgarage.com:8080/view/Fdoc/job/doc-fuerte-calibration/19/console

It seems to me that the job hung on "apt-get update." The questions @vrabaud pointed out only got printed after the job was canceled.

However, another job just hung in the middle of documentation generation:

http://jenkins.willowgarage.com:8080/view/Fdoc/job/doc-fuerte-wg_hardware_test/19/consoleFull

Yet another hangs just before documentation generation is to run:

http://jenkins.willowgarage.com:8080/view/Fdoc/job/doc-fuerte-bosch_drivers/19/console

I'm pretty sure that this isn't a "waiting for yes" issue, since it only happens intermittently. I don't have any great ideas about what might be going on; these jobs all seem to hang in different places. I'm not sure how to look into this in more detail until jobs hang again and we can examine what's happening on the slave machines. Any ideas?

eitanme commented 11 years ago

I've re-worked how Doxygen tag files are generated and used, and this issue seems to be more or less resolved. There are still some builds that take a long time, but they seem to be finishing. We can re-open this if we notice that things are hanging a lot again.