mikeferguson / buildbot-ros

A buildbot configuration for building ROS debians, docs, and tests.

Fix parallel build problems #24

Closed: mikeferguson closed this 4 days ago

mikeferguson commented 10 years ago

It appears this may need to be done slightly differently. The git-pbuilder defaults to base-DIST-ARCH.cow, and I haven't found a way around this.

Since we only save back to the cowbuilder during the update script, and looking back through my logs this is also the only point where I hit snags, perhaps the script should instead check whether the cowbuilder is already being updated, and if so wait until that update is done?
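
A minimal Python 3 sketch of that check-and-wait idea, assuming one lock file per distro/arch in /tmp. `wait_for_lock` and `poll_seconds` are hypothetical names, not the code from any actual commit. The important detail is taking the lock atomically (`O_CREAT | O_EXCL`) instead of first testing whether the file exists and then creating it, which would leave a window where two builders both decide the cowbuilder is free.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def wait_for_lock(lock_path, poll_seconds=5.0):
    """Wait until lock_path can be created exclusively, hold it, then remove it.

    O_CREAT | O_EXCL makes "is it already locked?" and "take the lock" a
    single atomic call, so two builders cannot both decide the lock is free.
    """
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            # Another builder is mid-update; wait for it to finish.
            time.sleep(poll_seconds)
    os.close(fd)
    try:
        yield
    finally:
        # Runs on normal exit or on an exception, but NOT if the process is killed.
        os.remove(lock_path)
```

A builder would wrap its update in something like `with wait_for_lock("/tmp/buildbot_precise_amd64_lock"):`. Because the lock is a file whose mere existence means "locked", a builder that is killed outright never reaches the `finally` clause and leaves the file behind.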

@mikepurvis you mentioned hitting at least one of these; can you confirm that it was only in the cowbuilder-update.py step?

mikepurvis commented 10 years ago

I hit it in two places, both while doing an initial "force all":

mikepurvis commented 10 years ago

This is what it looks like on the cowbuilder-update side for me: https://gist.github.com/mikepurvis/8715769

Even with just a dozen or so jobs, there seem to be a lot of conflicts on startup, which happens every time you make a release, since you have to bounce the buildbot master.

mikeferguson commented 10 years ago

@mikepurvis https://github.com/mikeferguson/buildbot-ros/commit/d319f61a37eb6dbc56b4ec20861a20f8a37b7a18 should avoid the conflicts in the cowbuilder updates. There is a potential problem here: if too many builders wait too long, they may get cut off by buildbot (though I believe the default timeout is about 20 minutes). If this works for you, I'll apply something similar to the reprepro update, which will avoid conflicts there.
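
A hedged sketch of how the reprepro step could be serialized in a similar way while also addressing the timeout concern. It is Python 3 and POSIX-only, and uses `fcntl.flock` on an open file rather than creating and deleting a lock file. `run_serialized`, `max_wait`, and `poll` are hypothetical names, and the 900-second default is an arbitrary value chosen to sit below the roughly 20-minute timeout mentioned above. Buildbot documents the ShellCommand `timeout` as seconds *without output*, so printing a line on each poll keeps a waiting step alive, while `max_wait` bounds the total wait so a queued builder fails with a clear error instead of hanging.

```python
import fcntl
import subprocess
import time


def run_serialized(lock_path, cmd, max_wait=900.0, poll=10.0):
    """Run cmd while holding an exclusive flock on lock_path.

    Polls with a non-blocking lock, printing a heartbeat each time so the
    step keeps producing output, and raises TimeoutError after max_wait.
    """
    deadline = time.monotonic() + max_wait
    with open(lock_path, "a") as lock_file:
        while True:
            try:
                fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
                break
            except BlockingIOError:
                # Someone else holds the lock (e.g. another includedeb).
                if time.monotonic() >= deadline:
                    raise TimeoutError(f"gave up waiting for {lock_path}")
                print(f"waiting for {lock_path}...", flush=True)
                time.sleep(poll)
        try:
            # check=True raises CalledProcessError if cmd exits non-zero.
            return subprocess.run(cmd, check=True)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

For the repository update, a call might look like `run_serialized("/tmp/buildbot_precise_repo_lock", ["reprepro", "-b", repo_dir, "includedeb", "precise", deb_path])`, where the lock path, `repo_dir`, and `deb_path` are placeholders.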

mikepurvis commented 10 years ago

Merged and deployed now; so far it looks great. Will keep you posted.

mikepurvis commented 10 years ago

Okay, I'm still seeing some issues in the includedeb step, too. Here's an example:

https://gist.github.com/mikepurvis/8994782

mikeferguson commented 10 years ago

Yeah, I haven't addressed the includedeb yet.

mikeferguson commented 10 years ago

Which is why the ticket is still open...

mikepurvis commented 10 years ago

The lockfile has gotten stuck for me twice now. I haven't figured out exactly why it's happening yet, but will let you know as it becomes clearer.

mikeferguson commented 10 years ago

When you say lockfile, you mean the includedeb step, right? Not the patch made for the cowbuilder update?

mikepurvis commented 10 years ago

No, this is the /tmp/buildbot_precise_amd64_lock which is created and destroyed by cowbuilder-update.py. When the file is stuck there, all the amd64 builds fail, of course.

Not sure if it's an issue with the particulars of how builds are failing, or a race condition, or something to do with how the reconfig is happening.

For now, I've just had to go in and manually delete the lock to get it going again.
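
One way to rule out stuck locks entirely is to make the lock an `flock` held on an open file instead of a file whose existence means "locked". The kernel drops an `flock` once the descriptors referring to it are closed, which happens automatically when the holder exits or is killed, so a crashed builder cannot leave the lock held. The Python 3, POSIX-only sketch below demonstrates this; `try_lock` and `simulate_crashed_holder` are hypothetical helpers, not part of buildbot-ros.

```python
import fcntl
import os
import subprocess
import sys
import tempfile


def try_lock(lock_path):
    """Try to take an exclusive flock without blocking.

    Returns the open file on success (keep it open to hold the lock),
    or None if another open file currently holds the lock.
    """
    f = open(lock_path, "a")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f


def simulate_crashed_holder(lock_path):
    """Run a child that takes the lock and dies without any cleanup."""
    child = (
        "import fcntl, os, sys\n"
        "f = open(sys.argv[1], 'a')\n"
        "fcntl.flock(f, fcntl.LOCK_EX)\n"
        "os._exit(1)  # crash: no unlock, no os.remove\n"
    )
    subprocess.run([sys.executable, "-c", child, lock_path])


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "buildbot_precise_amd64_lock")
    simulate_crashed_holder(path)
    print(os.path.exists(path))        # True: the file is left on disk
    print(try_lock(path) is not None)  # True: but the lock itself is free
```

The lock file stays on disk after the crash, but it no longer blocks anyone, so there is nothing to delete by hand.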

jonbinney commented 10 years ago

I'm seeing this same issue on trusty: sometimes /tmp/buildbot_precise_amd64_lock gets left lying around and blocks other builds.

jonbinney commented 10 years ago

Oops, I mean /tmp/buildbot_trusty_amd64_lock in my case.

mikepurvis commented 10 years ago

This is probably for a separate ticket, but I feel like there may be other tooling out there that would handle many parallel builds more gracefully, specifically Docker for better isolation between builds, and Aptly for better repo management (especially the snapshot feature).

mikeferguson commented 10 years ago

I haven't spent much time debugging this, because long term I'm planning to move to Docker. The ROS buildfarm scripts are being rewritten right now using Docker, and Dirk and I are already planning to meet up and figure out how we can make buildbot-ros and the Jenkins farm use the same underlying Docker scripts.

jonbinney commented 10 years ago

The lock issue isn't a huge problem. If I can track it down and there's an easy short term fix I'll submit a PR. Docker long term sounds like a nice solution for keeping the builds isolated.

mikeferguson commented 4 days ago

No longer supported; about to archive.