ros-infrastructure / buildfarm_deployment

Apache License 2.0

Refactor for deploying a Xenial buildfarm. #146

Closed nuclearsandwich closed 7 years ago

nuclearsandwich commented 7 years ago

This is a heavy refactor of the buildfarm deployment puppet. The purpose of the refactor is to

Outstanding work

Issues resolved by this PR

Completing the move to Puppet 4 is outside the scope of this PR. Some of the modules we're using don't yet explicitly support Puppet 4, and Xenial ships with the recently EOL'd 3.8.5, which is what this currently targets. We are using the "future" parser, which is shared by Puppet 3 and 4. I've also incorporated a puppet-lint config, and I think it'd be a nice idea to get puppet-lint and puppet parser validate running through Travis CI or something similar.
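A CI step for that could be as small as the sketch below. It assumes the `puppet` and `puppet-lint` executables are on the PATH of the CI worker and that Puppet 3.x is in use, so that `--parser future` is accepted. The function names are hypothetical and nothing like this exists in the PR yet.

```python
import subprocess
from pathlib import Path


def find_manifests(root):
    """Return every .pp manifest under root, sorted for stable output."""
    return sorted(str(p) for p in Path(root).rglob("*.pp"))


def build_check_commands(manifests):
    """Build the argv lists for the syntax check and the lint pass."""
    if not manifests:
        return []
    return [
        # Syntax check with the future parser (assumes Puppet 3.x).
        ["puppet", "parser", "validate", "--parser", "future"] + manifests,
        # Style check; puppet-lint picks up .puppet-lint.rc from the
        # working directory.
        ["puppet-lint", "--fail-on-warnings"] + manifests,
    ]


def run_checks(root="."):
    """Run each command and return how many of them failed."""
    failures = 0
    for cmd in build_check_commands(find_manifests(root)):
        if subprocess.call(cmd) != 0:
            failures += 1
    return failures
```

A CI job would call `run_checks()` from the repository root and fail the build when it returns a non-zero count.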

This PR is quite large and moves a lot of stuff around. When it comes time to review I'll point out a few highlights and areas I think need the most attention.

nuclearsandwich commented 7 years ago

In an offline discussion earlier this week I decided that testing with and supporting a single-node buildfarm was not going to be a goal for this PR. The refactoring I've done so far should make it much more doable, but right now I want to focus on getting the official buildfarm tested and then migrated to Xenial. Once that's complete, circling back to support a single-node configuration can be done without time pressure.

It's still something we want to do as it greatly eases running local or alternate buildfarms on smaller setups.

nuclearsandwich commented 7 years ago

I also found and fixed some bugs in the systemd <-> initscript interaction for the agent profile, which were blocking further work on the repo profile.

nuclearsandwich commented 7 years ago

This PR is completely untested on Trusty and I don't currently see a reason to try to achieve Trusty compatibility.

Plugin dependency management and versioning is still a royal pain, so there will be some post-setup manual work to do. As this is my first time migrating a ROS buildfarm, I'll be looking for help preparing a test and migration checklist.

dirk-thomas commented 7 years ago

> This PR is completely untested on Trusty and I don't currently see a reason to try and achieve trusty compatibility.

:+1: Supporting only Xenial is absolutely fine.

> Plugin dependency management and versioning is still a royal pain so there will be some post setup manual work to do.

As a first step that might be fine. But in order to use this for CI testing it needs to be fully automated at some point.

> As this is my first time migrating a ROS buildfarm I'll be looking for help preparing a test and migration checklist.

I guess the "best" test is setting up a test farm, letting it build e.g. all of Lunar, and checking that the jobs which pass on the live farm also pass on the test farm.
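A minimal sketch of that comparison, assuming the latest result of every job has already been collected from both farms into plain dictionaries. The job names are made up and the function name is hypothetical; the status strings follow Jenkins' usual build results.

```python
def compare_farms(live_results, test_results):
    """Return the jobs that pass on the live farm but not on the test farm."""
    regressions = {}
    for job, live_status in sorted(live_results.items()):
        if live_status != "SUCCESS":
            continue  # only jobs that pass on the live farm matter here
        # A job missing from the test farm is reported as well.
        test_status = test_results.get(job, "MISSING")
        if test_status != "SUCCESS":
            regressions[job] = test_status
    return regressions


# Hypothetical job names and results, for illustration only.
live = {"job_a": "SUCCESS", "job_b": "FAILURE", "job_c": "SUCCESS"}
test = {"job_a": "SUCCESS", "job_b": "FAILURE", "job_c": "UNSTABLE"}
print(compare_farms(live, test))
```

Jobs that already fail on the live farm are ignored on purpose, since they say nothing about the migration.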

nuclearsandwich commented 7 years ago

> As a first step that might be fine. But in order to use this for CI testing it needs to be fully automated at some point.

Do you know how this was dealt with previously? The puppet module for Jenkins doesn't resolve plugin dependencies because only the latest (non-LTS even) plugin versions have dependency metadata provided via the Jenkins wiki. There is an effort underway at rtyler/jpm to get a sophisticated package manager for Jenkins plugins off the ground but it's also in very early stages and doesn't yet support LTS.

EDIT: I should point out that this step is currently necessary with the master configs on Trusty; it is not a regression caused by the refactor / updates. It's definitely something I tried to resolve, but there doesn't seem to be a solution beyond pinning every plugin version in the config and then manually adding each plugin's recursive dependencies to the puppet configs, which seems like a lot of work to redo whenever we update plugin versions.
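To make "recursive dependencies" concrete, here is a small sketch of the closure that would have to be written out by hand for each plugin. The dependency map and plugin names are hypothetical placeholders, not real Jenkins metadata.

```python
def recursive_dependencies(plugin, direct_deps):
    """Return every plugin that `plugin` needs, directly or transitively."""
    seen = set()
    stack = list(direct_deps.get(plugin, []))
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue  # already visited; this also guards against cycles
        seen.add(dep)
        stack.extend(direct_deps.get(dep, []))
    seen.discard(plugin)  # a cycle must not make a plugin depend on itself
    return sorted(seen)


# Hypothetical direct dependencies: plugin -> plugins it requires.
direct_deps = {
    "plugin-a": ["plugin-b", "plugin-c"],
    "plugin-b": ["plugin-d"],
    "plugin-c": ["plugin-d"],
    "plugin-d": [],
}
print(recursive_dependencies("plugin-a", direct_deps))
```

Every name in the returned list would need its own pinned entry in the puppet config, which is why this has to be redone whenever plugin versions change.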

dirk-thomas commented 7 years ago

> Do you know how this was dealt with previously? The puppet module for Jenkins doesn't resolve plugin dependencies because only the latest (non-LTS even) plugin versions have dependency metadata provided via the Jenkins wiki. There is an effort underway at rtyler/jpm to get a sophisticated package manager for Jenkins plugins off the ground but it's also in very early stages and doesn't yet support LTS.

The puppet files explicitly installed the dependencies. Those lines usually had a comment to mention which other plugin they are required for.

While that is certainly more effort it avoids any manual work for deploying the machines. Once a better solution is available we should certainly change to that to avoid having to list the dependencies explicitly.

nuclearsandwich commented 7 years ago

cc50aca spikes out a script that reads a current Jenkins instance and dumps the currently installed plugins as an include-able puppet module. Each plugin has a require block for its dependencies rather than a comment listing its dependents, since that's the format of the data from Jenkins, but we could easily invert that if we think it's preferable.
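The following is not the script from cc50aca, only a rough sketch of the idea. It assumes the plugin list has already been fetched and parsed into dictionaries with `shortName`, `version` and `dependencies` keys (modelled on what Jenkins' plugin manager JSON is understood to return), and that plugins are declared with the puppet-jenkins `jenkins::plugin` type. The function names and plugin names are hypothetical.

```python
def required_deps(plugin):
    """Return the sorted names of a plugin's non-optional dependencies."""
    return sorted(d["shortName"] for d in plugin.get("dependencies", [])
                  if not d.get("optional", False))


def render_plugins(plugins):
    """Render one jenkins::plugin resource per installed plugin."""
    blocks = []
    for plugin in sorted(plugins, key=lambda p: p["shortName"]):
        lines = ["jenkins::plugin { '%s':" % plugin["shortName"],
                 "  version => '%s'," % plugin["version"]]
        deps = required_deps(plugin)
        if deps:
            refs = ", ".join("Jenkins::Plugin['%s']" % d for d in deps)
            lines.append("  require => [%s]," % refs)
        lines.append("}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


def invert_dependencies(plugins):
    """Map each plugin to its dependents, i.e. the plugins requiring it."""
    dependents = {}
    for plugin in plugins:
        for dep in required_deps(plugin):
            dependents.setdefault(dep, []).append(plugin["shortName"])
    return {name: sorted(users) for name, users in dependents.items()}


# Hypothetical plugin data, for illustration only.
installed = [
    {"shortName": "example-b", "version": "2.0",
     "dependencies": [{"shortName": "example-a", "optional": False}]},
    {"shortName": "example-a", "version": "1.0", "dependencies": []},
]
print(render_plugins(installed))
print(invert_dependencies(installed))
```

`invert_dependencies` shows the inversion mentioned above: its result could be used to emit a "required by" comment per plugin instead of a require block.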

nicolov commented 7 years ago

Thanks for all this good work. I'm looking to set up a small (4-5 machines) build farm. Do you suggest trying to use this branch or sticking to Trusty for now?

nuclearsandwich commented 7 years ago

> Thanks for all this good work. I'm looking to set up a small (4-5 machines) build farm. Do you suggest trying to use this branch or sticking to Trusty for now?

If you're familiar with the ROS buildfarm's design (I wasn't when I started this project :grin:), plan to keep your farm around for a while, and don't mind a few speedbumps early in the setup process, then the Xenial branch is probably suitable. I think the prime advantage of using Xenial is Java 8 for the Jenkins master. Since the packages themselves are built in containers, the host OS has less impact on available dependencies. If you're going to be making modifications to the puppet code, the refactor here is intended to make it much more grok-able and customizable but there's still work to be done documenting recommended ways to hook in and layer this code into your own puppet scripts. I particularly need to document the hiera keys.

The Trusty code has been configuring agent machines consistently for the current farm, though its plugin versions and dependencies are out of sync with what's currently deployed on the farm (which can be resolved by updating plugins in the Jenkins UI). If you want to take advantage of that proofing, or your farm needs to match the currently deployed ROS buildfarm as faithfully as possible, then the code in master is the place to start, with the caveat that we are in early testing for a Xenial-based buildfarm and want to migrate the current farm in the coming months.

Be sure to browse some of the recent issues on this repository, particularly #149 as there are some problems with both the current master and the current xenial branch.

nuclearsandwich commented 7 years ago

https://github.com/ros-infrastructure/buildfarm_deployment/pull/153/commits/c14135d0644c73dfaf5f98f15661907b54a116e2 will need to be incorporated here.

dirk-thomas commented 7 years ago

I didn't see any Java arguments in the patch but I might have just missed them. What JAVA_ARGS are you using for Java 8? The existing Jenkins machine has some commented-out options in the /etc/default/jenkins file which should be considered (they are based on the guide referenced in #144).

nuclearsandwich commented 7 years ago

> I didn't see any Java arguments in the patch but I might have just missed them. What JAVA_ARGS are you using for Java 8? The existing Jenkins machine has some commented-out options in the /etc/default/jenkins file which should be considered (they are based on the guide referenced in #144).

For now, the Jenkins Java arguments are still part of the hiera config rather than the Puppet code. This is how they were maintained previously, and I was changing so much else that I wanted to leave anything I didn't have an explicit reason to change the way it was.

I still need to take time to read through the entire referenced blog post. It looks like a lot of the flags they're using are tailored toward logging and diagnostics. Is there going to be a disk utilization (storage and throughput) consideration if we just enable all of those? Are there tuning flags we should adopt directly without the logging for now?

dirk-thomas commented 7 years ago

> For now, the Jenkins Java arguments are still part of the hiera config...

That's why I didn't see them in the diff :wink: Thanks for the pointer.

> Is there going to be a disk utilization (storage and throughput) consideration if we just enable all of those? Are there tuning flags we should adopt directly without the logging for now?

I wouldn't blindly enable all the arguments mentioned in the blog post. Since we didn't have Java 8 I wasn't able to try those but the config file on the existing Jenkins machine has the options I thought would be most valuable in a commented line:

nuclearsandwich commented 7 years ago

Let's move the JVM tuning discussion to the config repo in order to keep the discussion closer to the code.

nuclearsandwich commented 7 years ago

I have changed the target branch of this pull request to point to a new xenial branch in order to stagger the rollout of xenial changes to default branches. As a note to (mostly) myself: Do not delete the origin branch until the corresponding configuration repositories are updated to target ros-infrastructure/xenial branch instead of my fork.