ros-infrastructure / buildfarm_deployment

Apache License 2.0
30 stars 39 forks source link

Add missing dependencies to allow repo initialization on first puppet run. #178

Closed nuclearsandwich closed 6 years ago

nuclearsandwich commented 6 years ago

Puppet, at least version 3.8.5, does not evaluate environment variable strings for variable expansion, which was leading to a PYTHONPATH variable of /home/jenkins-agent/reprepro-updater/src:$PYTHONPATH with a literal $PYTHONPATH.

This interfered with the default path and caused https://github.com/ros-infrastructure/buildfarm_deployment/issues/174

I made a test deployment with this patch (and a few others). @gavanderhoorn if you have time to verify that this addresses #174 for you it would be appreciated.

jonazpiazu commented 6 years ago

The changes in the branch did not fix the problem in my case.

Sorry that I do not really know puppet, so I am not sure how to debug. I tried this naive approach:

    exec {'init_building_repo':
      path        => '/bin:/usr/bin',
      command     => "echo 1: \$PYTHONPATH >> /tmp/output.log",
      environment => ["PYTHONPATH=/home/${agent_username}/reprepro-updater/src"],
      user  => $agent_username,
      group  => $agent_username,
      unless     => "echo 2: \$PYTHONPATH >> /tmp/output.log; /home/${agent_username}/reprepro-updat
      logoutput   => on_failure,
      require     => [
        Vcsrepo["/home/${agent_username}/reprepro-updater"],
        File["/home/${agent_username}/.buildfarm/reprepro-updater.ini"],
      ]
    }

And that generated this output:

# cat /tmp/output.log 
2: /home/jenkins-agent/reprepro-updater/src
1: /home/jenkins-agent/reprepro-updater/src

So PYTHONPATH looks just fine, I am not sure why the error still happens.

gavanderhoorn commented 6 years ago

Seeing the comment by @jonazpiazu I haven't tested this yet @nuclearsandwich.

I do seem to remember that after sudo su - jenkins-agent I could not run the various scripts under reprepro-updater either after deployment finished (ie: the puppet run).

nuclearsandwich commented 6 years ago

Thanks for the feedback @jonazpiazu and @gavanderhoorn. I tore down and started on a fresh machine and it fails the first time but succeeds the second. I learned from @tfoote that puppet sometimes needs to be run twice (and observed this previously when provisioning master) but I don't remember it being necessary for repo hosts.

I think the change here is still warranted as there's not much use in having a literal $PYTHONPATH in the variable and leaving it in doesn't do what one might expect.

I'm going to dig through puppet to see if something is finishing the setup of python and should be required before the repository initializations.

gavanderhoorn commented 6 years ago

I'm by no means a puppet expert, but in my experience most problems are due to missing dependency declarations (or whatever puppet calls those). That is probably also the cause of #171 (I worked around that in https://github.com/ros-infrastructure/buildfarm_deployment/issues/70#issuecomment-301836295).

Running puppet twice is at best a work-around. It would be nice if we could deploy master, agent(s) and slave successfully, without needing that.

but I don't remember it being necessary for repo hosts.

Exactly. I wasn't expecting this either.

nuclearsandwich commented 6 years ago

Running puppet twice is at best a work-around. It would be nice if we could deploy master, agent(s) and slave successfully, without needing that.

Yep, I'm digging in to figure out what the missing link in the chain is here. Thanks for the input and patience.

nuclearsandwich commented 6 years ago

Okay, I think the latest changes should do it. I'm building new test branches and deploying to a new host to be sure.

tfoote commented 6 years ago

Before you do that you might want to remove the Class['python'], I think that's redundant with the module requirements.

nuclearsandwich commented 6 years ago

Before you do that you might want to remove the Class['python'], I think that's redundant with the module requirements.

Good catch. Because the require directive (rather than include) is used to bring in the profile::jenkins::agent class everything in there should be set up before anything within the profile::ros::repo class.

nuclearsandwich commented 6 years ago

This will close #174 and together with #179 should yield a clean deployment for repo hosts.

gavanderhoorn commented 6 years ago

Just ran a repo deploy on a clean VM after merging this and #179 and so far it seems puppet has run to completion without any complaints.

This is just a test VM, so the repo host isn't being used for anything, but so far looks good.

Nice one @nuclearsandwich 👍.

inigomartinez commented 6 years ago

I'm sorry to say that these fixes don't seem to work here. I have tried it several times by going back to a fresh imagen and executing reconfigure.bash master, and the result is always the same:

2018-01-15 12:55:40 +0100 /Stage[main]/Jenkins::Repo::Debian/Apt::Source[jenkins]/Apt::Setting[list-jenkins]/File[/etc/apt/sources.list.d/jenkins.list]/ensure
 (notice): created
2018-01-15 12:55:40 +0100 /Stage[main]/Jenkins::Repo::Debian/Apt::Source[jenkins]/Apt::Setting[list-jenkins]/File[/etc/apt/sources.list.d/jenkins.list] (info)
: Scheduling refresh of Class[Apt::Update]
2018-01-15 12:55:44 +0100 Puppet (err): Could not update: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold --force-yes install jenkins=
2.73.3' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
E: Version '2.73.3' for 'jenkins' was not found
2018-01-15 12:55:44 +0100 /Stage[main]/Jenkins::Package/Package[jenkins]/ensure (err): change from purged to 2.73.3 failed: Could not update: Execution of '/u
sr/bin/apt-get -q -y -o DPkg::Options::=--force-confold --force-yes install jenkins=2.73.3' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
E: Version '2.73.3' for 'jenkins' was not found
2018-01-15 12:55:44 +0100 /Group[jenkins] (notice): Dependency Package[jenkins] has failures: true
2018-01-15 12:55:44 +0100 /Group[jenkins] (warning): Skipping because of failed dependencies
2018-01-15 12:55:44 +0100 /User[jenkins] (notice): Dependency Package[jenkins] has failures: true
2018-01-15 12:55:44 +0100 /User[jenkins] (warning): Skipping because of failed dependencies
2018-01-15 12:55:44 +0100 /Stage[main]/Jenkins::Config/File[/var/lib/jenkins] (notice): Dependency Package[jenkins] has failures: true
2018-01-15 12:55:44 +0100 /Stage[main]/Jenkins::Config/File[/var/lib/jenkins] (warning): Skipping because of failed dependencies
2018-01-15 12:55:44 +0100 /Stage[main]/Jenkins::Config/File[/var/lib/jenkins/plugins] (notice): Dependency Package[jenkins] has failures: true
2018-01-15 12:55:44 +0100 /Stage[main]/Jenkins::Config/File[/var/lib/jenkins/plugins] (warning): Skipping because of failed dependencies
2018-01-15 12:55:44 +0100 /Stage[main]/Profile::Jenkins::Master/File[/var/lib/jenkins/credentials.xml] (notice): Dependency Package[jenkins] has failures: tru
e
2018-01-15 12:55:44 +0100 /Stage[main]/Profile::Jenkins::Master/File[/var/lib/jenkins/credentials.xml] (warning): Skipping because of failed dependencies

I tried this by setting BUILDFARM_DEPLOYMENT_BRANCH in buildfarm_depolyment_config to fix-repo-pythonpath.

gavanderhoorn commented 6 years ago

@inigomartinez: I'm not sure, but I believe this PR targets the repo deployment only. So master not working is expected at this point.

nuclearsandwich commented 6 years ago

I'm not sure, but I believe this PR targets the repo deployment only. So master not working is expected at this point.

This is correct. There are outstanding issues with master deployments (though if you run ./reconfigure master enough times it should get up and running 🙃) which I hope to address in more work later this week.