ros-infrastructure / buildfarm_deployment

Jenkins admin password not injected at deployment #210

Open IvanVN opened 5 years ago

IvanVN commented 5 years ago

We are currently adapting our buildfarm to the JEP-200 changes. However, it seems that the deployment process fails to inject into Jenkins the admin password configured in the master.yml file.

After successfully running the reconfigure.bash script, Jenkins still shows the "Unlock Jenkins" screen, which prevents the generate_all_jobs.py script from connecting to Jenkins and configuring it.

If we manually unlock Jenkins and set an admin password in the UI, the generate_all_jobs.py script runs successfully afterwards.

The /etc/default/jenkins configuration file on the master machine seems to have the flag properly set:

# Allow graphs etc. to work even when an X server is present
JAVA_ARGS="-Djava.awt.headless=true -Djenkins.install.runSetupWizard=false"

So we don't know why the unlock screen still appears.
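A quick way to confirm what the defaults file actually passes to the JVM is to parse the JAVA_ARGS line. The following is only a minimal sketch: the function name `java_system_properties` is made up for illustration, and it assumes the file contains a single shell-quoted `JAVA_ARGS="..."` assignment like the one quoted above.

```python
import shlex


def java_system_properties(defaults_text):
    """Collect -Dname=value system properties from a JAVA_ARGS=... line.

    Hypothetical helper: assumes a plain shell-quoted assignment such as
    the one in /etc/default/jenkins shown above.
    """
    props = {}
    for line in defaults_text.splitlines():
        line = line.strip()
        if not line.startswith("JAVA_ARGS="):
            continue  # skips comments and unrelated settings
        # shlex.split removes the surrounding shell quotes.
        for chunk in shlex.split(line[len("JAVA_ARGS="):]):
            for arg in chunk.split():
                if arg.startswith("-D") and "=" in arg:
                    name, value = arg[2:].split("=", 1)
                    props[name] = value
    return props


sample = (
    "# Allow graphs etc. to work even when an X server is present\n"
    'JAVA_ARGS="-Djava.awt.headless=true '
    '-Djenkins.install.runSetupWizard=false"\n'
)
props = java_system_properties(sample)
print(props.get("jenkins.install.runSetupWizard"))
```

This only checks the file contents. It does not show whether the running Jenkins process picked the flag up or how Jenkins acted on it.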

We are using a script that automatically creates and deploys the buildfarm, so it is critical for us to avoid having to enter anything manually in the UI. In the pre-JEP-200 setup it was working flawlessly.

Has anyone run into the same issue? Any clue as to why the password is not properly injected?

Thank you.

gavanderhoorn commented 5 years ago

@nuclearsandwich: would you have 5 mins to take a look at this? I've just run into this myself as well and it's blocking deployment of new instances.

RonaldEnsing commented 5 years ago

I would also be interested in a solution to this issue.

nuclearsandwich commented 5 years ago

I'll try to give this a look at some point this week. Since I set up the farms manually before configuring them, I usually dismiss this screen and sign in with the configured admin credentials, so I wasn't aware that dismissing the screen was an actual required configuration step.

gavanderhoorn commented 5 years ago

@nuclearsandwich: any updates?

nuclearsandwich commented 5 years ago

Thanks everyone for reporting the issue and for sharing your results.

I created a new test deployment to check this today.

Here's what I did.

  1. Provisioned the usual three machines

  2. Updated my buildfarm_deployment_config common.yaml and master.yaml with machine specific details.

  3. Used @gavanderhoorn's python script to generate a password and hash other than changeme and updated both the hash and cleartext password (used by the master and agents respectively).

  4. Updated a ros_buildfarm_config branch with deployment-specific details.

  5. Accessed the master host and cloned the config repository, then ran

    apt update && ./install_prerequisites && ./reconfigure master

  6. From a local machine with ros_buildfarm installed in a virtualenv ran

    generate_all_jobs.py https://raw.githubusercontent.com/nuclearsandwich/ros_buildfarm_config/deployment/2019-04-17/index.yaml --ros-distro-names melodic --commit

The results:

It's obnoxious that the wizard is still shown and I'd like to figure out why, but it didn't prevent me from running the ros_buildfarm job creation scripts.


While I follow up on suppressing the wizard again, it would be helpful if someone who is blocked on deploying jobs after configuration could share their buildfarm_deployment_config and ros_buildfarm_config with me so I can try to reproduce their trouble.

nuclearsandwich commented 5 years ago

Currently reading the logic around the runSetupWizard flag. My first pass through that code tells me that our existing config should bypass all initial setup and take us straight to the RUNNING state but that's either not happening or we're being brought back into a setup state on login. I'll try polling the state via Jenkins CLI on a fresh test redeploy. https://github.com/jenkinsci/jenkins/blob/3dc04f96ae743239c7d7118c2a3a76e364924626/core/src/main/java/jenkins/install/InstallUtil.java#L133-L202

nuclearsandwich commented 5 years ago

After a deployment the install state is NEW. I'm having a bit of trouble navigating why that is or what the best way to move out of NEW is.

nuclearsandwich commented 5 years ago

I can get us out of NEW by executing Jenkins.getInstance().getSetupWizard().completeSetup() but I'm not sure why it's not happening as a result of the flag being set...

nuclearsandwich commented 5 years ago

@IvanVN, @gavanderhoorn, and @RonaldEnsing: although I was never able to reproduce an inability to configure jobs on my test farms, I have seen that the wizard wasn't being suppressed when accessing new instances via the web UI. https://github.com/ros-infrastructure/buildfarm_deployment/pull/214 addresses that latter issue. If you could check whether it also resolves your issues with configuration, that would be great.

jonazpiazu commented 5 years ago

I just launched a new test deployment from scratch, and the password issue that prevented the generate_all_jobs.py script from running is not happening any more. I have no idea what might have changed.

When I browse into the Jenkins UI I get the Wizard screen, which I can skip.

nuclearsandwich commented 5 years ago

> When I browse into the Jenkins UI I get the Wizard screen, which I can skip.

#214 will complete the wizard if you want to give that a try.

jonazpiazu commented 5 years ago

Sorry I need to correct my previous comment.

The problem seems to be related to the Jenkins version as specified in the master.yaml file.

When forking from the main repository where the Jenkins version is not specified, version 2.164.2 gets installed, and then I still get the error of the password not being injected.

I have tried to manually specify version 2.138.3, and then it overcomes that problem.

@nuclearsandwich can you check which Jenkins version you were using in your previous test?

nuclearsandwich commented 5 years ago

> I have tried to manually specify version 2.138.3, and then it overcomes that problem.
>
> @nuclearsandwich can you check which Jenkins version you were using in your previous test?

My latest tests have all been with 2.164.2.

nuclearsandwich commented 5 years ago

I was deploying a new farm with a different config than my usual test deployment config (which is based on the buildfarm_deployment_config master branch) and I am seeing a potentially related issue: instead of using the credentials configured via Puppet, Jenkins is setting its randomly generated default admin password.

This issue may still be outstanding, especially if you're working from an earlier config being ported forward.

jonazpiazu commented 4 years ago

I have run further tests regarding this issue. The password injection works fine up to Jenkins version 2.138.3. For newer versions, after running Puppet, there are two different Jenkins admin users created:

ls /var/lib/jenkins/users/
admin  admin_5200768367454072463  users.xml

The password specified in the buildfarm yaml configuration is correctly injected into the admin user:

cat /var/lib/jenkins/users/admin/config.xml  | grep password
      <passwordHash>#jbcrypt:$2a$10$vmmqzRmcDHj1t9Ajgq5edekPD8cbpD./pBSGcYzia.OsIroOKjghm</passwordHash>

The other admin user has the password that corresponds to the one found in /var/lib/jenkins/secrets/initialAdminPassword:

cat /var/lib/jenkins/users/admin_5200768367454072463/config.xml | grep password
      <passwordHash>#jbcrypt:$2a$10$tQURLPyepNOlsO8ZHmMDYeenk/brjXGoIKEwci3vPaKqOpgStgVry</passwordHash>

But the users.xml maps the user admin to the admin_5200768367454072463 folder. In Jenkins 2.138.3 that users.xml file does not exist.

I manually changed users.xml to make it point to the admin folder, and after a Jenkins restart, my Jenkins admin's password matches the one specified in the buildfarm yaml configuration. Just deleting users.xml also seems to have the same effect.
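To spot this kind of mismatch without opening the file by hand, something like the following could be used. It is a rough sketch only: the `<entry>` layout with two `<string>` children (user id, then directory name) is an assumption about how Jenkins serializes users.xml, the sample document is constructed for illustration rather than copied from a real instance, and the helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def read_user_dir_map(users_xml_text):
    """Return {user id: directory name} from the text of a users.xml file.

    Assumes each <entry> holds two <string> children: id, then directory.
    """
    # Jenkins may write an XML 1.1 declaration, which Python's expat-based
    # parser may reject, so drop any declaration before parsing.
    body = re.sub(r"^\s*<\?xml[^>]*\?>", "", users_xml_text, count=1)
    root = ET.fromstring(body)
    mapping = {}
    for entry in root.iter("entry"):
        strings = [s.text for s in entry.findall("string")]
        if len(strings) == 2:
            mapping[strings[0]] = strings[1]
    return mapping


def mismatched_ids(mapping):
    """Return the ids whose directory name differs from the id itself."""
    return {uid: d for uid, d in mapping.items() if d != uid}


# Illustrative sample modelled on the directory listing above (assumed layout).
SAMPLE = """<?xml version='1.1' encoding='UTF-8'?>
<hudson.model.UserIdMapper>
  <version>1</version>
  <idToDirectoryNameMap class="concurrent-hash-map">
    <entry>
      <string>admin</string>
      <string>admin_5200768367454072463</string>
    </entry>
  </idToDirectoryNameMap>
</hudson.model.UserIdMapper>
"""

mapping = read_user_dir_map(SAMPLE)
print(mismatched_ids(mapping))
```

On a real master the text would come from /var/lib/jenkins/users/users.xml. An id mapped to a directory other than the one Puppet wrote would point at the situation described above.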

jonazpiazu commented 4 years ago

I added a simple workaround in a branch. Even if it works, I would not merge it, as I do not think it is a clean solution.

jonazpiazu commented 4 years ago

Looking further into the Jenkins documentation, I found the following issue:

https://jenkins.io/doc/upgrade-guide/2.138/#SECURITY-1072

As stated there, the format of the user records has changed, so the cleaner solution is to update our configuration to the current format.