Open IvanVN opened 5 years ago
@nuclearsandwich: would you have 5 mins to take a look at this? I've just run into this myself as well and it's blocking deployment of new instances.
I would also be interested in a solution to this issue.
I'll try to give this a look at some point this week. Since I set up the farms manually before configuring them I usually dismiss this screen and sign in with the configured admin credentials so I wasn't aware that dismissing the screen was an actual required configuration step.
@nuclearsandwich: any updates?
Thanks everyone for reporting the issue and for sharing your results.
I created a new test deployment to check this today.
Here's what I did.
Provisioned the usual three machines
Updated my buildfarm_deployment_config common.yaml and master.yaml with machine specific details.
Used @gavanderhoorn's Python script to generate a password and hash other than changeme, and updated both the hash and the cleartext password (used by the master and agents respectively).
Updated a ros_buildfarm_config branch with deployment-specific details.
Accessed the master host and cloned the config repository, then ran
apt update && ./install_prerequisites && ./reconfigure master
From a local machine with ros_buildfarm installed in a virtualenv ran
generate_all_jobs.py https://raw.githubusercontent.com/nuclearsandwich/ros_buildfarm_config/deployment/2019-04-17/index.yaml --ros-distro-names melodic --commit
The results:
It's obnoxious that the wizard is still shown and I'd like to figure out why, but it didn't prevent me from running the ros_buildfarm job creation scripts.
While I follow up on suppressing the wizard again, it would be helpful if someone who is blocked on deploying jobs after configuration could share their buildfarm_deployment_config and ros_buildfarm_config with me so I can try to reproduce their trouble.
Currently reading the logic around the runSetupWizard flag. My first pass through that code tells me that our existing config should bypass all initial setup and take us straight to the RUNNING state, but that's either not happening or we're being brought back into a setup state on login. I'll try polling the state via Jenkins CLI on a fresh test redeploy.
https://github.com/jenkinsci/jenkins/blob/3dc04f96ae743239c7d7118c2a3a76e364924626/core/src/main/java/jenkins/install/InstallUtil.java#L133-L202
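Before digging into the install state, it can help to confirm that the flag really reached the Jenkins defaults file on the master. The following is a minimal Python sketch, assuming the flag is passed as the Java system property `jenkins.install.runSetupWizard` inside `/etc/default/jenkins`; the helper name `setup_wizard_flag` and the sample line are illustrative and not taken from the deployment.

```python
import re

# Matches the Java system property that controls the Jenkins setup wizard.
_FLAG_RE = re.compile(r"-Djenkins\.install\.runSetupWizard=(\w+)")


def setup_wizard_flag(defaults_text):
    """Return the value of the runSetupWizard property, or None if absent.

    Comment lines are skipped so a commented-out flag is not reported.
    """
    for line in defaults_text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        match = _FLAG_RE.search(line)
        if match:
            return match.group(1).lower()
    return None


# Hypothetical sample content; a real check would read /etc/default/jenkins.
sample = 'JAVA_ARGS="-Djava.awt.headless=true -Djenkins.install.runSetupWizard=false"\n'
print(setup_wizard_flag(sample))
```

A result of `None` would point at the Puppet templating rather than at Jenkins itself. A result of `false` only rules out that one cause, since the flag being present does not by itself guarantee that the install state leaves NEW.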
After a deployment the install state is NEW. I'm having a bit of trouble navigating why that is or what the best way to move out of NEW is.
I can get us out of NEW by executing
Jenkins.getInstance().getSetupWizard().completeSetup()
but I'm not sure why it's not happening as a result of the flag being set...
@IvanVN, @gavanderhoorn, and @RonaldEnsing: I was never able to reproduce an inability to configure jobs on my test farms, but I have seen that the wizard wasn't being suppressed when accessing new instances via the web UI. https://github.com/ros-infrastructure/buildfarm_deployment/pull/214 addresses that latter issue. If you could check whether it also resolves your issues with configuration that would be great.
I just launched a new test deploy from scratch, and the password issue that prevented the generate_all_jobs.py script from running is not happening any more. I have no idea what might have changed.
When I browse into the Jenkins UI I get the Wizard screen, which I can skip.
Sorry I need to correct my previous comment.
The problem seems to be related to the Jenkins version as specified in the master.yaml file.
When forking from the main repository where the Jenkins version is not specified, version 2.164.2 gets installed, and then I still get the error of the password not being injected.
I have tried to manually specify version 2.138.3, and then it overcomes that problem.
@nuclearsandwich can you check which Jenkins version you were using in your previous test?
My latest tests have all been with 2.164.2
I was deploying a new farm with a different config than my usual test deployment config (which is based on the buildfarm_deployment_config master branch) and I am seeing a potentially related issue. Instead of using the credentials configured via Puppet, Jenkins is setting its randomly generated default admin password.
This issue may still be outstanding especially if you're working from an earlier config being ported forward.
I have run further tests regarding this issue.
The password injection works fine up to Jenkins version 2.138.3.
For newer versions, after running Puppet, there are two different Jenkins admin users created:
ls /var/lib/jenkins/users/
admin admin_5200768367454072463 users.xml
The password specified in the buildfarm yaml configuration is correctly injected into the admin user:
cat /var/lib/jenkins/users/admin/config.xml | grep password
<passwordHash>#jbcrypt:$2a$10$vmmqzRmcDHj1t9Ajgq5edekPD8cbpD./pBSGcYzia.OsIroOKjghm</passwordHash>
The other admin user has the password that corresponds to the one found in /var/lib/jenkins/secrets/initialAdminPassword:
cat /var/lib/jenkins/users/admin_5200768367454072463/config.xml | grep password
<passwordHash>#jbcrypt:$2a$10$tQURLPyepNOlsO8ZHmMDYeenk/brjXGoIKEwci3vPaKqOpgStgVry</passwordHash>
But the users.xml file maps the user admin to the admin_5200768367454072463 folder. In Jenkins 2.138.3 that users.xml file does not exist.
I have manually changed users.xml to make it point to the admin folder, and after a Jenkins restart, my Jenkins admin's password matches the one specified in the buildfarm yaml configuration. Just deleting users.xml also seems to have the same effect.
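The mismatch described above can be checked without opening the files by hand. The sketch below is Python and assumes that users.xml stores each mapping as an `<entry>` holding two `<string>` children (user id first, then directory name); that layout, the sample documents, and the helper names are assumptions, so adjust them to the files actually found on the master.

```python
import re
import xml.etree.ElementTree as ET


def _parse(xml_text):
    # Jenkins may write an XML 1.1 declaration, which Python's parser can
    # reject, so drop any leading declaration before parsing.
    body = re.sub(r"^\s*<\?xml[^>]*\?>", "", xml_text)
    return ET.fromstring(body)


def user_directories(users_xml_text):
    """Map each user id to its directory name (assumed entry/string layout)."""
    mapping = {}
    for entry in _parse(users_xml_text).iter("entry"):
        strings = [s.text for s in entry.findall("string")]
        if len(strings) == 2:
            mapping[strings[0]] = strings[1]
    return mapping


def password_hash(user_config_text):
    """Return the <passwordHash> value from a user's config.xml, or None."""
    node = next(_parse(user_config_text).iter("passwordHash"), None)
    return node.text if node is not None else None


# Hypothetical samples; real input would come from /var/lib/jenkins/users/.
SAMPLE_USERS_XML = """<?xml version='1.1' encoding='UTF-8'?>
<hudson.model.UserIdMapper>
  <idToDirectoryNameMap>
    <entry>
      <string>admin</string>
      <string>admin_5200768367454072463</string>
    </entry>
  </idToDirectoryNameMap>
</hudson.model.UserIdMapper>
"""
SAMPLE_CONFIG = (
    "<user><properties><passwordHash>#jbcrypt:$2a$10$example"
    "</passwordHash></properties></user>"
)

print(user_directories(SAMPLE_USERS_XML)["admin"])
print(password_hash(SAMPLE_CONFIG))
```

If the directory reported for admin is anything other than admin, the hash written by Puppet into users/admin/config.xml is not the one Jenkins consults at login, which matches the behaviour seen on the newer versions.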
I added a simple workaround in a branch. Even if it works, I would not merge it, as I do not think it is a clean solution.
Looking further into the Jenkins documentation, I found the following issue:
https://jenkins.io/doc/upgrade-guide/2.138/#SECURITY-1072
As stated in there, the user's record has changed, so the cleaner solution is to update the current format.
We are currently adapting our buildfarm to the JEP-200 changes. However, it seems that the deployment process fails to inject the admin password configured in the master.yml file into Jenkins.
After successfully running the reconfigure.bash script, Jenkins still shows the "Unlock Jenkins" screen, preventing the generate_all_jobs.py script from connecting to Jenkins and configuring it.
If we manually unlock Jenkins and set an admin password in the UI, the generate_all_jobs.py script runs successfully afterwards.
The /etc/default/jenkins configuration file on the master machine seems to have the flag properly set:
So we don't know why the unlock screen still appears.
We are using a script that automatically creates and deploys the buildfarm, so it is critical for us to avoid having to enter anything manually in the UI. In the pre-JEP-200 setup it was working flawlessly.
Has anyone suffered the same issue? Any clue as to why the password is not properly injected?
Thank you.