Open Banshee1221 opened 3 years ago
Hi @Banshee1221. The `Gathering facts` task completed successfully, so Ansible was able to SSH to the seed. It is later on, when sudo is required, that it fails with `Missing sudo password`.
Can you ssh to centos@192.168.33.5?
Hi @markgoddard, yes, I seem to be able to connect. It does, however, take around 3 minutes to get to a prompt.
Long SSH timeouts often indicate a problem with DNS resolution inside the server you are connecting to. Can you check that?
I had a look at the DNS and it doesn't appear to be an issue there. I can eventually get to the prompt, though. The timeout extension does help me get past that anyway. Any other suggestions?
Hi @Banshee1221, the pattern of taking ages to connect but eventually getting there really does sound like you have `UseDNS yes` in `/etc/ssh/sshd_config` on the seed VM 192.168.33.5 and DNS is not working on the seed VM (the UseDNS setting is actually the default in CentOS 7).
If it is not that, we'll need to think of other reasons why the machine is working, but extremely slowly. Can you check whether hardware virtualisation support is enabled on your bare metal host (e.g. using `lscpu`)? I doubt that would make it as slow as this, though...
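For anyone following along, here is a minimal sketch of how the UseDNS setting could be checked on the seed VM. The function name `usedns_setting` is made up for illustration, and it assumes the keyword and value are separated by whitespace. It only reads the file; it does not ask sshd for its effective configuration.

```shell
# Print the UseDNS value configured in an sshd_config-style file.
# Prints "unset" when there is no uncommented UseDNS line, in which case
# sshd falls back to its built-in default.
# usedns_setting is a hypothetical helper name, not part of any tool.
usedns_setting() {
    config_file="$1"
    # sshd keywords are case-insensitive and the first occurrence wins,
    # so stop at the first uncommented match.
    value=$(awk 'tolower($1) == "usedns" { print tolower($2); exit }' "$config_file")
    echo "${value:-unset}"
}

# On the seed VM this would be run as:
#   usedns_setting /etc/ssh/sshd_config
```

If this prints `yes` or `unset` and logins are slow, DNS resolution on the seed VM is the next thing to look at.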
Hi @oneswig thanks for the response. I had a brain-fart and didn't think to actually check the seed vm. I only verified that the DNS settings on the host were correct (not sshd).
I went ahead and looked through the Ansible and found that the seed VM root volume is at `/var/lib/libvirt/images/seed-root`, so I used `guestfish` to modify the sshd_config and explicitly turn off `UseDNS` there. It does make the login much faster, but I still have the same issue with:
TASK [singleplatform-eng.users : Per-user group creation] *********************************************************************************************************************************************************
fatal: [seed]: FAILED! => {"msg": "Missing sudo password"}
PLAY RECAP ********************************************************************************************************************************************************************************************************
seed : ok=6 changed=0 unreachable=0 failed=1 skipped=2 rescued=0 ignored=0
I've also verified that the vmx flag is on for hardware virtualisation and VT-x is set in BIOS.
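The sshd_config change described above could be sketched roughly as follows. This is not the exact edit that was made through guestfish; `set_usedns_no` is a hypothetical helper that rewrites a copy of the file so that an explicit `UseDNS no` is the first line, ahead of any `Match` block.

```shell
# Force "UseDNS no" in an sshd_config-style file.
# set_usedns_no is a hypothetical helper name used only for this sketch.
set_usedns_no() {
    config_file="$1"
    tmp_file=$(mktemp)
    {
        # Put the explicit setting first so it precedes any Match block.
        echo 'UseDNS no'
        # Drop existing UseDNS lines, commented out or not.
        grep -v -i -E '^[[:space:]]*#?[[:space:]]*UseDNS([[:space:]]|$)' "$config_file"
    } > "$tmp_file"
    # Overwrite in place so the original file's ownership and mode are kept.
    cat "$tmp_file" > "$config_file"
    rm -f "$tmp_file"
}

# Example, against a copy of the file taken from the seed VM image:
#   set_usedns_no ./sshd_config
```

sshd needs to be restarted, or the VM booted, before the change takes effect.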
If you SSH to the seed (`ssh centos@192.168.33.5`) and try running any command with `sudo`, does it work without a password?
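A quick way to test this without getting stuck at a prompt is sudo's `-n` (non-interactive) flag, which makes sudo fail instead of asking for a password. A minimal sketch, where `check_passwordless_sudo` and the `SUDO_BIN` override are hypothetical names introduced only so the check can be exercised without real sudo:

```shell
# Report whether sudo works without a password for the current user.
# SUDO_BIN is a hypothetical override for testing; it defaults to sudo.
check_passwordless_sudo() {
    # -n makes sudo exit with an error rather than prompt for a password.
    if "${SUDO_BIN:-sudo}" -n true 2>/dev/null; then
        echo "passwordless sudo: yes"
    else
        echo "passwordless sudo: no"
    fi
}

# On the seed VM, logged in as centos:
#   check_passwordless_sudo
```

The same check can be run from the host in one go with `ssh centos@192.168.33.5 'sudo -n true'` and looking at the exit status.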
No, I'm prompted for a password. I assumed that this was normal.
This is the cause of your issues. Kayobe is trying to use `sudo` without a password when logged in as `centos`.
Normally the `centos` user should have been given passwordless sudo privileges. Do you have a file called `/etc/sudoers.d/90-cloud-init-users` on the seed VM?
There is no file called `/etc/sudoers.d/90-cloud-init-users` on the seed VM. Is this supposed to be pulled in from the host? As mentioned, this is a bare-metal machine with a stock CentOS image installed, so I personally just added `NOPASSWD` to the root `/etc/sudoers` file.
Do the Ansible scripts copy `/etc/sudoers.d/90-cloud-init-users` from the host to the VM?
EDIT: Manually setting `centos ALL=(ALL) NOPASSWD:ALL` in the seed VM `/etc/sudoers` has, as expected, fixed the issue. I just want to know what is required for this to be automated.
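As a stop-gap, the same rule could live in a drop-in file rather than in the main `/etc/sudoers`. This is a sketch of the manual workaround only, not of what cloud-init or Kayobe would do; `write_nopasswd_dropin` is a hypothetical helper name.

```shell
# Write a passwordless-sudo rule for one user to a sudoers drop-in file.
# write_nopasswd_dropin is a hypothetical helper, not part of Kayobe.
write_nopasswd_dropin() {
    user="$1"
    dest="$2"
    printf '%s ALL=(ALL) NOPASSWD:ALL\n' "$user" > "$dest"
    # sudoers files are conventionally read-only for owner and group.
    chmod 0440 "$dest"
}

# On the seed VM, as root (path taken from this thread):
#   write_nopasswd_dropin centos /etc/sudoers.d/90-cloud-init-users
#   visudo -cf /etc/sudoers.d/90-cloud-init-users
```

Running `visudo -cf` on the new file checks its syntax before sudo next reads it, which avoids locking yourself out with a malformed rule.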
I wonder if there's some cloud-init step that's not running correctly?
So this may be off topic and please let me know if I should open a new ticket around this, but the reason I'm posting it here is because it's possible that it's all related to the seed VM.
After manually adding `NOPASSWD` for the centos user on the VM, I'm able to continue until this step:
TASK [Ensure kayobe virtualenv has the latest version of pip installed] *******************************************************************************************************************************************
fatal: [seed]: FAILED! => {"changed": false, "cmd": ["/opt/kayobe/venvs/kayobe/bin/pip2", "install", "-U", "pip"], "msg": "stdout: Could not fetch URL https://pypi.python.org/simple/pip/: There was a problem confirming the ssl certificate: [SSL] unknown error (_ssl.c:618) - skipping\nRequirement already up-to-date: pip in /opt/kayobe/venvs/kayobe/lib/python2.7/site-packages\n\n:stderr: fips.c(145): OpenSSL internal error, assertion failed: FATAL FIPS SELFTEST FAILURE\n"}
This kind of leads me to believe that perhaps there's some issue with the creation of the seed VM? This could also be a totally unrelated bug, though, and I haven't gotten past it yet. Perhaps I'm missing something here, but I haven't had the time to dig through all of the Ansible.
EDIT: This doesn't seem to be network related, as I can reach https://pypi.python.org via `curl` just fine.
It does sound like something has gone wrong during the creation of the VM. I would expect cloud-init to create the `/etc/sudoers.d/90-cloud-init-users` file, but it appears not to have done so. Is there anything in the cloud-init logs to suggest what might have gone wrong? Also, do you know why DNS was not working in the seed VM?
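One way to narrow the logs down is to filter for warnings and failed stages. A minimal sketch, assuming the usual log location `/var/log/cloud-init.log` and level tags of the form `[WARNING]`; `cloud_init_problems` is a hypothetical helper name.

```shell
# Print cloud-init log lines that carry a WARNING tag or report a FAIL.
# The default path is an assumption; pass another file as the first argument.
cloud_init_problems() {
    log_file="${1:-/var/log/cloud-init.log}"
    grep -E '\[WARNING\]|FAIL' "$log_file"
}

# On the seed VM:
#   cloud_init_problems
```

Tracebacks follow the matching lines in the log, so the timestamps printed here are a good place to start reading in full.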
Hi @markgoddard. Sorry for the delay in response. I poked around and found this:
2020-10-07 08:19:33,496 - __init__.py[DEBUG]: Adding user centos
2020-10-07 08:19:33,498 - util.py[DEBUG]: Running hidden command to protect sensitive input/output logstring: ['useradd', 'centos', '--comment', 'Cloud User', '--groups', 'adm,systemd-journal,wheel', '--shell', '/bin/bash', '-m']
2020-10-07 08:19:34,666 - util.py[DEBUG]: Running command ['passwd', '-l', 'centos'] with allowed return codes [0] (shell=False, capture=True)
2020-10-07 08:19:35,198 - util.py[WARNING]: Failed to disable password for user centos
2020-10-07 08:19:35,205 - util.py[DEBUG]: Failed to disable password for user centos
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/cloudinit/distros/__init__.py", line 584, in lock_passwd
util.subp(['passwd', '-l', name])
File "/usr/lib/python2.7/site-packages/cloudinit/util.py", line 2091, in subp
cmd=args)
ProcessExecutionError: Unexpected error while running command.
Command: ['passwd', '-l', 'centos']
Exit code: 255
Reason: -
Stdout: Locking password for user centos.
passwd: Error (password not set?)
Stderr: passwd: Libuser error at line: 124 - data not found in file.
2020-10-07 08:19:35,250 - handlers.py[DEBUG]: finish: init-network/config-users-groups: FAIL: running config-users-groups with frequency once-per-instance
2020-10-07 08:19:35,252 - util.py[WARNING]: Running module users-groups (<module 'cloudinit.config.cc_users_groups' from '/usr/lib/python2.7/site-packages/cloudinit/config/cc_users_groups.pyc'>) failed
2020-10-07 08:19:35,254 - util.py[DEBUG]: Running module users-groups (<module 'cloudinit.config.cc_users_groups' from '/usr/lib/python2.7/site-packages/cloudinit/config/cc_users_groups.pyc'>) failed
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/cloudinit/stages.py", line 813, in _run_modules
freq=freq)
File "/usr/lib/python2.7/site-packages/cloudinit/cloud.py", line 54, in run
return self._runners.run(name, functor, args, freq, clear_on_fail)
File "/usr/lib/python2.7/site-packages/cloudinit/helpers.py", line 187, in run
results = functor(*args)
File "/usr/lib/python2.7/site-packages/cloudinit/config/cc_users_groups.py", line 165, in handle
cloud.distro.create_user(user, **config)
File "/usr/lib/python2.7/site-packages/cloudinit/distros/__init__.py", line 537, in create_user
self.lock_passwd(name)
File "/usr/lib/python2.7/site-packages/cloudinit/distros/__init__.py", line 587, in lock_passwd
raise e
ProcessExecutionError: Unexpected error while running command.
Command: ['passwd', '-l', 'centos']
Exit code: 255
Reason: -
Stdout: Locking password for user centos.
passwd: Error (password not set?)
Stderr: passwd: Libuser error at line: 124 - data not found in file.
As for DNS, I'm not 100% sure yet.
Hello,
This seems to apply to stable/train and stable/rocky.
I have a bare-metal server that I'm trying to run UFN on. I have a stock CentOS 7.8 install and have created a `centos` user with `NOPASSWD` set for sudo access, and the `centos` user does have a password set.
Everything seems to run fine up to the point where the seed VM is accessed in `./dev/seed-deploy.sh`. I originally struggled with "Timeout (12s) waiting for privilege escalation prompt:", but after adjusting the timeout I'm now faced with "Incorrect sudo password." I can't access the VM via `ssh` from the `centos` user either. I've also tried this with `ANSIBLE_BECOME_ASK_PASS=1`.
Any ideas?
Here's a dump of that last action with `ANSIBLE_DEBUG=1`: