Closed pguerin3 closed 2 years ago
I don't know if this is relevant to the problem, but the link below says this:
Fedora uses the libvirt family of tools as its virtualization solution.
https://docs.fedoraproject.org/en-US/quick-docs/getting-started-with-virtualization/index.html
And I'm not using libvirt.....
That doesn't seem to be related. As already said in the other thread (issue #413), you can configure which one is the default provider. That said, you can try to reproduce the same issue with a simpler VM (just Oracle Linux, for example) and see if it works. In any case you should check the log details, as indicated for example in the output above:
master1: See /var/tmp/cmd_lUL0V.log for details
When I try to inspect /var/tmp/ there are no cmd* files present, so I can't inspect the log. Seems those are cleared out immediately.
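One thing that may be worth checking, as an assumption rather than something confirmed in this thread: the path is printed with the master1: prefix, so the log may live inside the master1 guest rather than on the host. A minimal sketch that lists those files through vagrant ssh (the function name list_guest_cmd_logs is made up for illustration):

```shell
# Hypothetical helper: list provisioning logs inside a guest VM.
# Assumes the cmd_*.log files are written in the guest's /var/tmp, not the host's.
list_guest_cmd_logs() {
  node="${1:-master1}"
  # Single quotes keep the glob unexpanded on the host; the guest shell expands it.
  vagrant ssh "$node" -c 'ls -l /var/tmp/cmd_*.log'
}
```

Run from the project directory, for example: list_guest_cmd_logs master1. If the files are owned by root in the guest, the listing may need sudo.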
Trying the other projects: This works for ~/vagrant-projects/OracleLinux/8
vagrant up --provider=virtualbox
This also works for the same project
EXTEND=container-tools vagrant up --provider=virtualbox
So this proves that the Vagrant/VirtualBox combination works.
Back to ~/vagrant-projects/OLCNE, there is no obvious problem with the key management for the build of worker1:
==> worker1: Waiting for machine to boot. This may take a few minutes...
worker1: SSH address: 127.0.0.1:2222
worker1: SSH username: vagrant
worker1: SSH auth method: private key
worker1:
worker1: Vagrant insecure key detected. Vagrant will automatically replace
worker1: this with a newly generated keypair for better security.
worker1:
worker1: Inserting generated public key within guest...
worker1: Removing insecure key from the guest if it's present...
worker1: Key inserted! Disconnecting and reconnecting using new SSH key...
==> worker1: Machine booted and ready!
Also no problem with the build of master1:
==> master1: Waiting for machine to boot. This may take a few minutes...
master1: SSH address: 127.0.0.1:2200
master1: SSH username: vagrant
master1: SSH auth method: private key
master1:
master1: Vagrant insecure key detected. Vagrant will automatically replace
master1: this with a newly generated keypair for better security.
master1:
master1: Inserting generated public key within guest...
master1: Removing insecure key from the guest if it's present...
master1: Key inserted! Disconnecting and reconnecting using new SSH key...
==> master1: Machine booted and ready!
Even after a reboot, the following error is still appearing:
master1: ssh 192.168.56.101 /etc/olcne/bootstrap-olcne.sh --secret-manager-type file --olcne-node-cert-path /etc/olcne/pki/production/node.cert --olcne-ca-path /etc/olcne/pki/production/ca.cert --olcne-node-key-path /etc/olcne/pki/production/node.key --olcne-component agent
master1: Returned a non-zero code: 255
master1: Last output lines:
master1: Host key verification failed.
master1: See /var/tmp/cmd_R0VLC.log for details
The SSH command responded with a non-zero exit status. Vagrant
assumes that this means the command failed. The output for this command
should be in the log above. Please read the output to determine what
went wrong.
The IPs appearing in the above logs are not the ones defined in this repository.
To rule out any issue with the changes you made, can you revert to the original Vagrantfile / scripts of this project, create a /etc/vbox/networks.conf file with:
* 192.168.0.0/16
* fe80::/64
and retry?
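Creating that file can be sketched as follows. This is only an illustration: the function name write_networks_conf is hypothetical, and the target path is a parameter so it can be tried on a scratch file before touching /etc/vbox (which requires root).

```shell
# Hypothetical helper: write the two allowed host-only ranges to a networks.conf file.
# With no argument it targets /etc/vbox/networks.conf and must then run as root.
write_networks_conf() {
  nc_target="${1:-/etc/vbox/networks.conf}"
  mkdir -p "$(dirname "$nc_target")" || return 1
  # One range per line, each prefixed with "* " as in the snippet above.
  printf '%s\n' '* 192.168.0.0/16' '* fe80::/64' > "$nc_target"
}
```

For example, write_networks_conf /tmp/networks.conf produces a copy to review before installing it.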
The "Host key verification failed" error happens in guest-to-guest communication. This should not interact with the host (other than using the vboxnet bridge defined in VirtualBox); the deployment should be exactly the same, whatever the host OS is.
Note that:
- All nodes must be brought up in a single run (vagrant up, and not vagrant up worker1; vagrant up master1)
- A retry must start from a clean state (vagrant destroy; vagrant up)
SSH keys for node-to-node communication are generated by the first node coming online and removed by the operator/master node; a re-run will generate a new pair and the keys won't match.
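A clean rebuild along those lines might look like the sketch below. The function name rebuild_olcne is made up for illustration, the default directory is the one mentioned in this thread, and -f only skips the confirmation prompt of vagrant destroy.

```shell
# Hypothetical helper: destroy every VM of the project, then bring them all up in one run.
rebuild_olcne() {
  project_dir="${1:-$HOME/vagrant-projects/OLCNE}"
  # Subshell so the caller's working directory is left unchanged.
  (
    cd "$project_dir" || exit 1
    # Destroy first so stale node-to-node SSH keys are not reused, then a single "up".
    vagrant destroy -f && vagrant up --provider=virtualbox
  )
}
```

Calling rebuild_olcne with no argument uses ~/vagrant-projects/OLCNE; pass another path if the checkout lives elsewhere.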
The restriction on the IP range for host-only networks was introduced in VirtualBox 6.1.28; we still need to update this project (either by documenting the use of /etc/vbox/networks.conf or by changing the IP range used).
I'm using the information specified here: https://www.virtualbox.org/manual/UserManual.html#network_hostonly
So to allow everything, I'm specifying 0.0.0.0/0.
> cat /etc/vbox/networks.conf
* 0.0.0.0/0 ::/0
However I will try your suggestions soon.
Allowing everything is fine as well. My point is that you aren't using an exact copy of this project and I can't reproduce your issue.
After doing a git fetch to remove all the changes to the Vagrantfile and rerunning, I've concluded that the root cause is that I don't have enough physical memory on my laptop to allow Vagrant to finish without error.
The bare minimum is one master and one worker node with 3GB of memory each, so you need to be able to run two VirtualBox VMs for a total of 6GB free memory for the VMs (no over-commit!)...
If you don't run ISTIO, you could decrease the memory allocation, but the VMs tend to be less responsive and you might experience install failures due to timeouts.
The OLCNE Vagrant project should run on a laptop with 16GB of RAM (provided you don't have other major workloads). Anything less than that might be difficult.
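On a Linux host, a quick pre-flight check against the 6GB figure above could be sketched like this. It is only a rough guide: the function name has_enough_memory is hypothetical, and it relies on the MemAvailable field of /proc/meminfo, which is an estimate made by the kernel.

```shell
# Hypothetical helper: succeed when MemAvailable is at least the required number of GiB.
# Arguments: meminfo file (default /proc/meminfo), required GiB (default 6).
has_enough_memory() {
  meminfo="${1:-/proc/meminfo}"
  required_gib="${2:-6}"
  # /proc/meminfo reports values in kB (KiB), so convert the requirement before comparing.
  awk -v need="$required_gib" '
    /^MemAvailable:/ { avail_kib = $2 }
    END { exit !(avail_kib >= need * 1024 * 1024) }
  ' "$meminfo"
}
```

For example: has_enough_memory || echo "less than 6 GiB available for the VMs".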
Worth noting: you should configure an Oracle Container Registry mirror in your region; using the default one will likely cause timeout issues at install time.
I have Oracle VirtualBox installed (not libvirt). To start OLCNE in Vagrant, I go to the ~/vagrant-projects/OLCNE directory and run 'vagrant up --provider=virtualbox'. The worker VM is created, then during the creation of the master VM the error appears:
Sounds like something to do with the config of SSH, so I have set the SSH config file to lessen the checking:
Unfortunately this doesn't help, and I still get the same build error.
Environment
Host OS: Fedora 35
Kernel version (for Linux host): Linux 5.15.11-200.fc35.x86_64
Vagrant version: 2.2.16
Vagrant provider: 6.1.32 r149290
Vagrant project: ~/vagrant-projects/OLCNE
Additional information
The tail of the build log is here: