StPanning opened this issue 1 year ago
https://bugzilla.redhat.com/show_bug.cgi?id=1821277 suggests that the message from libvirtd is not necessarily a problem; it depends on whether an operation was still outstanding when the client closed the connection early.
I don't have an obvious fix to suggest. Adding SSH keep-alive options would be one possibility to try.
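A minimal sketch of the keep-alive idea, assuming the OpenSSH client options `ServerAliveInterval` and `ServerAliveCountMax` are the ones to tune; `vagrant ssh` passes anything after `--` straight to ssh. The helper name `ssh_keepalive_args`, the 30 s / 4 probe defaults and the machine name `default` are illustrative choices, not values from this thread.

```shell
# Print OpenSSH client keep-alive options (hypothetical helper).
# ServerAliveInterval: seconds without server traffic before ssh sends a probe.
# ServerAliveCountMax: unanswered probes tolerated before ssh gives up.
ssh_keepalive_args() {
  interval="${1:-30}"
  count="${2:-4}"
  printf '%s\n' "-o ServerAliveInterval=${interval} -o ServerAliveCountMax=${count}"
}

# Usage (left unquoted on purpose so the options split into separate arguments):
#   vagrant ssh default -- $(ssh_keepalive_args 30 4)
```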
Experimenting with the nic_model_type setting might be a good option here. Some time ago I found that some network emulation models worked better than others, and this varied across distros and kernels. Usually virtio is good, but there's no harm in trying some of the others: https://vagrant-libvirt.github.io/vagrant-libvirt/configuration.html#domain-specific-options
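Before switching `nic_model_type`, it may help to confirm which model each interface currently uses. A sketch assuming `virsh dumpxml` prints attributes in single quotes (libvirt's usual output) and that each `<model>` element sits on its own line; `nic_models` is a hypothetical helper and `<vm>` stays a placeholder.

```shell
# Print the <model type='...'/> of every <interface> in libvirt domain XML read from stdin.
# Restricting the substitution to <interface>...</interface> ranges skips <video> models.
nic_models() {
  sed -n "/<interface /,/<\/interface>/s/.*<model type='\([^']*\)'.*/\1/p"
}

# Usage on the host:
#   virsh dumpxml <vm> | nic_models
```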
You could also try setting management_network_mtu for the management network and libvirt__mtu for the private network to values below 1500, in case sshd is occasionally having trouble with messages coming through. I've seen ssh hangs with the MTU set to 1500, but usually only on physical networks, where it turns out that some device in the network path is not fragmenting packets correctly. I wouldn't expect this to be an issue for a connection to a VM unless there is also a driver bug involved.
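To check whether a packet of a given size gets through unfragmented, you can ping with the Don't Fragment bit set. A sketch assuming Linux iputils `ping` (`-M do` forbids fragmentation, `-s` sets the ICMP payload size) and IPv4, where the IP header (20 bytes) and ICMP header (8 bytes) add 28 bytes on top of the payload. The helper names, the candidate MTU list and `<guest-ip>` are placeholders.

```shell
# IPv4 header (20 bytes) + ICMP echo header (8 bytes) = 28 bytes of overhead.
icmp_payload_for_mtu() {
  echo $(( $1 - 28 ))
}

# Print the first candidate MTU whose full-size packet reaches the host unfragmented.
probe_mtu() {
  host="$1"
  for mtu in 1500 1480 1450 1400 1300; do
    if ping -c 1 -W 2 -M do -s "$(icmp_payload_for_mtu "$mtu")" "$host" >/dev/null 2>&1; then
      echo "$mtu"
      return 0
    fi
  done
  return 1
}

# Usage from the host (or from inside the guest towards the host):
#   probe_mtu <guest-ip>
```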
The fact that sshd in the guest stops accepting the ssh key suggests it might be a guest-side issue. Is there any chance that you have something running in the guest that keeps consuming memory until, after a while, it has used all of it? That would make sshd hang until the kernel's OOM killer causes sshd to be restarted, which would allow connecting again after a while. But there may then not be enough memory left to read the user's authorized keys.
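One way to test the memory-exhaustion theory is to look for OOM-killer messages in the guest's kernel log. A sketch assuming the usual Linux kernel wording (`... invoked oom-killer` and `Out of memory: Kill(ed) process ...`); `oom_events` is a hypothetical helper. Reading the previous boot with `journalctl -b -1` only works if the journal is persistent.

```shell
# Filter kernel log lines on stdin down to OOM-killer activity (case-insensitive).
oom_events() {
  grep -i -E 'out of memory|invoked oom-killer|killed process'
}

# Usage inside the guest:
#   journalctl -k -b --no-pager | oom_events     # current boot
#   journalctl -k -b -1 --no-pager | oom_events  # previous boot (persistent journal only)
#   free -m                                      # current memory headroom
```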
I'd try the following; at the very least, if you can capture a log from the guest side of what is happening, there is a better chance of debugging it.
It might be worth connecting to the guest console with virsh console <vm>, depending on whether kernel messages are sent to the console or already appear in the VNC terminal.
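For the guest-side log, sshd's own journal entries around the time of a disconnect are a reasonable start. A sketch assuming a systemd guest; on Ubuntu the unit is `ssh.service` (`sshd.service` on some other distributions), and the patterns below are common OpenSSH wording, not an exhaustive list. `sshd_problems` is a hypothetical helper name.

```shell
# Keep sshd log lines that usually matter when key authentication stops working
# or connections drop.
sshd_problems() {
  grep -E 'Authentication refused|Connection (closed|reset)|error:|fatal:'
}

# Usage inside the guest (e.g. over `virsh console <vm>` when ssh is down):
#   journalctl -u ssh -b --no-pager | sshd_problems
```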
I was able to resolve this.
The error was in my custom-built Ubuntu box.
I packed the box image without removing /etc/machine-id.
Because of this, every box derived from this box gets the same IP address on the management network, even if the MAC addresses of the management interfaces are different.
Why is this the case? This is really hard to debug.
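A likely explanation: on Ubuntu server images, netplan usually hands interfaces to systemd-networkd, whose DHCPv4 client by default sends an RFC 4361 client identifier built from a DUID derived from /etc/machine-id instead of the MAC address, so libvirt's dnsmasq sees every clone as the same client. A quick way to confirm is to compare the machine-ids of the running VMs. A sketch; `duplicate_machine_ids` and the VM names in the usage comment are hypothetical (take the real names from `vagrant status`).

```shell
# Report machine-ids shared by more than one VM.
# Input on stdin: one "<vm-name> <machine-id>" pair per line.
duplicate_machine_ids() {
  awk '{ count[$2]++; names[$2] = names[$2] " " $1 }
       END { for (id in count) if (count[id] > 1) print id ":" names[id] }'
}

# Usage from the Vagrant project directory on the host (VM names are hypothetical):
#   for vm in ansible node1; do
#     printf '%s %s\n' "$vm" "$(vagrant ssh "$vm" -c 'cat /etc/machine-id' 2>/dev/null | tr -d '\r')"
#   done | duplicate_machine_ids
```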
Deleting /etc/machine-id is only the first step, because afterwards the management network interface doesn't get any IP address assigned.
My solution to this problem:
Before you pack your custom image:
log in to the VM that you want to pack
remove /etc/machine-id
create a script that creates the machine-id if it does not already exist
cat <<'EOF' >/usr/local/bin/init_machine_id.sh
#!/bin/bash
# Leave an existing machine-id untouched
if [ -e /etc/machine-id ]
then
exit 0
fi
# Let systemd initialize a new machine-id in /etc/machine-id
/usr/bin/systemd-machine-id-setup
EOF
chmod +x /usr/local/bin/init_machine_id.sh
create a service that runs the script before network-pre.target, i.e. before the network is configured
cat <<'EOF' >/etc/systemd/system/init_machine_id.service
[Unit]
Description=Initialize Machine-ID
Before=network-pre.target
Wants=network-pre.target
[Service]
# oneshot: units ordered after this one wait until the script has exited, not just until it was forked
Type=oneshot
ExecStart=/usr/local/bin/init_machine_id.sh
[Install]
WantedBy=network.target
EOF
# unit files don't need the executable bit, but the unit must be enabled to run at boot
systemctl enable init_machine_id.service
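After repacking the box and bringing up a VM from it, a quick check that the unit ran and the ID was populated. A sketch assuming the format documented in machine-id(5) (32 lowercase hexadecimal characters); `check_machine_id` is a hypothetical helper, and the path argument only exists so it can be tried against a test file.

```shell
# Succeed if the file holds an initialized machine-id (a line of 32 lowercase hex chars).
check_machine_id() {
  id_file="${1:-/etc/machine-id}"
  [ -r "$id_file" ] || return 1
  grep -Eqx '[0-9a-f]{32}' "$id_file"
}

# Usage inside a freshly created VM:
#   systemctl status init_machine_id.service
#   check_machine_id && echo "machine-id initialized: $(cat /etc/machine-id)"
```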
Is there a more straightforward solution?
@StPanning, for Ubuntu, I'm using a simpler solution at https://github.com/rgl/ubuntu-vagrant/blob/2e72a6de546b2056df330f5f441c546aaced1ab2/provision.sh#L90-L97
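For comparison, a minimal sketch in that simpler direction, written from systemd's documented behavior rather than copied from the linked script: machine-id(5) recommends shipping images used on many machines with /etc/machine-id missing or empty, and for an empty file systemd provides a new ID at boot and commits it if /etc is writable. `reset_machine_id` is a hypothetical helper; its path arguments exist only so it can be tested outside a VM, and the dbus symlink handling assumes the Ubuntu layout where /var/lib/dbus/machine-id normally points at /etc/machine-id.

```shell
# Reset the machine-id before packaging so every VM created from the box gets its own.
# Arguments default to the real paths; run as root inside the VM before `vagrant package`.
reset_machine_id() {
  etc_id="${1:-/etc/machine-id}"
  dbus_id="${2:-/var/lib/dbus/machine-id}"
  # Truncate rather than delete: an empty file tells systemd to generate a new ID at boot.
  : > "$etc_id"
  # If dbus keeps its own copy instead of a symlink, replace it with a symlink
  # so it cannot carry the old ID into the clones.
  if [ -f "$dbus_id" ] && [ ! -L "$dbus_id" ]; then
    ln -sf "$etc_id" "$dbus_id"
  fi
}

# Usage inside the VM, as the last step before shutting it down for packaging:
#   reset_machine_id
```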
Hello, I have network problems with vagrant/libvirt.
After vagrant up I can vagrant ssh into my boxes. However, after a few minutes I get:
vagrant@vagrant:/vagrant/ansible$ client_loop: send disconnect: Broken pipe
After that the network becomes unresponsive for a while. Then vagrant ssh sort of works again, but now it asks for a password instead of using the usual key authentication. This happens periodically, but the boxes are still running and responsive when I check via VNC. The NATed interface has the correct IP address assigned. I'm using the libvirt provider; these are the messages I'm seeing there:
I'm not sure whether the problem is libvirt/KVM or Vagrant, since vagrant ssh also behaves strangely. Here is my Vagrantfile:
Here is the config XML of the ansible VM:
Any ideas?