Closed by cquike 1 month ago.
incus_info.txt: output of incus info attached.
I've had production clusters running Ubuntu 24.04 containers with networkd, some instances up for 2-3 months, and never saw an IP disappear, so there's more to this than just an issue with those images.
How many containers are running on those systems?
Around 20 containers run in a 4-node cluster, so on average 4-5 containers per host.
These hosts are also running (non-clustered) LXD daemons for some legacy containers; I hope that doesn't cause interference between them.
Can you show iptables -L -n -v on one such system? It's not impossible that something temporarily breaks the INPUT table.
On very busy systems, I've also seen containers that share a uid/gid map run into issues with netlink communication, which can lead to what you're seeing. So you may want to try security.idmap.isolated=true to give each of your containers its own uid/gid range.
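Something along these lines should do it (c1 is a placeholder instance name; as far as I know the new map only takes effect on the next start):

# incus config set c1 security.idmap.isolated=true
# incus restart c1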
iptables does not seem to show anything interesting:
# iptables -L -n -v
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Regarding the idmap: I do have the following configuration in all the containers:
config:
raw.idmap: |-
uid 1000-10000 1000-10000
gid 1000-10000 1000-10000
in order to map the container users to the host ones. Is the security.idmap.isolated=true configuration option compatible with that?
Yeah, that's fine; the isolated map will apply to the rest of the uid/gid allocation.
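A minimal sketch of what the combined configuration could look like, reusing the ranges from your config above and leaving everything else at its defaults:

config:
  security.idmap.isolated: "true"
  raw.idmap: |-
    uid 1000-10000 1000-10000
    gid 1000-10000 1000-10000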
Before changing the configuration, we upgraded incus across the cluster to the latest in Debian bookworm-backports, which is 6.0.1. And voilà! A week later, there are no more issues with network connectivity. So maybe the issue was fixed between 6.0.0 and 6.0.1. Feel free to close this ticket.
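For reference, the upgrade on each node was essentially the standard backports pull (exact package names may differ for your setup):

# apt update
# apt install -t bookworm-backports incus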
Required information
Issue description
We have several long-running containers, and we observe that almost once a day one or more machines lose their IPv4 address. A workaround is either to stop and start the affected containers or to restart the systemd-networkd service inside the container (concrete commands below). This problem seems to happen with Ubuntu 24.04 and Fedora 40, which both use systemd-networkd as far as I can tell. We also have Debian 11 and 12 machines running in the same incus cluster which seem to be unaffected. We cannot see anything in the logs that helps us troubleshoot the issue. The only thing I see in the container log is this:
The logs of the containers as shown by incus info --show-log are empty. The cluster has 4 members and the problem happens on any of the incus hosts. The network configuration on all hosts is a bridge (incusbr0) which all the containers use.
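For completeness, the two workarounds translate to roughly the following, with NAME standing in for an affected container:

# incus stop NAME && incus start NAME

or, without a full container restart:

# incus exec NAME -- systemctl restart systemd-networkd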
Steps to reproduce
incus ls
Information to attach
incus config show NAME --expanded
architecture: x86_64
config:
  image.description: Fedora 40 amd64 incus container image for Jenkins pipeline testing
  limits.cpu.allowance: 15%
  limits.cpu.priority: "1"
  limits.memory: 15%
  limits.memory.swap.priority: "1"
  raw.idmap: |-
    uid 1000-10000 1000-10000
    gid 202 202
    gid 128 128
    gid 5000-30000 5000-30000
  volatile.base_image: 62ee7f19175c17e5a305392ba11fec75f8a404df4b97f23347c1de80760c2f67
  volatile.cloud-init.instance-id: 59192da1-c9eb-4d12-aff6-59fffe493862
  volatile.eth0.host_name: vethecd5cba6
  volatile.eth0.hwaddr: 00:16:3e:df:82:1f
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":1000},{"Isuid":true,"Isgid":false,"Hostid":1000,"Nsid":1000,"Maprange":9001},{"Isuid":true,"Isgid":false,"Hostid":110001,"Nsid":10001,"Maprange":9990000},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":128},{"Isuid":false,"Isgid":true,"Hostid":128,"Nsid":128,"Maprange":1},{"Isuid":false,"Isgid":true,"Hostid":100129,"Nsid":129,"Maprange":73},{"Isuid":false,"Isgid":true,"Hostid":202,"Nsid":202,"Maprange":1},{"Isuid":false,"Isgid":true,"Hostid":100203,"Nsid":203,"Maprange":4797},{"Isuid":false,"Isgid":true,"Hostid":5000,"Nsid":5000,"Maprange":25001},{"Isuid":false,"Isgid":true,"Hostid":130001,"Nsid":30001,"Maprange":9970000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":1000},{"Isuid":true,"Isgid":false,"Hostid":1000,"Nsid":1000,"Maprange":9001},{"Isuid":true,"Isgid":false,"Hostid":110001,"Nsid":10001,"Maprange":9990000},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":128},{"Isuid":false,"Isgid":true,"Hostid":128,"Nsid":128,"Maprange":1},{"Isuid":false,"Isgid":true,"Hostid":100129,"Nsid":129,"Maprange":73},{"Isuid":false,"Isgid":true,"Hostid":202,"Nsid":202,"Maprange":1},{"Isuid":false,"Isgid":true,"Hostid":100203,"Nsid":203,"Maprange":4797},{"Isuid":false,"Isgid":true,"Hostid":5000,"Nsid":5000,"Maprange":25001},{"Isuid":false,"Isgid":true,"Hostid":130001,"Nsid":30001,"Maprange":9970000}]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: RUNNING
  volatile.last_state.ready: "false"
  volatile.uuid: 7abe1a5a-503f-4fa9-a291-a4c4cf6de1b0
  volatile.uuid.generation: 7abe1a5a-503f-4fa9-a291-a4c4cf6de1b0
devices:
  eth0:
    name: eth0
    network: incusbr0
    type: nic
  root:
    path: /
    pool: lvm_pool
    type: disk
ephemeral: false
profiles: