gunamata opened this issue 2 years ago
The problem here is that the underlying container engine (CNI) checks any newly created network route against the existing routes on the system. If a route rule with an IP address from a conflicting subnet already exists in iptables, it will yield this error. The conflicting routes could come either from the host network (bridge mode) or, in this case, from the Kube network. As a long-term fix, we could detect conflicting addresses and adjust the network pools available to the container engine accordingly. As a short-term solution, we could document how to manually change the network pool address.
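As a rough illustration of the kind of check involved (this is not the actual CNI code, just a sketch using Python's `ipaddress` module), the overlap test amounts to:

```python
import ipaddress

def conflicts(candidate: str, existing_routes: list[str]) -> bool:
    """Return True if the candidate subnet overlaps any existing route."""
    cand = ipaddress.ip_network(candidate)
    return any(cand.overlaps(ipaddress.ip_network(r)) for r in existing_routes)

# A stale nerdctl0 route left over from a previous VM makes the
# engine's default subnet collide with itself:
print(conflicts("10.4.0.0/24", ["10.4.0.0/24", "192.168.1.0/24"]))   # True
print(conflicts("10.17.0.0/24", ["10.4.0.0/24", "192.168.1.0/24"]))  # False
```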
I have not been able to repro this because #2934 is blocking me from getting to a working system.
Doing a Factory Reset allowed me to go past #2934, but I still cannot repro this.
I got an error once:
e:\home\jan>nerdctl run -d -p 85:80 --restart=always nginx
FATA[0002] OCI runtime start failed: cannot start a container that has stopped: unknown
But that was maybe while containerd was still starting up.
Afterwards I could run the command repeatedly without getting any error. I'm somewhat surprised, though, that nerdctl didn't tell me that the port was already in use:
e:\home\jan>nerdctl ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0bb0a5a0de03 docker.io/library/nginx:latest "/docker-entrypoint.…" 8 minutes ago Up 0.0.0.0:85->80/tcp nginx-0bb0a
0fb32c58e483 docker.io/library/nginx:latest "/docker-entrypoint.…" 13 seconds ago Up 0.0.0.0:85->80/tcp nginx-0fb32
3d66a24f31ce docker.io/library/nginx:latest "/docker-entrypoint.…" 10 seconds ago Up 0.0.0.0:85->80/tcp nginx-3d66a
5449641543de docker.io/library/nginx:latest "/docker-entrypoint.…" About a minute ago Up 0.0.0.0:85->80/tcp nginx-54496
bc5eb13c4ab8 docker.io/library/nginx:latest "/docker-entrypoint.…" 3 minutes ago Up 0.0.0.0:85->80/tcp nginx-bc5eb
d33650d61a92 docker.io/library/nginx:latest "/docker-entrypoint.…" 7 seconds ago Up 0.0.0.0:85->80/tcp nginx-d3365
FWIW, 10.4.0.0/24 is the network created by nerdctl itself:
16: nerdctl0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
link/ether b2:76:c0:57:f0:ea brd ff:ff:ff:ff:ff:ff
inet 10.4.0.1/24 brd 10.4.0.255 scope global nerdctl0
So any reported conflict would have to be between the network already set up by a previous container start and the one being set up for a new container at the same time.
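The "detect conflicting addresses and change the available network pools" idea from the first comment could look something like the following sketch. The pool layout and helper name are hypothetical, not Rancher Desktop code:

```python
import ipaddress

# Hypothetical pool of candidate /24 subnets carved out of 10.4.0.0/16.
POOL = list(ipaddress.ip_network("10.4.0.0/16").subnets(new_prefix=24))

def pick_free_subnet(existing_routes: list[str]) -> str:
    """Return the first /24 from the pool that overlaps no existing route."""
    routes = [ipaddress.ip_network(r) for r in existing_routes]
    for candidate in POOL:
        if not any(candidate.overlaps(r) for r in routes):
            return str(candidate)
    raise RuntimeError("no free subnet in pool")

# With a stale 10.4.0.0/24 still present, the allocator skips it:
print(pick_free_subnet(["10.4.0.0/24"]))  # 10.4.1.0/24
```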
I forgot the "Reset Kubernetes with Images"[^1] step. After I've done this, I get the error too:
e:\home\jan>nerdctl run -d -p 85:80 --restart=always nginx
docker.io/library/nginx:latest: resolved |++++++++++++++++++++++++++++++++++++++|
index-sha256:0b970013351304af46f322da1263516b188318682b2ab1091862497591189ff1: done |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:79c77eb7ca32f9a117ef91bc6ac486014e0d0e75f2f06683ba24dc298f9f4dd4: done |++++++++++++++++++++++++++++++++++++++|
config-sha256:2d389e545974d4a93ebdef09b650753a55f72d1ab4518d17a30c0e1b3e297444: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:600c24b8ba3900f029e02f62ad9d14a04880ffdf7b8c57bfc74d569477002d67: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:31b3f1ad4ce1f369084d0f959813c51df0ca17d9877d5ee88c2db6ff88341430: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:fd42b079d0f818ce0687ee4290715b1b4843a1d5e6ebe7e3144c55ed11a215ca: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:30585fbbebc6bc3f81cb80830fe83b04613cda93ea449bb3465a08bdec8e2e43: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:18f4ffdd25f46fa28f496efb7949b137549b35cb441fb671c1f7fa4e081fd925: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:9dc932c8fba266219fd16728c9e3f632296d043407e77d6af626c5119f021b42: done |++++++++++++++++++++++++++++++++++++++|
elapsed: 15.7s total: 30.0 M (1.9 MiB/s)
FATA[0017] subnet 10.4.0.0/24 overlaps with other one on this address space
I could repro this on 1.5.1 (the latest release at this time) too. Here are the steps (same steps as in the initial issue description, except that I captured some additional info about the Kubernetes versions I used):
nerdctl run -d -p 85:80 --restart=always nginx
nerdctl run -d -p 85:80 --restart=always nginx
Out of curiosity I tried this on macOS as well, and I couldn't repro it there.
I didn't really expect to anyway; an earlier discussion with @mook-as produced the theory that the problem arises because "deleting" the VM on WSL does not really restart WSL, and since networking is shared between distros, it is possible that the old network definitions are not cleaned up properly.
This has nothing to do with k8s; I can repro it with these simplified steps:
nerdctl run -d -p 85:80 --restart=always nginx
nerdctl run -d -p 85:80 --restart=always nginx
So it seems indeed like the `nerdctl0` network is lingering even though the `rancher-desktop` distro got deleted and recreated.
Since it is not a regression, I think this could be moved to the "Later" milestone.
[^1]: It doesn't really make sense to call this "Reset Kubernetes" while Kubernetes is disabled.
Doing a Factory Reset or restarting the machine resolved this issue for me on Windows 10 Enterprise. Just sharing in case it helps with the investigation of the problem.
> Doing a Factory Reset or restarting the machine resolved this issue for me on Windows 10 Enterprise.
I would think that anything that shuts down the WSL VM (and not just the individual distro) would fix it because I don't see how a network definition would survive the restart.
So I think `wsl --shutdown` would fix the problem, but it is rather heavy-handed, as it will stop all other distros as well. At the very least we would need an extra warning/confirmation from the user.
@gunamata to provide material to update the FAQ around this. We should test a WSL shutdown around this too.
> So I think `wsl --shutdown` would fix the problem, but it is rather heavy-handed, as it will stop all other distros as well. At the very least we would need an extra warning/confirmation from the user.
If we are taking the short-term approach, we should be able to change the default network pool available to containerd (in `/etc/cni/net.d`, creating it if it does not exist) instead of shutting down WSL. This would be very similar to docker's `default-address-pools`.
e.g.

```json
"default-address-pools": [
  {
    "base": "10.17.0.1/16",
    "size": 16
  }
]
```
This example might be useful: https://github.com/containerd/containerd/blob/main/script/setup/install-cni
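For reference, the CNI-side equivalent would be a bridge conflist in `/etc/cni/net.d` with a different `host-local` IPAM range. This is an untested sketch following the standard CNI conflist layout; the exact filename, `cniVersion`, and plugin list that nerdctl expects are assumptions here:

```json
{
  "cniVersion": "0.4.0",
  "name": "bridge",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "nerdctl0",
      "isGateway": true,
      "ipMasq": true,
      "ipam": {
        "type": "host-local",
        "ranges": [[{ "subnet": "10.17.0.0/24" }]]
      }
    },
    { "type": "portmap", "capabilities": { "portMappings": true } }
  ]
}
```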
@gunamata this (custom networks) should be sufficient for documentation purposes. Although I have not tested it with our version of `nerdctl`.
> Although I have not tested it with our version of `nerdctl`.
Please test before adding to docs! We should be sure it actually works. 😺
Based on a comment in #3365, it looks like nerdctl introduced this in https://github.com/containerd/nerdctl/pull/1245.
Actual Behavior
Ran into the error below after running a container:
FATA[0005] subnet 10.4.0.0/24 overlaps with other one on this address space
I observed this behavior with the CI build: https://github.com/rancher-sandbox/rancher-desktop/actions/runs/3071306322
Steps to Reproduce
nerdctl run -d -p 85:80 --restart=always nginx
nerdctl run -d -p 85:80 --restart=always nginx
Result
Ran into the error below after running a container:
FATA[0005] subnet 10.4.0.0/24 overlaps with other one on this address space
Expected Behavior
The container should run without errors.
Additional Information
No response
Rancher Desktop Version
https://github.com/rancher-sandbox/rancher-desktop/actions/runs/3071306322
Rancher Desktop K8s Version
1.21.4
Which container engine are you using?
containerd (nerdctl)
What operating system are you using?
Windows
Operating System / Build Version
Windows 10 Enterprise
What CPU architecture are you using?
x64
Linux only: what package format did you use to install Rancher Desktop?
No response
Windows User Only
No response