moby / libnetwork

networking for containers
Apache License 2.0

Error initializing network controller: list bridge addresses failed: no available network #2638

Closed jhaprins closed 5 months ago

jhaprins commented 3 years ago

When one of my colleagues asked me whether the network had changed, because his Docker setup was suddenly giving him a lot of problems, at first I did not know what he was talking about. After some questions it slowly became clear that his Docker environment failed to start whenever his VPN connection to the office was up. I looked at the error message he received: "Error initializing network controller: list bridge addresses failed: no available network". This was very strange, because the network he had configured in his daemon.json looked like this: { "default-address-pools": [ {"base":"10.10.0.0/16","size":24} ] }
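For context, a `default-address-pools` entry with base 10.10.0.0/16 and size 24 lets Docker carve /24 networks out of that /16. The Go sketch below only enumerates those candidate subnets; `subnetsOf` is a hypothetical helper written for illustration, not Docker's allocator, and the order it produces is an assumption.

```go
package main

import (
	"fmt"
	"net"
)

// subnetsOf splits an IPv4 base pool into consecutive subnets of the given
// prefix length. Hypothetical helper: it models what a pool entry such as
// {"base":"10.10.0.0/16","size":24} describes, not Docker's real allocator.
func subnetsOf(base string, size int) ([]*net.IPNet, error) {
	_, pool, err := net.ParseCIDR(base)
	if err != nil {
		return nil, err
	}
	ones, bits := pool.Mask.Size()
	if bits != 32 || size < ones || size > 32 {
		return nil, fmt.Errorf("unsupported pool %s with size %d", base, size)
	}
	v4 := pool.IP.To4()
	start := uint32(v4[0])<<24 | uint32(v4[1])<<16 | uint32(v4[2])<<8 | uint32(v4[3])
	// Number of addresses in each subnet.
	step := uint32(1) << uint(32-size)
	// Number of subnets that fit in the pool.
	count := 1 << uint(size-ones)
	out := make([]*net.IPNet, 0, count)
	for i := 0; i < count; i++ {
		v := start + uint32(i)*step
		ip := net.IPv4(byte(v>>24), byte(v>>16), byte(v>>8), byte(v)).To4()
		out = append(out, &net.IPNet{IP: ip, Mask: net.CIDRMask(size, 32)})
	}
	return out, nil
}

func main() {
	subnets, err := subnetsOf("10.10.0.0/16", 24)
	if err != nil {
		panic(err)
	}
	// Prints: 256 10.10.0.0/24 10.10.255.0/24
	fmt.Println(len(subnets), subnets[0], subnets[len(subnets)-1])
}
```

The first candidate, 10.10.0.0/24, is the network that shows up on docker0 in the routing table later in this report.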

In our corporate network we have a lot of RFC1918 networks: a few in the 10.0.0.0/8 range and a lot in the 172.16.0.0/12 and 192.168.0.0/16 ranges. But nothing collides with the range above, and even if something did collide, everything was local to his workstation, where he was developing and testing some monitoring systems. He is completely free to use whatever network he wants locally, as long as he doesn't interfere with the corporate network. On the VPN router I have configured a default set of routes for the RFC1918 networks, pointing towards the corporate routers, so everyone can reach the internal corporate networks without having to worry about anything. The firewalls take care of the rest.

I started debugging the error message, did some Google searches, and found a lot of people complaining about exactly the same problem. Some example tickets:

- https://github.com/docker/for-linux/issues/123
- https://github.com/moby/moby/issues/35121
- https://github.com/moby/moby/issues/33925

Most of these tickets are against other projects, and none of them offers a solution.

At first the error didn't make any sense to me because:

But then I thought of something. What if the Docker code, when searching for free networks, takes the local routing table and checks the configured network against EVERY route in it? If the network matches or overlaps any route in the routing table, it gives this error. At first I thought this couldn't be true, because then the check would always fail: a default route of 0.0.0.0/0 matches everything. But what if the default route is filtered out in the code for exactly this reason? Then the hypothesis could be true.
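The hypothesis can be written down as a minimal Go sketch. This is not the real CheckRouteOverlaps code from libnetwork, only a model of the suspected behaviour: `route`, `overlaps`, `isDefault` and `checkRouteOverlaps` are hypothetical names, and the default route is modelled as a route without a destination.

```go
package main

import (
	"fmt"
	"net"
)

// route is a hypothetical, simplified view of one routing-table entry.
// A nil Dst stands for the default route.
type route struct {
	Dst *net.IPNet
}

func mustCIDR(s string) *net.IPNet {
	_, n, err := net.ParseCIDR(s)
	if err != nil {
		panic(err)
	}
	return n
}

// overlaps reports whether two CIDR blocks share any address: they do
// exactly when one of them contains the other's network address.
func overlaps(a, b *net.IPNet) bool {
	return a.Contains(b.IP) || b.Contains(a.IP)
}

// isDefault reports whether dst is the default route (missing or /0).
func isDefault(dst *net.IPNet) bool {
	if dst == nil {
		return true
	}
	ones, _ := dst.Mask.Size()
	return ones == 0
}

// checkRouteOverlaps models the hypothesis: every route except the
// default route is compared against the candidate network.
func checkRouteOverlaps(candidate *net.IPNet, routes []route) error {
	for _, r := range routes {
		if isDefault(r.Dst) {
			// The default route would match everything, so it is skipped.
			continue
		}
		if overlaps(candidate, r.Dst) {
			return fmt.Errorf("%s overlaps route %s", candidate, r.Dst)
		}
	}
	return nil
}

func main() {
	routes := []route{
		{Dst: nil}, // default via 192.168.178.1
		{Dst: mustCIDR("192.168.178.0/24")},
	}
	// Prints: <nil>
	fmt.Println(checkRouteOverlaps(mustCIDR("10.10.0.0/24"), routes))
}
```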

I started testing on my own system. First, I reproduced the error:

- Set up my Docker daemon with the same configuration.
- Kept my normal local routing table, without VPN.
- Started Docker, and this worked fine.

The resulting routing table:

    default via 192.168.178.1 dev enp62s0u1u1 proto static metric 1024
    10.10.0.0/24 dev docker0 proto kernel scope link src 10.10.0.1 linkdown
    192.168.178.0/24 dev enp62s0u1u1 proto kernel scope link src 192.168.178.74 metric 100

Then I started my VPN. This added 3 extra routes:

    10.0.0.0/8 via 192.168.2.1 dev tap0 proto static metric 50
    172.16.0.0/12 via 192.168.2.1 dev tap0 proto static metric 50
    192.168.0.0/16 via 192.168.2.1 dev tap0 proto static metric 50

I then stopped my Docker daemon and tried to start it again, and indeed I received the same error. So I could reproduce the problem. Now for my hypothesis: "Does the code check EVERY route in the routing table, filtering out only the default route?"

To test this, I removed the default route and replaced it with 2 more specific routes that together cover the whole internet:

    0.0.0.0/1 via 192.168.178.1 dev enp62s0u1u1
    128.0.0.0/1 via 192.168.178.1 dev enp62s0u1u1

My routing table then looked like this:

    0.0.0.0/1 via 192.168.178.1 dev enp62s0u1u1
    128.0.0.0/1 via 192.168.178.1 dev enp62s0u1u1
    192.168.178.0/24 dev enp62s0u1u1 proto kernel scope link src 192.168.178.74 metric 100
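The split default route defeats a filter that only skips the default route: 0.0.0.0/1 is not a /0 route, yet it covers 0.0.0.0 to 127.255.255.255 and therefore contains 10.10.0.0/24. A small sketch using a hypothetical `rejectedBy` helper that skips only an exact /0 route:

```go
package main

import (
	"fmt"
	"net"
)

func mustCIDR(s string) *net.IPNet {
	_, n, err := net.ParseCIDR(s)
	if err != nil {
		panic(err)
	}
	return n
}

// overlaps reports whether two CIDR blocks share any address.
func overlaps(a, b *net.IPNet) bool {
	return a.Contains(b.IP) || b.Contains(a.IP)
}

// rejectedBy models the hypothesized check for a single route: an exact
// 0.0.0.0/0 default route is skipped, every other route is compared.
func rejectedBy(candidate, dst *net.IPNet) bool {
	if ones, _ := dst.Mask.Size(); ones == 0 {
		return false
	}
	return overlaps(candidate, dst)
}

func main() {
	candidate := mustCIDR("10.10.0.0/24")
	// Prints true for 0.0.0.0/1 and false for the other two routes.
	for _, s := range []string{"0.0.0.0/1", "128.0.0.0/1", "192.168.178.0/24"} {
		fmt.Printf("%s rejects %s: %v\n", s, candidate, rejectedBy(candidate, mustCIDR(s)))
	}
}
```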

The only difference between this state and the clean state of my system is that there is no default route, but instead two routes that together are equivalent to the default route. Now I tried to start the Docker daemon again. If the daemon started fine, my hypothesis was wrong and I would have to continue my search. If the daemon failed, my hypothesis must be correct, because the default route is the only difference in my local configuration.

And indeed, I received the same error again. Now I'm sure there is absolutely no reason to give this error because:

This also proves my hypothesis that every route in the routing table is being checked against the configured network, filtering out the default route. If any route matches the configured network, the configuration is rejected.

This is a bug in the Docker code. The code should be changed to only match routes with "scope link", because these routes are directly connected and are the ones that would actually cause a problem when you start a Docker daemon with an overlapping network configuration. Any route that is not "scope link" should be ignored, because those routes could be:
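A minimal sketch of the proposed rule, assuming a hypothetical `route` type whose `Scope` field holds "link" for directly connected routes (as `ip route` prints them) and is empty for routes via a gateway. It illustrates the proposal only and is not the change that was eventually merged.

```go
package main

import (
	"fmt"
	"net"
)

// route is a hypothetical model of one `ip route` entry. Scope is "link"
// for directly connected routes and empty for routes via a gateway.
type route struct {
	Dst   *net.IPNet
	Scope string
}

func mustCIDR(s string) *net.IPNet {
	_, n, err := net.ParseCIDR(s)
	if err != nil {
		panic(err)
	}
	return n
}

// overlaps reports whether two CIDR blocks share any address.
func overlaps(a, b *net.IPNet) bool {
	return a.Contains(b.IP) || b.Contains(a.IP)
}

// checkLinkRouteOverlaps implements the proposal: only on-link routes can
// conflict with a new bridge network; gateway routes are ignored.
func checkLinkRouteOverlaps(candidate *net.IPNet, routes []route) error {
	for _, r := range routes {
		if r.Dst == nil || r.Scope != "link" {
			continue
		}
		if overlaps(candidate, r.Dst) {
			return fmt.Errorf("%s overlaps on-link route %s", candidate, r.Dst)
		}
	}
	return nil
}

func main() {
	// Routes from this report: the split default route, the VPN routes
	// and the directly connected LAN.
	routes := []route{
		{Dst: mustCIDR("0.0.0.0/1")},
		{Dst: mustCIDR("128.0.0.0/1")},
		{Dst: mustCIDR("10.0.0.0/8")},
		{Dst: mustCIDR("172.16.0.0/12")},
		{Dst: mustCIDR("192.168.0.0/16")},
		{Dst: mustCIDR("192.168.178.0/24"), Scope: "link"},
	}
	fmt.Println(checkLinkRouteOverlaps(mustCIDR("10.10.0.0/24"), routes))
	fmt.Println(checkLinkRouteOverlaps(mustCIDR("192.168.178.0/25"), routes))
}
```

With these routes, 10.10.0.0/24 is accepted, while a candidate inside the directly connected 192.168.178.0/24 LAN is still rejected.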

There is one corner case where you could give a warning, or maybe an error: when there is an equal or more specific route that is not "scope link", because this could result in routing issues to other systems. But even then, I would make it configurable, because this could very well be intentional, and the user should be trusted to evaluate whether this route overlap is a problem for them.
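The corner case could be layered on top of the on-link rule. In this sketch the `strict` flag stands in for the suggested configuration option; the flag, the `checkCandidate` function and the 10.10.5.0/24 gateway route are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// route is a hypothetical model of one `ip route` entry.
type route struct {
	Dst   *net.IPNet
	Scope string
}

func mustCIDR(s string) *net.IPNet {
	_, n, err := net.ParseCIDR(s)
	if err != nil {
		panic(err)
	}
	return n
}

// overlaps reports whether two CIDR blocks share any address.
func overlaps(a, b *net.IPNet) bool {
	return a.Contains(b.IP) || b.Contains(a.IP)
}

// checkCandidate rejects overlaps with on-link routes. A gateway route that
// is equal to or more specific than the candidate produces a warning, or an
// error when strict is set. Less specific gateway routes are ignored.
func checkCandidate(candidate *net.IPNet, routes []route, strict bool) ([]string, error) {
	candOnes, _ := candidate.Mask.Size()
	var warnings []string
	for _, r := range routes {
		if r.Dst == nil {
			continue
		}
		if r.Scope == "link" {
			if overlaps(candidate, r.Dst) {
				return warnings, fmt.Errorf("%s overlaps on-link route %s", candidate, r.Dst)
			}
			continue
		}
		// Equal or more specific: the route lies entirely inside the candidate.
		ones, _ := r.Dst.Mask.Size()
		if ones >= candOnes && candidate.Contains(r.Dst.IP) {
			msg := fmt.Sprintf("%s conflicts with gateway route %s", candidate, r.Dst)
			if strict {
				return warnings, errors.New(msg)
			}
			warnings = append(warnings, msg)
		}
	}
	return warnings, nil
}

func main() {
	routes := []route{
		// Less specific VPN route from this report: ignored.
		{Dst: mustCIDR("10.0.0.0/8")},
		// Hypothetical gateway route with the same prefix as the candidate.
		{Dst: mustCIDR("10.10.5.0/24")},
	}
	cand := mustCIDR("10.10.5.0/24")
	warnings, err := checkCandidate(cand, routes, false)
	fmt.Println(warnings, err)
	_, err = checkCandidate(cand, routes, true)
	fmt.Println(err)
}
```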

I'm not a developer but a network and systems engineer, so I am not able to provide a patch for this problem at the moment. However, one of my colleagues thinks he has already found the problematic code, in the CheckRouteOverlaps function in https://github.com/moby/libnetwork/blob/master/netutils/utils_linux.go, and he might have a fix for this issue in the near future.

The problem might very well be the same on FreeBSD and/or Windows, because I also saw tickets where people had the same problem, at least on Apple notebooks.

The version I have tested this with is: Docker version 19.03.13, build 4484c46d9d

Cheers, Jan Hugo Prins

akerouanton commented 5 months ago

This route overlap check was changed by https://github.com/moby/moby/pull/42598 (released in v23.0) to only consider on-link routes. There's some agreement among maintainers that this heuristic isn't perfect, and we might revisit it, or give users the ability to influence it, at a later time.

I'm going to close this ticket as 'fixed'. Thanks for reporting here and in moby/moby.