Changing client network_interface prevents further scheduling.

Nomon commented 7 years ago

Nomad version

0.6.0-dev (latest master), also happens on 0.5.x

Operating system and Environment details

ubuntu xenial & trusty

Issue

Changing the client network_interface on a node prevents any further allocations unless the older ones are drained and pruned, or the client is started with an empty state. I ran into this while debugging issues with Vagrant + VirtualBox + Docker: Nomad restarted existing allocs on the old network and refused any new allocs that used the new eth1 as network_interface.

EvalContext adds the existing allocs to the proposed allocs, which makes netIdx.Overcommitted() return true even though every new proposed alloc in the plan has a network resource on the new network_interface.

For example, with one old 1 Mbit alloc running on eth0, change Nomad's network_interface to eth1 and restart: new allocs fail with "bandwidth exceeded". The used map is {"eth0": 1} and the available map is {"eth1": 1000}. Overcommitted iterates over the used map, and the unchecked read of available["eth0"] returns Go's int zero value, so eth0 is treated as a network device with 0 Mbits available.
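The failure in this example can be reproduced with a small Go model. This is a minimal sketch, not Nomad's code: `netIndex` and `overcommitted` are hypothetical names that only mirror the loop described above (iterate over the used map, read the available map without checking for the key).

```go
package main

import "fmt"

// netIndex is a hypothetical, simplified stand-in for Nomad's network index:
// bandwidth in Mbits keyed by device name.
type netIndex struct {
	avail map[string]int // bandwidth offered by the node's configured interface
	used  map[string]int // bandwidth claimed by existing + proposed allocs
}

// overcommitted mirrors the behaviour described in the issue: it iterates over
// the used map and reads avail without checking whether the device exists.
func (idx netIndex) overcommitted() bool {
	for dev, used := range idx.used {
		// A missing device yields the int zero value, i.e. 0 Mbits available.
		if used > idx.avail[dev] {
			return true
		}
	}
	return false
}

func main() {
	idx := netIndex{
		avail: map[string]int{"eth1": 1000}, // interface after the config change
		used:  map[string]int{"eth0": 1},    // old 1 Mbit alloc still on eth0
	}
	fmt.Println(idx.overcommitted()) // true: avail["eth0"] reads as 0
}
```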

I think Overcommitted should take a device parameter (or a separate function should be added), and AllocsFit should use the networks of the new allocs under consideration to limit the Overcommitted check to those devices. I was going to open a PR but decided to file this issue instead, as there are some alternatives that might suit better depending on where networking is going in the future and what is considered correct here:
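A device-scoped check along those lines could look like the following sketch. `netIndex`, `overcommittedOn` and `planDevices` are hypothetical names, and new allocs are modelled simply as lists of device names rather than Nomad's real resource structs.

```go
package main

import "fmt"

// netIndex is a hypothetical, simplified network index; bandwidth in Mbits.
type netIndex struct {
	avail map[string]int
	used  map[string]int
}

// overcommittedOn checks only the listed devices, e.g. the devices named in
// the network resources of the new allocs in the current plan.
func (idx netIndex) overcommittedOn(devices []string) bool {
	for _, dev := range devices {
		if idx.used[dev] > idx.avail[dev] {
			return true
		}
	}
	return false
}

// planDevices collects the distinct devices requested by the new allocs,
// each represented here as a list of device names.
func planDevices(newAllocDevices [][]string) []string {
	seen := map[string]bool{}
	var out []string
	for _, devs := range newAllocDevices {
		for _, d := range devs {
			if !seen[d] {
				seen[d] = true
				out = append(out, d)
			}
		}
	}
	return out
}

func main() {
	idx := netIndex{
		avail: map[string]int{"eth1": 1000},
		// old eth0 alloc plus a new 10 Mbit alloc on eth1
		used: map[string]int{"eth0": 1, "eth1": 10},
	}
	devs := planDevices([][]string{{"eth1"}})
	fmt.Println(idx.overcommittedOn(devs)) // false: stale eth0 usage is not checked
}
```

A device that the plan requests but the node does not offer still reads as 0 Mbits here, so placements on a stale interface keep failing; only usage on devices the plan does not touch is ignored.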

- Evict allocs to un-overcommit the overcommitted networks.
- Change Overcommitted to iterate over the available map instead of the used-bandwidth map. This also changes the reserved logic: a reservation on a non-existing network device would no longer make the node's network overcommitted, which is acceptable because currently any new alloc targets the single configured network_interface.
- Make Overcommitted check whether the available-bandwidth map actually has the key and skip non-existing devices, instead of using the Go int zero value of 0 from the unchecked map read.
- Iterate over available devices instead of used ones when checking for overcommitment, and make reserving from a non-existing network resource an error. This reduces the additional error surface created by invalid or future reservations being silently ignored when iterating over available rather than used devices.

Reproduction steps

Start a Nomad agent with client and server enabled, schedule a Docker container that uses network resources, change the client's network_interface config to a new interface, restart Nomad, and schedule a new job that needs network resources.

ynohat commented 5 years ago

This bit me hard today. My exact issue was that I had switched the client allocation interface from eth0 to eth1. Thank you very much @Nomon for providing a workaround, I would never have figured this out.

tgross commented 4 years ago

@nickethier I found this issue while searching for something unrelated, and it looks like the only remaining consumer of the Overcommitted() check discussed here is in AllocsFit. Given the deprecation of network mbits, is this now safe to remove entirely?
