ebarriosjr opened this issue 1 month ago
Hi @ebarriosjr! Nomad doesn't use the Docker CLI. From the package version number you've got there, I'm assuming you're using a downstream distribution and not Docker's own package? If I look at https://github.com/docker/cli/compare/v27.0.2...v27.0.3, I see that they vendored the main moby/moby project at v27.0.3. And if I then look at the release notes for v27.0.3, I see some interesting suspects. So my guess is that dockerd itself was also upgraded by your package update? Before we go digging further, can you confirm that by providing the output of docker version?
For what it's worth, I've upgraded my local environment to 27.0.3 and tested out a Nomad job with networking and wasn't able to reproduce any problems. Maybe there's something specific to your client configuration or job that you could share?
The other weird item here is the error you reported, Constraint "missing network": 1 nodes excluded by filter. That suggests something is wrong with the host fingerprinting of the network, and that doesn't involve Docker at all.
Yesterday, after building a new Nomad client, I found that the Connect Envoy sidecar ports are not being published correctly. Nothing has changed in the setup except that newer packages were installed.
From what I can see, the other clients were running docker-ce 26.X and the new one is running 27.X. The other clients have since had package updates (mostly the kernel, plus Docker to 27.X), and they have also started failing in the same way.
Happy to supply any info. From what I can see, iptables has the entries for the allocations/ports, but connections are refused.
The client was running Nomad 1.7.7; I have upgraded it to 1.8.1 but am still seeing the same issue.
I'm going to try downgrading Docker to see if it helps, and will report back.
Matt
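When iptables shows the port-forwarding rules but clients still get "connection refused", it can help to separate two failure modes. The sketch below is a hypothetical debugging helper (probe_tcp is not part of Nomad or Consul) that classifies a single TCP connection attempt. A refusal usually means the packet reached the host but nothing was listening on the target port, while a timeout is more typical of traffic being silently dropped along the way.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify one TCP connection attempt (hypothetical debugging helper).

    Returns "open" if the handshake completes, "refused" if the peer answered
    with a reset (typically nothing listening on that port), "timeout" if no
    answer arrived in time (typically a drop somewhere on the path), or an
    "error: ..." string for anything else (e.g. no route to host).
    """
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except (socket.timeout, TimeoutError):
        # socket.timeout is an alias of TimeoutError on Python 3.10+.
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

For example, probe_tcp("10.0.0.5", 24123) against a hypothetical host IP and the dynamic port Nomad assigned to a sidecar: "refused" would point at the listener (Envoy not bound to the port) rather than at the forwarding rules.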
Any chance you upgraded the host distro at the same time? There's an open issue around the bridge module having been built into the kernel rather than shipped as a loadable module (DKM), https://github.com/hashicorp/nomad/issues/23583, and that's hitting a known issue in our network fingerprinting. (This previously only impacted niche OS distros.)
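To tell which situation a host is in, the sketch below checks whether the bridge module is currently loaded (listed in /proc/modules) or compiled into the running kernel (listed in modules.builtin). It is a hypothetical diagnostic helper, not Nomad's own fingerprinting code, and it assumes the common /lib/modules/<kernel release>/modules.builtin layout; some distros may keep that list elsewhere.

```python
import os
import platform
from typing import Optional


def bridge_module_status(proc_modules: str = "/proc/modules",
                         builtin_list: Optional[str] = None) -> str:
    """Return "loaded", "builtin", or "not-found" for the Linux bridge module."""
    if builtin_list is None:
        # Assumed location of the built-in module list for the running kernel.
        builtin_list = f"/lib/modules/{platform.release()}/modules.builtin"

    # /proc/modules lists loadable modules that are currently loaded; the
    # first field of each line is the module name.
    try:
        with open(proc_modules) as f:
            if any(line.split()[:1] == ["bridge"] for line in f):
                return "loaded"
    except FileNotFoundError:
        pass

    # modules.builtin lists modules compiled into the kernel, one path per
    # line, such as "kernel/net/bridge/bridge.ko".
    try:
        with open(builtin_list) as f:
            if any(os.path.basename(line.strip()) == "bridge.ko" for line in f):
                return "builtin"
    except FileNotFoundError:
        pass

    # Not loaded and not built in; it may still be loadable with modprobe.
    return "not-found"
```

A "builtin" result on a host that previously reported "loaded" would be consistent with the scenario described in issue 23583.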
Hi @tgross, the output of my docker version command is:
```
Client: Docker Engine - Community
 Version:           27.0.2
 API version:       1.46
 Go version:        go1.21.11
 Git commit:        912c1dd
 Built:             Wed Jun 26 18:48:01 2024
 OS/Arch:           linux/arm64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          27.0.3
  API version:      1.46 (minimum version 1.24)
  Go version:       go1.21.11
  Git commit:       662f78c
  Built:            Sat Jun 29 00:02:44 2024
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.7.18
  GitCommit:        ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
 runc:
  Version:          1.7.18
  GitCommit:        v1.1.13-0-g58aa920
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
```
Weird that your client and server don't match. But the server looks identical to what I've posted above. Any thoughts about the networking discussion above?
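The CLI and the engine ship as separate packages (docker-ce-cli and docker-ce), so they can drift apart, as in the output above (CLI 27.0.2, engine 27.0.3). A minimal sketch for spotting this across clients, assuming the docker CLI is installed and the daemon is reachable; it parses the JSON printed by `docker version --format '{{json .}}'`, and docker_versions is a hypothetical helper name. Since Nomad talks to dockerd rather than the CLI, the server version is the one relevant to Nomad.

```python
import json
import subprocess
from typing import Optional, Tuple


def docker_versions(raw_json: Optional[str] = None) -> Tuple[str, Optional[str], bool]:
    """Return (client_version, server_version, versions_match)."""
    if raw_json is None:
        # Assumes the docker CLI is on PATH; check=True raises
        # CalledProcessError if the command fails.
        raw_json = subprocess.run(
            ["docker", "version", "--format", "{{json .}}"],
            check=True, capture_output=True, text=True,
        ).stdout
    info = json.loads(raw_json)
    client = info["Client"]["Version"]
    # "Server" may be missing or null if the daemon could not be queried.
    server = (info.get("Server") or {}).get("Version")
    return client, server, client == server
```

Calling docker_versions() with no argument runs the command on the local host; passing a saved JSON string makes it easy to compare output collected from several clients.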
That's because I reverted docker-ce-cli to 27.0.2. On 27.0.3, all the jobs I have running on Nomad stop working with the "missing network" error.
Any chance you upgraded the host distro at the same time? There's an open issue around the bridge module having been built into the kernel rather than shipped as a loadable module (DKM), #23583, and that's hitting a known issue in our network fingerprinting. (This previously only impacted niche OS distros.)
Assuming this was aimed at me: I'm running Debian bookworm, which definitely hasn't changed. As I said, it could be something completely unrelated, but a port-forwarding issue would presumably be on the Nomad client side (as opposed to the Nomad servers, Consul, etc.). All the clients started failing after they were rebooted, and the only things that had changed were package updates (plus one reinstall, which pulled in the latest Docker version).
I'm still trying the downgrade to see whether it helps :)
Matt
Edit: No, the downgrade didn't help, so this is probably completely unrelated. Apologies; I'll continue my investigation.
Edit edit: Yes, please completely ignore me. My problem was actually the Connect PKI root CA expiring. It happened during a power-down, so the effect was quite different: Envoy would start "happily" without any errors or warnings, but just didn't listen on any of the service ports!
OK, thanks @MatthewJohn. So, @ebarriosjr, that leaves the networking issue, as I mentioned earlier:
The other weird item here is the error you reported, Constraint "missing network": 1 nodes excluded by filter. That suggests something is wrong with the host fingerprinting of the network, and that doesn't involve Docker at all.
https://github.com/hashicorp/nomad/issues/23583 suggests that something may have changed in the environment so that the bridge kernel module is unavailable, but even then I'd still expect to see a network. For us to make further progress on this, we'll need information from you on the network fingerprint (and/or client logs from the network fingerprinting), whether the distro has been updated, whether the kernel module is present, etc.
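One way to collect the network fingerprint is to read the node from the Nomad HTTP API (GET /v1/node/:node_id); nomad node status -verbose <node_id> shows similar information from the CLI. The sketch below uses hypothetical helpers to list the networks a client has fingerprinted, which is the information a "missing network" placement failure would hinge on. It assumes the agent listens on the default http://127.0.0.1:4646 and that the response lists networks under NodeResources.Networks with Mode, Device, and CIDR fields; check this against your Nomad version's response.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional, Tuple


def fetch_node(node_id: str, addr: str = "http://127.0.0.1:4646",
               token: Optional[str] = None) -> Dict[str, Any]:
    """GET /v1/node/<node_id> from a Nomad agent and decode the JSON body."""
    req = urllib.request.Request(f"{addr}/v1/node/{node_id}")
    if token:
        # Only needed when ACLs are enabled.
        req.add_header("X-Nomad-Token", token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def fingerprinted_networks(node: Dict[str, Any]) -> List[Tuple[Any, Any, Any]]:
    """List (mode, device, cidr) for each network in the node's fingerprint."""
    # Assumes the NodeResources.Networks layout of the node API response.
    networks = (node.get("NodeResources") or {}).get("Networks") or []
    return [(n.get("Mode"), n.get("Device"), n.get("CIDR")) for n in networks]


def has_network_mode(node: Dict[str, Any], mode: str) -> bool:
    """True if any fingerprinted network has the given mode, e.g. "bridge"."""
    return any(m == mode for m, _, _ in fingerprinted_networks(node))
```

For instance, has_network_mode(fetch_node("<node_id>"), "bridge") returning False on an affected client, with the node ID filled in, would be the kind of evidence that helps narrow this down.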
Nomad version
Nomad v1.8.1 BuildDate 2024-06-19T06:43:57Z Revision 5022543e4b7b8dcec9df123f86630ae3fdcffbe6
Operating system and Environment details
```
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.4 LTS
Release:        22.04
Codename:       jammy
```
Issue
After upgrading docker-ce-cli from 5:27.0.2 to 5:27.0.3, Nomad breaks and no containers are deployed. Some jobs fail with:
Constraint "missing network": 1 nodes excluded by filter
Others try to use IPv6 instead of IPv4.
Reproduction steps
Update docker-ce-cli to version 5:27.0.3 and reboot.
Expected Result
Nomad should be able to start Docker containers without issue.
Actual Result
No container could be started.