Closed radriaanse closed 8 months ago
Hi @radriaanse and thanks for raising this issue. I have reproduced this locally and spent a fair amount of time investigating, but have so far been unable to find a solution or the exact cause. It does, however, initially seem to be a problem with the CNI plugins repository rather than Nomad. Issue #431 seems to describe a roughly similar problem; however, there have been no responses from containernetworking members.
Testing outside of Nomad to check whether we are manipulating state we shouldn't be, I used cnitool to allocate a network to a network namespace. I first created a network namespace using ip netns add testnamespace. I then wrote the following example CNI config to disk, which was used for all the cnitool commands:
{
"cniVersion": "0.4.0",
"name": "testnet",
"dns": {
"nameservers": [
"1.1.1.1"
]
},
"type": "bridge",
"bridge": "testnet",
"ipMasq": true,
"isDefaultGateway": true,
"forceAddress": true,
"ipam": {
"type": "host-local",
"ranges": [
[{
"subnet": "172.16.3.0/24",
"rangeStart": "172.16.3.10",
"rangeEnd": "172.16.3.250",
"gateway": "172.16.3.1"
}]
]
}
}
The cnitool add command CNI_PATH=/opt/cni/bin/ cnitool add testnet /var/run/netns/testnamespace completed successfully with the following detailed output, which includes the DNS configuration. Nomad logs this object without any manipulation, which explains why the log line suggests success.
{
"cniVersion": "0.4.0",
"interfaces": [
{
"name": "testnet",
"mac": "de:b4:cf:9a:0d:e7"
},
{
"name": "vethc75b9344",
"mac": "1a:ee:4f:81:db:60"
},
{
"name": "eth0",
"mac": "86:d4:d1:aa:f5:8d",
"sandbox": "/var/run/netns/testing"
}
],
"ips": [
{
"version": "4",
"interface": 2,
"address": "172.16.3.22/24",
"gateway": "172.16.3.1"
}
],
"routes": [
{
"dst": "0.0.0.0/0",
"gw": "172.16.3.1"
}
],
"dns": {
"nameservers": [
"1.1.1.1"
]
}
}
When I exec into the network namespace to check the resolv.conf, the file is not as we supplied and is a copy of the host machine's /etc/resolv.conf:
$ ip netns exec testnamespace cat /etc/resolv.conf
# This file is managed by man:systemd-resolved(8). Do not edit.
#
# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "systemd-resolve --status" to see details about the uplink DNS servers
# currently in use.
#
# Third party programs must not access this file directly, but only through the
# symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a different way,
# replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.
nameserver 127.0.0.53
options edns0
search Home
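The mismatch between the CNI result and the namespace's resolv.conf can also be checked mechanically. The following is a minimal sketch, not part of Nomad or cnitool: the helper names are hypothetical, and the inputs are trimmed copies of the cnitool output and the resolv.conf shown above.

```python
import json


def nameservers_from_cni_result(result_json):
    """Return the nameservers listed in the "dns" block of a CNI result."""
    result = json.loads(result_json)
    return list(result.get("dns", {}).get("nameservers", []))


def nameservers_from_resolv_conf(text):
    """Return the addresses on "nameserver" lines of a resolv.conf file."""
    servers = []
    for line in text.splitlines():
        fields = line.split()
        # Comment lines start with "#", so they never match "nameserver".
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


# Trimmed copy of the result returned by "cnitool add".
cni_result = '{"cniVersion": "0.4.0", "dns": {"nameservers": ["1.1.1.1"]}}'

# Trimmed copy of what "ip netns exec testnamespace cat /etc/resolv.conf" printed.
observed = """# This file is managed by man:systemd-resolved(8). Do not edit.
nameserver 127.0.0.53
options edns0
search Home
"""

expected = nameservers_from_cni_result(cni_result)
actual = nameservers_from_resolv_conf(observed)
print("expected:", expected)
print("actual:  ", actual)
print("match:   ", expected == actual)
```

With these inputs the two lists differ, which is the behavior described in this comment: the plugin reports the DNS servers, but nothing applies them inside the namespace.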
I also tested by adding a temporary custom resolv.conf file to disk and referencing it within the ipam configuration block, as shown below. This didn't produce different or better results.
{
"cniVersion": "0.4.0",
"name": "testnet",
"type": "bridge",
"bridge": "testnet",
"ipMasq": true,
"isDefaultGateway": true,
"forceAddress": true,
"ipam": {
"type": "host-local",
"resolvConf": "/tmp/cni_resolv.conf",
"ranges": [
[{
"subnet": "172.16.3.0/24",
"rangeStart": "172.16.3.10",
"rangeEnd": "172.16.3.250",
"gateway": "172.16.3.1"
}]
]
}
}
Apologies, I am unable to provide a workaround or propose a solution at this time. I will keep the issue open; if we have time to continue the investigation, we will do so and respond with any updates.
Thanks @jrasell for looking into it, I didn't think about using something like cnitool to verify the behavior outside of Nomad. Looks like some oddness going on with the bridge plugin indeed and I mistakenly assumed it was Nomad since the output of the plugin looked just fine.
I'll also try to dig into this further and update here!
Good day, colleagues.
I have the same issue. After some debugging, I may have found the point where the DNS config goes missing. At this point the DNS config is stored in ar.state.NetworkStatus.DNS (allocRunner), but I can't find any non-nil reference to the correct network configuration in the task struct.
I also can't confirm that the ResolvConfPath file is created correctly via CNI.
It seems Nomad should create the resolv.conf file itself, as it does for the network->dns stanza, because I failed to find any code in the CNI plugins that actually applies DNS configuration. I also failed to find any mechanism for applying DNS configuration in the CNI docs.
[1] https://unix.stackexchange.com/questions/443898/separate-dns-configuration-in-each-network-namespace
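To illustrate what "the runtime creates resolv.conf from the CNI result" could look like, here is a minimal sketch. It is not Nomad's implementation: `render_resolv_conf` is a hypothetical helper, and it assumes the DNS block uses the `nameservers`, `search`, and `options` keys (the `domain` key is deliberately left out to keep the example short).

```python
def render_resolv_conf(dns):
    """Render a CNI-style DNS dict as resolv.conf text.

    Hypothetical helper: handles only "search", "nameservers", and "options".
    Returns an empty string when the dict carries no DNS settings.
    """
    lines = []
    if dns.get("search"):
        lines.append("search " + " ".join(dns["search"]))
    for server in dns.get("nameservers", []):
        lines.append("nameserver " + server)
    if dns.get("options"):
        lines.append("options " + " ".join(dns["options"]))
    return "".join(line + "\n" for line in lines)


# DNS block as returned by the bridge plugin in the cnitool test above.
print(render_resolv_conf({"nameservers": ["1.1.1.1"]}), end="")

# A fuller, made-up example showing all three supported keys.
print(render_resolv_conf({
    "nameservers": ["172.16.6.97"],
    "search": ["example.internal"],
    "options": ["edns0"],
}), end="")
```

For the plain network-namespace test, ip-netns(8) documents that `ip netns exec` bind-mounts files from /etc/netns/<name>/ over their counterparts in /etc, so writing the rendered text to /etc/netns/testnamespace/resolv.conf is one way to try this by hand. How a Docker task should receive the file is a separate question that this sketch does not address.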
Can confirm similar behavior with ipvlan and the host-local resolvConf option.
{
"cniVersion": "0.4.0",
"name": "vpc",
"plugins": [
{
"type": "ipvlan",
"master": "eth1",
"mode": "l3s",
"ipam": {
"type": "host-local",
"resolvConf": "/opt/cni/run/vpc-resolv.conf",
"dataDir": "/var/run/cni",
"ranges": [
[
{
"subnet": "172.16.6.96/28"
}
],
[
{
"subnet": "2a05:d014:d9e:c300:4f2:0:0:0/80"
}
]
],
"routes": [
{
"dst": "::/0"
},
{
"dst": "0.0.0.0/0"
}
]
}
},
{
"type": "portmap",
"capabilities": {
"portMappings": true
},
"snat": true
},
{
"type": "firewall",
"backend": "iptables"
}
]
}
nameserver 172.16.6.97
nameserver 2a05:d014:d9e:c300:4f2::1
$ nomad alloc exec -i -t -task redis 077c1c44 /bin/bash
root@8e87a7b70408:/data# cat /etc/resolv.conf
nameserver 8.8.8.8
nameserver 8.8.4.4
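When comparing reports like the ones in this thread, it can help to list where a CNI config declares DNS settings in the first place. The sketch below is a hypothetical helper, not an existing tool: it assumes the only two places of interest are a plugin-level `dns` block and an `ipam.resolvConf` path, which are the two forms used in this thread.

```python
def dns_sources(conf):
    """List (plugin type, setting, value) for each DNS setting in a CNI config.

    Accepts either a conflist (with a "plugins" array) or a single-plugin
    config, which is treated as a one-element plugin list.
    """
    plugins = conf.get("plugins", [conf])
    found = []
    for plugin in plugins:
        ptype = plugin.get("type", "?")
        if "dns" in plugin:
            found.append((ptype, "dns", plugin["dns"].get("nameservers", [])))
        resolv = plugin.get("ipam", {}).get("resolvConf")
        if resolv:
            found.append((ptype, "ipam.resolvConf", resolv))
    return found


# Trimmed copy of the ipvlan conflist above.
ipvlan_conflist = {
    "cniVersion": "0.4.0",
    "name": "vpc",
    "plugins": [
        {
            "type": "ipvlan",
            "ipam": {
                "type": "host-local",
                "resolvConf": "/opt/cni/run/vpc-resolv.conf",
            },
        },
        {"type": "portmap", "capabilities": {"portMappings": True}},
        {"type": "firewall", "backend": "iptables"},
    ],
}

# Trimmed copy of the single-plugin bridge config from the cnitool test.
bridge_conf = {
    "cniVersion": "0.4.0",
    "name": "testnet",
    "type": "bridge",
    "dns": {"nameservers": ["1.1.1.1"]},
}

for conf in (ipvlan_conflist, bridge_conf):
    print(conf["name"], dns_sources(conf))
```

Both configs declare a DNS source, yet in both reports the resulting resolv.conf contains different name servers, which suggests the setting is lost after the plugin returns rather than in the config itself.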
While working on https://github.com/hashicorp/nomad/issues/10628 I bumped into this. Something like https://github.com/hashicorp/nomad/issues/16624 might be the fix, but I'll need to get it sorted out sooner rather than later in any case.
Nomad version
Nomad v1.1.2 (60638a086ef9630e2a9ba1e237e8426192a44244)
Operating system and Environment details
CentOS Stream release 8
Docker version 20.10.7, build f0df350
Issue
When setting up name servers inside a CNI network configuration, for example using the bridge plugin, Nomad does not seem to take the name servers into account when starting a Docker container. Although the upstream bridge plugin at first glance doesn't seem to support setting DNS this way (it should rather be done via an ipam plugin, which isn't implemented), it does work, as can be seen in the debug log that Nomad produces on receiving the CNI config. I've marked it as a bug since, looking at the source, Nomad does actually parse this information, but it then apparently gets lost somewhere in the process.
Reproduction steps
Setup Nomad client CNI:
And configure a CNI network:
Expected Result
The name servers defined in the CNI conflist are added into the resolv.conf
Actual Result
Docker adds the default/fallback name servers to the resolv.conf
Job file (if appropriate)
Nomad Server logs (if appropriate)