Open timotheencl opened 4 years ago
Hey @timotheenicolas
Unfortunately this is currently not supported with CNI, but I don't know of any technical limitation. Pinging @shoenig to see if this is a simple validation change or a more in depth one.
Thanks :) I think it would be a cool feature to have the ability to create Ingress GW which can bind on their own IP on a macvlan network
Just wanted to check if you had a chance to look at this further? Our use-case here is similar to @timotheenicolas's. We'd like to expose a connect sidecar service on a CNI based overlay network. Thank you!
Would like to see this happen too, similar use case. Thank you!
Hi everyone 👋
I've been looking into this issue but I can't seem to get Connect working which may indicate that there's more work that needs to be done than just removing the validation or it could be that I'm not configuring my CNI network properly (probably more likely 😅).
I have some custom binaries at the bottom of this page https://github.com/hashicorp/nomad/actions/runs/4059725660 that were built with my changes. This is the diff https://github.com/hashicorp/nomad/compare/d375f6043f2144b2e400ddd19a7e46b9f08cc1ce...120747566d92b118524650693ddf4e315e679688 of what is in the binary.
Would anyone with more CNI experience be able to test them? One important note: these binaries are for development purposes only and should not be run in production, so make sure you don't accidentally run them with your production data.
I used the sample job file that is generated from `nomad job init -short -connect` with a few modifications:

- `network.mode` set to `cni/mycni`.
- `service.address_mode = "alloc"` to register the IP assigned by the CNI plugin in Consul instead of the host IP.
- `sidecar_service.disable_default_tcp_check = true` to get around the fact that Nomad registers the sidecar proxy TCP check using the host IP. This is something else that may need to be fixed.

Thanks in advance!
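Putting those three modifications together, a minimal sketch of the resulting job file might look like the following (the CNI network name `mycni`, the service name, and the port are placeholders, not values from the original job):

```hcl
job "countdash" {
  group "api" {
    network {
      # Custom CNI network instead of the built-in bridge mode.
      mode = "cni/mycni"
    }

    service {
      name = "count-api"
      port = "9001"

      # Register the CNI-assigned allocation IP in Consul
      # instead of the host IP.
      address_mode = "alloc"

      connect {
        sidecar_service {
          # Skip the default TCP check, which would otherwise
          # target the host IP.
          disable_default_tcp_check = true
          proxy {}
        }
      }
    }
  }
}
```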
I have edited the title here to expand the scope to all CNI networks (so not just macvlan) and to Consul Service Mesh in general (not just ingress gateways).
I'm trying to use Consul Connect on my Nomad clusters. However, I'm limited by the fact that I have to lower the MTU on the bridge created by Nomad.
Since that bridge is hard-coded I'm not able to do so, so I thought of using a custom CNI configuration and referring to it with `mode = "cni/xxx"`.
But that fails because of this issue.
@lgfa29 Is there something I can do to help advance this (older) issue?
Hi @netdata-be 👋
We're not currently working on this issue and I didn't receive feedback on the attempted fix mentioned in https://github.com/hashicorp/nomad/issues/8953#issuecomment-1411344922 and haven't had the time to validate it further.
If I were to build another set of binaries with those changes, would you be able to help validate whether the changes work?
@lgfa29 - It looks like this would solve most of my questions at https://discuss.hashicorp.com/t/configure-network-pinning-for-jobs/63434. I would be happy to test a patched version of 1.6.x or 1.7.x to validate the changes.
Hi @lgfa29 I have done some tests using your patch (applied to nomad 1.7.7). A job which includes CNI and consul connect starts correctly, but the health check uses the incorrect address.
I am using a macvlan CNI config:

```json
{
  "cniVersion": "1.0.0",
  "name": "vlan107_dhcp",
  "plugins": [
    {
      "type": "macvlan",
      "master": "eth0.107",
      "ipam": {
        "type": "dhcp"
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      },
      "snat": true
    }
  ]
}
```
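For anyone reproducing this setup: the Nomad client has to be pointed at the directory holding this CNI config file and at the plugin binaries. A minimal client config sketch (the paths below are assumptions; `cni_path` and `cni_config_dir` are the standard Nomad client options):

```hcl
client {
  # Directory containing the vlan107_dhcp config file
  # (assumed path; Nomad picks up *.conflist files here).
  cni_config_dir = "/opt/cni/config"

  # Directory containing the macvlan, dhcp, and portmap
  # plugin binaries from the CNI plugins release.
  cni_path = "/opt/cni/bin"
}
```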
`ip a l` in the envoy sidecar shows that I get an address on vlan107 (via DHCP). This address is also shown correctly in the top-right of the Consul service view.

```
2: eth0@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether fe:a6:92:0f:bc:7f brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.17.107.139/24 brd 172.17.107.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::fca6:92ff:fe0f:bc7f/64 scope link
       valid_lft forever preferred_lft forever
```
`/secrets/envoy-bootstrap.cmd` contains:

```
connect envoy -grpc-addr unix://alloc/tmp/consul_grpc.sock -http-addr localhost:8500 -admin-bind 127.0.0.2:19001 -address 127.0.0.1:19101 -proxy-id _nomad-task-d04fe8fb-efa7-b2a6-565b-d709d1cf1a2e-group-nodered-nodered-1880-sidecar-proxy -bootstrap
```
There is a process (envoy?) listening on port 29130 (reachable on IP 172.17.107.139):

```
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address    Foreign Address  State   User  Inode     PID/Program name
tcp   0      0      127.0.0.2:19001  0.0.0.0:*        LISTEN  101   35993407  -
tcp   0      0      0.0.0.0:1880     0.0.0.0:*        LISTEN  1000  35996775  -
tcp   0      0      0.0.0.0:29130    0.0.0.0:*        LISTEN  101   35993417  -
```

Consul is trying to health check the Nomad client's address, though (`dial tcp 172.17.17.234:29130: i/o timeout`).
From the docs (https://developer.hashicorp.com/nomad/docs/job-specification/service#address_mode), I expected the Consul check of the sidecar to use the IP provided by CNI. My service stanza is:
```hcl
service {
  name         = "nodered"
  address_mode = "alloc"
  port         = 1880

  connect {
    sidecar_service {
      proxy {}
    }
  }
}
```
I can do more tests. Please let me know if anything more would help.
Ah nice, thanks for testing it @nakermann1973, I'm glad it kind of works 😅
Health checks are an interesting point. First you need to make sure the Consul agent would be able to reach the service at the IP:port allocated by the CNI plugin. Next we need a way to tell Nomad to use that IP:port as well.
For the first part, I'm not sure there's a single way to fix it. Each environment will need to be configured to fulfill this requirement.
The second part may require some code changes in how Nomad registers the service (and its health check) in Consul. If you run `nomad job inspect <job ID>`, do you see any health checks in the sidecar or your task?
And as a last note, I no longer work for HashiCorp, so I probably won't be able to help much on this issue any more.
I rolled back to the prod release, as it seemed like health checks were failing across multiple services with this patch. I didn't dig into it too much, as my focus was on recovering the failing services.
> do you see any health checks in the sidecar or your task

I don't recall seeing any when I inspected the job.
Nomad version
Nomad v0.12.5 (514b0d667b57068badb43795103fb7dd3a9fbea7)
Operating system and Environment details
Ubuntu Focal amd64 20.04.1
Issue
Hi !
I would like to add a CNI macvlan network to use with Consul Connect, so that an ingress gateway can be part of this publicly available network for clients.
However, after setting up the CNI config file, Nomad reports that only "bridge" or "host" is valid.
Thanks!
My CNI config:
And my job file: