I think you're trying to solve this unconventionally.
Usually, the reverse proxy/load balancer is expected to know all upstreams (IPs and ports) even when they're not ready to serve traffic. One of the roles of a load balancer is to monitor upstream health.
I think what you need is to configure health checks, so it can automatically exclude unhealthy or not-ready upstreams from load balancing.
Check how to configure health checks for Layer 4 in: https://github.com/RussellLuo/caddy-ext/tree/master/layer4
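For reference, this is roughly what active health checks look like in a Caddyfile for Caddy's HTTP reverse_proxy; the layer4 extension linked above has its own health-check options, and the hostnames and URI here are placeholders:

example.com {
	reverse_proxy backend1:8080 backend2:8080 {
		health_uri /healthz
		health_interval 10s
		health_timeout 2s
	}
}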
Understood. The problem here is that we don't know all upstreams ahead of time. The backend in question is a fairly heavyweight Java-based server, to which you can manually add listener ports without downtime. We're talking about thousands of ports across tens to hundreds of clients depending on client traffic, which is why we don't want to redeploy to add a new port: that would mean downtime for all the other clients (each client only sees their own VLAN, to which they connect via VPN, so they can't see other clients' traffic). There are other deployment models we could use, but from a scaling and resource perspective this is the best fit. We're a very small team, and we're trying to migrate this from Kubernetes, where we were able to achieve this simply with kubectl expose, but the general overhead of managing a Kubernetes cluster is not trivial for us. This is what makes Swarm so appealing. If this is not possible with Caddy then that's fine, but if it is, that would be great.
What you tried should have worked:
docker service update --label-add caddy.layer4.test_internal_hostname:6692.proxy.to="{{upstreams 6692}}"
Any info on why that didn't work? Did you end up seeing the upstream unreplaced in the Caddyfile? I think that might happen if you end up with quotes around your label value.
Try:
docker service update --label-add "caddy.layer4.test_internal_hostname:6692.proxy.to={{upstreams 6692}}"
Inspect your service after adding the label and share the output with us. Also share what the Caddyfile looks like after adding it.
Another thing: Docker might not emit any events when a label is added, so CDP will not immediately see it and update the Caddyfile. But CDP also scans Docker periodically (every 30s by default), so it should pick it up within a minute.
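If the 30s default is too slow for testing, the polling interval is configurable; a minimal sketch, assuming the CADDY_DOCKER_POLLING_INTERVAL environment variable (the name is my assumption, based on the PollingInterval value CDP prints at startup):

services:
  caddy:
    image: lucaslorentz/caddy-docker-proxy:ci-alpine
    environment:
      # assumed env var name; shortens the periodic Docker scan from the 30s default
      - CADDY_DOCKER_POLLING_INTERVAL=10s
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock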
@lucaslorentz thank you for all your help - I must say that the community in general around Caddy and this project in particular has been extremely helpful and responsive.
I recreated the entire stack from scratch, with only this layer4 label:
caddy.layer4.test_internal_hostname:6692.proxy.to: "{{upstreams 6692}}"
With this setup, Caddy exhibits no errors, and from another host with a route to test_internal_hostname, I can see port 6692 is open and port 6693 is not:
$ nmap test_internal_hostname -p 6692,6693
Starting Nmap 6.40 ( http://nmap.org ) at 2023-12-23 00:42 CST
Nmap scan report for test_internal_hostname (192.168.10.1)
Host is up (0.00032s latency).
PORT STATE SERVICE
6692/tcp open unknown
6693/tcp closed unknown
Nmap done: 1 IP address (1 host up) scanned in 0.46 seconds
The layer4 portion of /config/caddy/Caddyfile.autosave looks like this:
{
	layer4 {
		test_internal_hostname:6692 {
			proxy {
				to 172.19.0.4:6692
			}
		}
	}
}
I can send TCP traffic to the port from other hosts (that have a route to that private address) without issue.
Next, I try to add a label to the service using your syntax:
docker service update --label-add "caddy.layer4.test_internal_hostname:6693.proxy.to={{upstreams 6693}}" my_service
The label gets added, and I wait a full minute. Here's a portion of the docker service inspect output for the service:
[
{
...
"Spec": {
"Name": "my_service",
"Labels": {
"caddy.layer4.test_internal_hostname:6693.proxy.to": "{{upstreams 6693}}",
"com.docker.stack.image": "...",
"com.docker.stack.namespace": "..."
},
"TaskTemplate": {
"ContainerSpec": {
"Image": "...",
"Labels": {
"caddy.layer4.test_internal_hostname:6692.proxy.to": "{{upstreams 6692}}",
...
A port scan from the other host now shows ports 6692 and 6693 are both closed:
$ nmap test_internal_hostname -p 6692,6693
Starting Nmap 6.40 ( http://nmap.org ) at 2023-12-23 00:48 CST
Nmap scan report for test_internal_hostname (192.168.10.1)
Host is up (0.00057s latency).
PORT STATE SERVICE
6692/tcp closed unknown
6693/tcp closed unknown
Nmap done: 1 IP address (1 host up) scanned in 0.51 seconds
docker logs on the Caddy container shows this:
{"level":"info","ts":1703314365.8086896,"logger":"docker-proxy","msg":"Process Caddyfile","logs":"[ERROR] Removing invalid block: parsing caddyfile tokens for 'layer4': wrong argument count or unexpected line ending after 'to', at Caddyfile:10\n{\n\tlayer4 {\n\t\ttest_internal_hostname:6692 {\n\t\t\tproxy {\n\t\t\t\tto 172.19.0.4:6692\n\t\t\t}\n\t\t}\n\t\ttest_internal_hostname:6693 {\n\t\t\tproxy {\n\t\t\t\tto\n\t\t\t}\n\t\t}\n\t}\n}\n\n"}
Expanding it to make it more readable shows that the {{upstreams 6693}} is not getting replaced:
{
	layer4 {
		test_internal_hostname:6692 {
			proxy {
				to 172.19.0.4:6692
			}
		}
		test_internal_hostname:6693 {
			proxy {
				to
			}
		}
	}
}
At this point, the /config/caddy/Caddyfile.autosave file in the Caddy container only contains configurations for domains served over HTTP, with no layer4 config at all.
Any insight you have would be much appreciated!
Got it. So basically what you're facing is that upstream labels at the container level are working, but at the service level they are not. I wonder if you can add the label via the CLI to the container instead of the service, in the same place where the one for port 6692 is.
Do you see logs like "Service is not in same network as caddy"?
This is how we expand upstreams for service labels: https://github.com/lucaslorentz/caddy-docker-proxy/blob/a98677e4218825ae9204577e633e7d6995343355/generator/services.go#L52
This is how we expand it for container labels: https://github.com/lucaslorentz/caddy-docker-proxy/blob/a98677e4218825ae9204577e633e7d6995343355/generator/containers.go#L17
@smaccona Could you please try the improvements I did in https://github.com/lucaslorentz/caddy-docker-proxy/pull/559 ?
Your empty upstreams for labels at service level seem to be caused by issue 1 I mentioned in that PR.
Yes, I see the label that works gets put under Spec.TaskTemplate.ContainerSpec on the service, whereas the one I add afterwards goes directly under Spec.Labels. Your suggestion to add it to the container at runtime is a good one, but unfortunately Docker doesn't support that - there are a bunch of comments such as this one https://github.com/moby/moby/issues/21721#issuecomment-1753170250 where everyone wants mutable labels on containers for Traefik, presumably for the same reason I do here.
I do see warning-level logs that the "service is not in the same network as caddy", because Caddy is running in network_mode: host (I need it to be able to bind to host network interfaces), and of course when it's in network_mode: host it can't be in any other Docker networks.
I also haven't been able to get labels working for the service when they are under deploy in the Docker Compose file instead of directly under the service. When I put them under deploy, I get the exact same situation as when I add the label dynamically: just a blank to line under proxy with no target/destination. I suspect that if I can fix this, then adding them dynamically will work, so I created a minimal example using just netcat:
version: "3.3"
services:
nc:
image: alpine
command: nc -lvnp 8080
deploy:
mode: replicated
replicas: 1
labels:
caddy.layer4.nc_test:8080.proxy.to: "{{upstreams 8080}}"
networks:
- caddy
networks:
caddy:
external: true
When I deploy this with docker stack deploy -c nc.yaml nc and let it settle, I see the following error in Caddy:
{"level":"info","ts":1703339176.8187501,"logger":"docker-proxy","msg":"Process Caddyfile","logs":"[ERROR] Removing invalid block: parsing caddyfile tokens for 'layer4': wrong argument count or unexpected line ending after 'to', at Caddyfile:5\n{\n\tlayer4 {\n\t\tnc_test:8080 {\n\t\t\tproxy {\n\t\t\t\tto\n\t\t\t}\n\t\t}\n\t}\n}\n\n"}
Expanding the offending part, I see the to is on its own again with no destination/target. I thought that for Swarm deployments the labels should be under deploy, but I can't get them to work there. Thanks again for any suggestions/input.
I do see warning-level logs that the "service is not in the same network as caddy", because Caddy is running in network_mode: host
Yeah, you shouldn't see this message. If you see it in your logs, it's because CDP couldn't match any of the container/task networks and IPs with the configured ingress network (the caddy network in your case), and you end up with no upstreams.
Even with your nc workaround, you stumbled on the same issue: CDP is not able to match swarm service networks by ID, because the network IDs in swarm tasks are something else entirely; they're not the actual network IDs. #559 fixes that problem.
So, I think you have a few options to solve your problem now:
a) get the fix I did in #559. Remove your nc workaround and do things the way you were trying when you opened this issue.
b) add the label caddy_ingress_network=caddy along with the other labels you're adding to your container/service; that will match the network by name and will not suffer from the issues fixed in #559 (see the sketch after this list)
c) keep your nc workaround, but move its labels outside deploy to make them container labels; container network matching is working fine
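For option (b), a minimal sketch of what the labels could look like, reusing the nc_test example from above:

labels:
  caddy.layer4.nc_test:8080.proxy.to: "{{upstreams 8080}}"
  # match the ingress network by name instead of ID
  caddy_ingress_network: caddy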
@lucaslorentz thank you - see my comments in https://github.com/lucaslorentz/caddy-docker-proxy/pull/559#issuecomment-1868387002
Just to be clear about my constraints:
Got it, thanks for clarifying.
The first part of the fix in #559, to match service network by NAME, should still be useful to you and fix your issue.
The second part of the PR about HOST network won't be useful with the setup you're doing, but it's still a feature I think we should have.
@lucaslorentz I am not clear how the first part helps me, but I think we should move this discussion to the PR thread (starting at https://github.com/lucaslorentz/caddy-docker-proxy/pull/559#issuecomment-1868394919 for those of you who are interested) instead so all subsequent discussion is there (to help future users encountering these types of issues). I'll continue this conversation there. Thank you!
@lucaslorentz thank you again for all your help. I was able to get this working correctly by using a local network instead of a Swarm network, as you suggested. In production, however, I will be using a mix of Swarm stacks (using docker stack deploy) and local deployments (using docker-compose up), and I want Caddy to proxy traffic for both types of deployment.
So to test this I set up two networks, caddy_local and caddy_swarm, using the bridge and overlay drivers respectively:
# docker network create --driver overlay --attachable caddy_swarm
njmbytr8wa4e68bi2o1e6w7yq
# docker network create --driver bridge --attachable caddy_local
5399a302a792254a3eb45be45f0b62f0500a72dbca1d0bb722bb2fd290227e4b
Extract from docker network ls:
# docker network ls
NETWORK ID NAME DRIVER SCOPE
5399a302a792 caddy_local bridge local
njmbytr8wa4e caddy_swarm overlay swarm
Next I split the previous deployment, which I had outlined in https://github.com/lucaslorentz/caddy-docker-proxy/pull/559#issuecomment-1872451462, into two parts: I kept Caddy and my whoami container in docker-compose (I need Caddy to always be in docker-compose because I need it to bind to host interfaces), and I moved my nc deployment into Swarm. Here are the two files:
# docker-compose.yaml
version: '3.7'
services:
  caddy:
    image: caddy-cdp-layer4
    container_name: caddy
    environment:
      - CADDY_INGRESS_NETWORKS=caddy_local,caddy_swarm
    network_mode: host
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    extra_hosts:
      - host.docker.internal:host-gateway
      - "nc_test:192.168.10.1"
    restart: unless-stopped
  whoami:
    image: containous/whoami
    networks:
      - caddy_local
    labels:
      caddy: whoami.example.com
      caddy.reverse_proxy: "{{upstreams 80}}"
      caddy.tls: "internal"
      caddy_ingress_network: caddy_local
networks:
  caddy_swarm:
    external: true
  caddy_local:
    external: true
# nc.yaml
version: '3.7'
services:
  nc:
    image: alpine
    command: nc -lvnp 8080
    labels:
      caddy.layer4.nc_test:8080.proxy.to: "{{upstreams 8080}}"
      caddy_ingress_network: caddy_swarm
    networks:
      - caddy_swarm
networks:
  caddy_swarm:
    external: true
I brought both up using docker-compose up -d and docker stack deploy -c nc.yaml nc. Everything came up as expected, and I can see in the Caddy logs that it correctly identifies the networks and the proxy upstream targets (I have also expanded the Caddyfile entry):
{"level":"info","ts":1704213705.6355045,"logger":"docker-proxy","msg":"New Caddyfile","caddyfile":"{\n\tlayer4 {\n\t\tnc_test:8080 {\n\t\t\tproxy {\n\t\t\t\tto 10.0.2.6:8080\n\t\t\t}\n\t\t}\n\t}\n}\nwhoami.example.com {\n\treverse_proxy 172.20.0.2:80\n\ttls internal\n}\n"}
{
	layer4 {
		nc_test:8080 {
			proxy {
				to 10.0.2.6:8080
			}
		}
	}
}
whoami.example.com {
	reverse_proxy 172.20.0.2:80
	tls internal
}
This matches the IP entries from docker inspect on the running containers (first the container for nc and second the container for whoami):
...
"NetworkSettings": {
"Bridge": "",
"SandboxID": "9628d8207041b32b4880d81b5c10d33f2576b22967e70e54120c130365ed2f14",
"HairpinMode": false,
"LinkLocalIPv6Address": "",
"LinkLocalIPv6PrefixLen": 0,
"Ports": {},
"SandboxKey": "/var/run/docker/netns/9628d8207041",
"SecondaryIPAddresses": null,
"SecondaryIPv6Addresses": null,
"EndpointID": "",
"Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"IPAddress": "",
"IPPrefixLen": 0,
"IPv6Gateway": "",
"MacAddress": "",
"Networks": {
"caddy_swarm": {
"IPAMConfig": {
"IPv4Address": "10.0.2.6"
},
"Links": null,
"Aliases": [
"741a1c3f6540"
],
"NetworkID": "njmbytr8wa4e68bi2o1e6w7yq",
"EndpointID": "4cdeab13c69ac20d444819d0d8a9170118d3628e02752fbe64308d9220405047",
"Gateway": "",
"IPAddress": "10.0.2.6",
"IPPrefixLen": 24,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"MacAddress": "02:42:0a:00:02:06",
"DriverOpts": null
}
}
}
...
...
"NetworkSettings": {
"Bridge": "",
"SandboxID": "ae26a2cf8a47176f7b5f2b5721633ae57b3e631066a803c5aa6e3d9dde325a69",
"HairpinMode": false,
"LinkLocalIPv6Address": "",
"LinkLocalIPv6PrefixLen": 0,
"Ports": {
"80/tcp": null
},
"SandboxKey": "/var/run/docker/netns/ae26a2cf8a47",
"SecondaryIPAddresses": null,
"SecondaryIPv6Addresses": null,
"EndpointID": "",
"Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"IPAddress": "",
"IPPrefixLen": 0,
"IPv6Gateway": "",
"MacAddress": "",
"Networks": {
"caddy_local": {
"IPAMConfig": null,
"Links": null,
"Aliases": [
"whoami",
"92dcd6e9230b"
],
"NetworkID": "5399a302a792254a3eb45be45f0b62f0500a72dbca1d0bb722bb2fd290227e4b",
"EndpointID": "4568b72f4cf9b5ab43cb910934498c91fd4d5cb15eb330b25edf433b3fe893c2",
"Gateway": "172.20.0.1",
"IPAddress": "172.20.0.2",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"MacAddress": "02:42:ac:14:00:02",
"DriverOpts": null
}
}
}
...
I am able to talk to the whoami container:
# time curl --show-error -s -k -f --resolve whoami.example.com:443:127.0.0.1 https://whoami.example.com
Hostname: 92dcd6e9230b
IP: 127.0.0.1
IP: 172.20.0.2
RemoteAddr: 172.20.0.1:57224
GET / HTTP/1.1
Host: whoami.example.com
User-Agent: curl/7.74.0
Accept: */*
Accept-Encoding: gzip
X-Forwarded-For: 127.0.0.1
X-Forwarded-Host: whoami.example.com
X-Forwarded-Proto: https
real 0m0.034s
user 0m0.018s
sys 0m0.013s
But I am not able to talk to the nc container:
# time echo -n 'Line of text' | nc nc_test 8080
real 2m11.128s
user 0m0.006s
sys 0m0.001s
And in the Caddy logs:
{"level":"error","ts":1704214298.2501729,"logger":"layer4","msg":"handling connection","error":"dial tcp 10.0.2.6:8080: connect: connection timed out"}
And there is nothing in the nc container's logs. So Caddy received the request and tried to forward it to the correct upstream, but wasn't able to connect.
Should Caddy be able to forward traffic to both networks at once?
Edit to add additional Caddy log lines to show both networks are there:
...
{"level":"info","ts":1704213697.2888598,"logger":"docker-proxy","msg":"Start","CaddyfilePath":"","EnvFile":"","LabelPrefix":"caddy","PollingInterval":30,"ProxyServiceTasks":true,"ProcessCaddyfile":true,"ScanStoppedContainers":true,"IngressNetworks":"[caddy_local caddy_swarm]","DockerSockets":[""],"DockerCertsPath":[""],"DockerAPIsVersion":[""]}
...
{"level":"info","ts":1704213697.2915056,"logger":"docker-proxy","msg":"IngressNetworksMap","ingres":"map[5399a302a792254a3eb45be45f0b62f0500a72dbca1d0bb722bb2fd290227e4b:true caddy_local:true caddy_swarm:true njmbytr8wa4e68bi2o1e6w7yq:true]"}
...
Doesn't look like a CDP limitation; CDP should be fine connecting to multiple networks.
But just to rule out any CDP problems, try nc 10.0.2.6 8080 from inside the CDP container (use docker exec for that). I assume it will fail as well.
Try nc 10.0.2.6 8080 from outside the CDP container too; I expect it will fail as well.
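For example, something like this (the container name caddy is taken from your compose file):

# from inside the CDP container
docker exec -it caddy sh -c "echo -n 'Line of text' | nc 10.0.2.6 8080"
# from the host itself
echo -n 'Line of text' | nc 10.0.2.6 8080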
Probably you're not able to talk to an overlay network from a container that is not in that network, because an overlay network is a completely virtual network with no connection/bridge to your host network. I'm not a networking guy, so I'm not sure I'm using the right terms here :-)
Found this: https://github.com/moby/moby/issues/18357, it might be useful.
Can you please clarify whether you're planning to use multiple nodes in your swarm cluster? Do you plan to run multiple CDP instances? Edit: it would be interesting to understand the entire topology you want, so I can advise better here.
I wonder if you can simplify networking by not using the host network and instead publishing the entire port range you're going to use. Something like:
version: "3.7"
services:
caddy:
image: lucaslorentz/caddy-docker-proxy:ci-alpine
ports:
- 80:80
- 443:443
- 6000-7000:6000-7000
volumes:
- /var/run/docker.sock:/var/run/docker.sock
networks:
- caddy_swarm
whoami:
image: containous/whoami
networks:
- caddy_swarm
labels:
caddy: whoami.example.com
caddy.reverse_proxy: "{{upstreams 80}}"
caddy.tls: "internal"
caddy_ingress_network: caddy_swarm
volumes:
caddy_data: {}
networks:
caddy_swarm:
name: caddy_swarm
AFAIK, with that setup, any swarm node will be able to receive requests on all those ports and forward them to the CDP service, regardless of which node is running CDP. The only problem with this approach is that there will be more hops, and CDP will not see the real client IP; not sure if that's relevant to you.
Edit: I assume you want to keep CDP to do TLS termination for requests to ports 6000-7000. If you're not using TLS and CDP-managed certificates on those ports, you could just publish the port range directly on your nc service and skip CDP completely. Docker Swarm should expose those ports on all nodes and forward requests to your service.
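A minimal sketch of that alternative, publishing the port directly on the nc service through Swarm's routing mesh with no CDP involved (port numbers are illustrative):

# nc.yaml
version: '3.7'
services:
  nc:
    image: alpine
    command: nc -lvnp 8080
    ports:
      - "8080:8080"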
Confirmed, I can't connect to port 8080 on either the service IP or the container IP from inside the Caddy container. That's interesting to me because I thought my initial test of Caddy running in host mode was able to talk to a service running on the Swarm overlay network, but I must have been mistaken.
Appreciate the link, useful reading for sure - it indicates that the way to do this is to talk to the container's interface on the docker_gwbridge network instead of on its other network(s). Sure enough, I was able to do this:
# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9cbc08b0b4e8 alpine:latest "nc -lvnp 8080" About a minute ago Up About a minute nc_nc.1.ongg0dx7cwyrv70t9jbobcwx5
693c01aa19eb caddy-cdp-layer4 "/bin/caddy docker-p…" 8 minutes ago Up 8 minutes caddy
# docker inspect docker_gwbridge
[
{
"Name": "docker_gwbridge",
...
"Containers": {
"9cbc08b0b4e856a7aabe9de83a2707d25b038b515a8638e8ce65ed49ba90576e": {
"Name": "gateway_7067a0b716fa",
"EndpointID": "b6e7f0deb84e1b19d6c01ba99900848e7bcc553e80a8a3109b25a791d0c09c12",
"MacAddress": "02:42:ac:12:00:03",
"IPv4Address": "172.18.0.3/16",
"IPv6Address": ""
},
...
# docker exec -it 693c01aa19eb sh
/srv # echo -n 'Line of text' | nc 172.18.0.3 8080
/srv # exit
# docker logs 9cb
listening on [::]:8080 ...
connect to [::ffff:172.18.0.3]:8080 from [::ffff:172.18.0.1]:42685 ([::ffff:172.18.0.1]:42685)
Line of text
I can't find a way to automate this though - I tried adding docker_gwbridge to Caddy's CADDY_INGRESS_NETWORKS, but you can't include docker_gwbridge explicitly as a network in a Docker Swarm YAML because it's local, and if you specify caddy_ingress_network: docker_gwbridge in the Swarm service's YAML file, you get "Service is not in same network as caddy" in the Caddy logs, even though the startup appears to include docker_gwbridge as one of the networks Caddy is monitoring:
...
{"level":"info","ts":1704230034.9529457,"logger":"docker-proxy","msg":"Start","CaddyfilePath":"","EnvFile":"","LabelPrefix":"caddy","PollingInterval":30,"ProxyServiceTasks":true,"ProcessCaddyfile":true,"ScanStoppedContainers":true,"IngressNetworks":"[caddy_local caddy_swarm docker_gwbridge]","DockerSockets":[""],"DockerCertsPath":[""],"DockerAPIsVersion":[""]}
...
{"level":"info","ts":1704230034.9556465,"logger":"docker-proxy","msg":"IngressNetworksMap","ingres":"map[48b7398d43fa1a676423632f442ab17ee71542bd238431faf102197c76b64c61:true 5399a302a792254a3eb45be45f0b62f0500a72dbca1d0bb722bb2fd290227e4b:true caddy_local:true caddy_swarm:true docker_gwbridge:true njmbytr8wa4e68bi2o1e6w7yq:true]"}
...
{"level":"warn","ts":1704230077.1592085,"logger":"docker-proxy","msg":"Service is not in same network as caddy","service":"nc_nc","serviceId":"6bq2s6tdsew7rj1o50mq2fbjd"}
{"level":"info","ts":1704230077.1593533,"logger":"docker-proxy","msg":"Process Caddyfile","logs":"[ERROR] Removing invalid block: parsing caddyfile tokens for 'layer4': wrong argument count or unexpected line ending after 'to', at Caddyfile:5\n{\n\tlayer4 {\n\t\tnc_test:8080 {\n\t\t\tproxy {\n\t\t\t\tto\n\t\t\t}\n\t\t}\n\t}\n}\n\n"}
As a workaround, I am now looking at running two instances of Caddy: one on the host network, which will proxy only docker-compose services, and one on the overlay network, which will proxy Swarm services. I will leave this open for now and close it with a final comment describing my solution as documentation for others (or my failure, if that's the case).
Edit: @lucaslorentz I just saw your other comments about Swarm node count (yes, we will be deploying across multiple nodes) and your suggestion to publish the layer4 container (nc in my basic test case) on all host ports I might want to use now or in the future. The restriction I have here is that I don't want to publish all ports on all host interfaces. Here's an example of the type of restriction I am talking about:
- Client A has VPN access to a VLAN that includes IP 192.168.10.1/32, which is bound to a real interface on host X. Client A should only ever be able to send traffic to port 10001 (say) on IP 192.168.10.1
- Client B has VPN access to a different VLAN that includes IP 192.168.11.1/32, and IP 192.168.11.1 is bound to a real interface also on host X. Client B should only ever be able to send traffic to port 10002 (say) on IP 192.168.11.1
- If we publish a port range from nc or from Caddy onto the host, it will by default publish on all interfaces, which means Client A will be able to see port 10002 and Client B will be able to see port 10001, which would be bad. So we have to publish each port we want on each IP address separately, instead of specifying a range, and if they change we have to redo the docker-compose file and redeploy, which involves downtime for multiple clients (in real life, our backend service is not nc but a heavyweight Java server which is slow to start up, as I mentioned)
This is the reason why dynamically adding the layer4 labels post-deploy is so attractive to us: it allows us to (a) specify a specific IP address to publish on (well, we're using a hostname, but it resolves to an IP address in our internal DNS), and (b) lets Caddy proxy it without backend or Caddy downtime.
RE: your comment about TLS - the backend services we are proxying with layer4 are vanilla TCP; encryption is handled at the VPN level instead. We will be using Caddy to proxy web services as well, and auto-provisioning of TLS is working fine for us there.
@lucaslorentz two (hopefully quick) questions for you:
1. Is there a way to have CDP use a container's IP on the docker_gwbridge network instead of any other networks? When running in host network mode, this is the only way for Caddy to communicate with Swarm containers. In fact, when running in host mode it may make sense to just make this the default for Swarm services/containers. The wrinkle is that docker_gwbridge is not listed in the networks for the service/container when you inspect the service/container - to see it, you have to instead inspect docker_gwbridge, and there you will see the container ID listed.
2. We can't use something like caddy.layer4.192.168.1.1:8080.proxy.to: "{{upstreams 8080}}" in labels because the periods/dots are interpreted as nesting levels in the Caddyfile. We are currently maintaining DNS entries to work around this, but it's just additional overhead to manage, especially when we are talking about hundreds or potentially thousands of host interface IPs. Is there a way to escape the periods in the 192.168.1.1 portion of the label so we can use IP addresses in the labels directly?
Many thanks!
@lucaslorentz I have not been able to solve this for the layer4 piece. To recap, here are the 2 key requirements:
1. We need to tie layer4 proxy instructions to specific host IP addresses, so that Caddy can forward (say) TCP traffic being sent to (say) host IP address 192.168.1.1 on port (say) 8080 to the backend service, but clients who have access only to a different host IP address (say 192.168.1.2) won't be able to see port 8080 on the backend service (because those clients don't have any access to IP address 192.168.1.1). This capability is supported using labels like caddy.layer4.nc_test:8080.proxy.to: "{{upstreams 8080}}", where nc_test resolves via local DNS to 192.168.1.1.
2. We need to be able to add layer4 labels dynamically to the backend service during runtime, so that Caddy can reconfigure itself to add new proxy mappings without backend and/or Caddy downtime. This is necessary for us because backend downtime means multiple clients lose access, and recycling the backend service is not quick.
Now here are 3 observations:
1. If Caddy runs in local mode (either with network_mode: host or by reserving/publishing for Caddy a large range of ports you might need in the future), and the backend service runs in Swarm mode (but restricted to a single host using host labels), then you can dynamically add/remove labels on the backend service at runtime, and Caddy can pick up on those label changes and proxy traffic to newly-exposed services as you expose them on the backend service. The problem with this approach, however, is that if Caddy is running in local mode, then it can't actually talk to anything running on any Swarm (overlay) networks, because Docker doesn't allow it (at least not automatically - see the link you shared at https://github.com/moby/moby/issues/18357 for the rationale, and see below for some detail around this).
2. If both Caddy and the backend service run in local mode (via docker-compose, say), then Caddy will be able to see the network it's on and will be able to communicate with it, but Docker doesn't allow changing any labels on a running local container post-deployment, so we can't dynamically add labels to the backend service for Caddy to pick up and start proxying traffic to the backend service.
3. If Caddy runs in Swarm mode (even with cap_add: NET_ADMIN), it can't bind to specific host IP addresses (layer4 gives errors such as listen tcp 192.168.1.1:8080: bind: cannot assign requested address), and without that capability, users of different host IP addresses will be able to see ports that should be reserved for other users.
So we're stuck: to configure labels dynamically, the backend service must be running in Swarm mode. To talk to the backend service in Swarm mode, Caddy must be in Swarm mode. To bind to a host IP, Caddy must be in local mode.
It's possible for a local service (like Caddy) to talk to a Swarm service through the docker_gwbridge network on the local host, but that's currently not supported by Caddy/caddy-docker-proxy. This is why I asked about it above; if it were possible, it would completely solve my problem. I have no idea how hard it would be to implement, or whether it would need to be done in Caddy (I suspect it would) or could be done just in caddy-docker-proxy. I tried specifying docker_gwbridge as the network for the Swarm backend service, but that doesn't work because it's not listed as one of the networks when you inspect the service. The running container does show up under Containers when you inspect the docker_gwbridge network, though.
I realize there are other solutions for our use case: we could perhaps use a different proxy for layer4 traffic or write our own; we could implement some other way of restricting access to ports on the backend apart from dedicated VLANs; we could use iptables to manually create forwarding rules, and so on. But being able to quickly configure Caddy to proxy layer4 traffic via labels is really simple and compelling. If you have any ideas here, they would be really appreciated. Thanks!
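(For reference, the iptables route would mean maintaining rules roughly like the following by hand, with illustrative addresses, which is exactly the kind of bookkeeping we'd like to avoid.)

# forward TCP arriving on client A's host IP/port to the backend container
iptables -t nat -A PREROUTING -d 192.168.10.1 -p tcp --dport 10001 -j DNAT --to-destination 172.19.0.4:6692
# make sure return traffic goes back through this host
iptables -t nat -A POSTROUTING -d 172.19.0.4 -p tcp --dport 6692 -j MASQUERADE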
docker_gwbridge is a local network (node-specific), so getting a container's IP on docker_gwbridge would only work if CDP and the container are on the same swarm node, making this feature a bit incompatible with swarm concepts.
So we're stuck: to configure labels dynamically, the backend service must be running in Swarm mode. To talk to the backend service in Swarm mode, Caddy must be in Swarm mode. To bind to a host IP, Caddy must be in local mode.
Would it work if you ran everything without swarm, using docker-compose, and used swarm configs to expose additional ports in CDP? Note that everything would be non-swarm; only the config would be a swarm config.
Exposing a port:
echo "{\nlayer4 {\n192.168.1.1:6692 {\nproxy {\nto localhost:6692\n}\n}\n}\n}\n" | docker config create -l caddy expose-port-6692 -
Closing a port:
docker config rm expose-port-6692
Note that you don't have {{upstreams}} for this setup; you would need a way to reach the container, maybe using localhost?
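Expanded for readability, the config created by that echo corresponds to this Caddyfile snippet:

{
	layer4 {
		192.168.1.1:6692 {
			proxy {
				to localhost:6692
			}
		}
	}
}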
@lucaslorentz apologies for the delayed response. We ended up going with a homegrown approach, using supervisor to keep multiple instances of simpleproxy running and pointing at the correct upstreams, plus a simple Python script that monitors a configuration file and has supervisor start/restart the affected services when it changes.
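For anyone landing here later, a rough sketch of what one such supervisor program entry might look like (the simpleproxy flags and addresses are illustrative assumptions, not our exact config):

[program:proxy_6692]
; listen on a specific host IP/port and forward to the backend
command=simpleproxy -L 192.168.10.1:6692 -R 172.19.0.4:6692
autorestart=true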
I'm going to close this out - I didn't try your suggested approach above using Configs, but in any case we would need {{upstreams}} to work because we have to bind to specific IPs instead of just localhost. Thanks again for all your help and suggestions!
I am successfully running Caddy with caddy-docker-proxy and layer4 in standalone Docker Compose mode, with network_mode: host so that I can expose backend Swarm services on specific host IP addresses and ports. Here's an example of what my labels in the Docker Compose file for a Swarm stack look like:

caddy.layer4.test_internal_hostname:6692.proxy.to: "{{upstreams 6692}}"

All of this works fine (test_internal_hostname will resolve via internal DNS to an IP on the host in question). The issue I am having is that this backend service container takes some time to start, so I'd like to be able to manually add labels to it when additional ports need to be exposed via Caddy, rather than having to redeploy the stack (which involves clients for existing ports losing their connections). There are two aspects to the problem:
- I can add labels to the running service with docker service update --label-add, but I can't use {{upstreams}} in that syntax and instead I have to enter the IPs manually
- Caddy is not on the same caddy network as the service, because it can't both be in network_mode: host and also part of a Docker network (I use CADDY_INGRESS_NETWORKS=caddy)
So there's my conundrum: in host mode, I can't do things like docker service update --label-add caddy.layer4.test_internal_hostname:6692.proxy.to="{{upstreams 6692}}" or docker service update --label-add caddy.layer4.test_internal_hostname:6692.proxy.to=service_name:6692, and instead I have to find the service IPs and use those (which can change on reboot or service restart); but in non-host mode I can't publish to arbitrary host IPs/ports without restarting Caddy, which also involves downtime.
I am not familiar enough with the internals of caddy-docker-proxy to know how it monitors the upstreams for changes - is that something I can manually trigger, or is there a way to use upstreams when adding a label, or does anyone else have any solutions or workarounds for this?