lucaslorentz / caddy-docker-proxy

Caddy as a reverse proxy for Docker

Improve network matching and compatibility with host network #559

Closed: lucaslorentz closed this 6 months ago

lucaslorentz commented 6 months ago

Issues fixed:

  1. Network IDs in swarm tasks don't match the ID of the network, resulting in "Service is not in same network as caddy" and no upstreams. I changed it to match by network name as well.
  2. HOST network as the ingress network was resulting in no upstreams, because there is no IP address information on host network attachments. I changed it to use 127.0.0.1 in those cases (both changes are sketched below).
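A minimal Go sketch of both ideas (illustrative only; these are not the actual CDP identifiers):

package sketch

import "strings"

// isIngress reports whether a task's network attachment belongs to an
// ingress network. Matching by ID alone missed swarm tasks whose
// attachment IDs differ from the real network ID, so the name is now
// checked as well.
func isIngress(ingress map[string]bool, networkID, networkName string) bool {
	return ingress[networkID] || ingress[networkName]
}

// upstreamIP picks the address to dial for an attachment. Host-network
// attachments carry no IP information, so fall back to loopback.
func upstreamIP(networkName string, addresses []string) string {
	if networkName == "host" {
		return "127.0.0.1"
	}
	// Task addresses come as CIDR, e.g. "10.0.8.3/24"; strip the prefix.
	return strings.SplitN(addresses[0], "/", 2)[0]
}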

Test config is in https://github.com/lucaslorentz/caddy-docker-proxy/blob/improve-network-matching/tests/host-network/compose.yaml. Note that no additional config was necessary: CDP properly recognized the host network as ingress and used the 127.0.0.1 address to reach containers.

Fixes #558

lucaslorentz commented 6 months ago

@smaccona Doing some improvements here that will fix #558. Can you please try it before I merge?

Build your image using:

FROM caddy:builder-alpine as builder

RUN xcaddy build \
    --with github.com/lucaslorentz/caddy-docker-proxy/v2@improve-network-matching \
    --with github.com/RussellLuo/caddy-ext/layer4

FROM caddy:alpine

COPY --from=builder /usr/bin/caddy /bin/caddy

ENTRYPOINT ["/bin/caddy"]

CMD ["docker-proxy"]

I expect it to fix your problem without any config changes from your side. But this also enables you to ditch the caddy network, and just use the host network for both CDP and your services if you want.
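For reference, a hypothetical compose sketch of that all-host setup, reusing the nc example (untested, illustrative only):

version: '3.3'

services:
  caddy:
    image: caddy-test
    network_mode: host
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    restart: unless-stopped

  nc:
    image: alpine
    command: nc -lvnp 8080
    network_mode: host
    labels:
      - "caddy.layer4.nc_test:8080.proxy.to={{upstreams 8080}}"

With both containers on the host network, CDP should resolve the upstream to 127.0.0.1:8080.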

smaccona commented 6 months ago

@lucaslorentz thank you very much for this build. I decided to completely remove all existing traces of Caddy before building this new image and deploying it. I stopped Caddy and removed all the Caddy images I had previously built. Confirming there were no remaining images:

# docker images | grep caddy
#

Here's my Dockerfile for the new build:

FROM caddy:builder-alpine as builder

RUN xcaddy build \
    --with github.com/lucaslorentz/caddy-docker-proxy/v2@improve-network-matching \
    --with github.com/RussellLuo/caddy-ext/layer4

FROM caddy:alpine

COPY --from=builder /usr/bin/caddy /bin/caddy

ENTRYPOINT ["/bin/caddy"]

CMD ["docker-proxy"]

I build and tag the image, bypassing cache just in case:

# docker build --no-cache -t caddy-test .

The image builds successfully and here are the last two lines of the output:

Successfully built f4fa0653fbdc
Successfully tagged caddy-test:latest

My Caddy docker-compose.yaml file:

version: '3.3'

services:
  caddy:
    image: caddy-test
    container_name: caddy
    network_mode: host
    environment:
      - CADDY_INGRESS_NETWORKS=caddy
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /volumes/caddy_data:/data
    extra_hosts:
      - host.docker.internal:host-gateway
    restart: unless-stopped

Run Caddy:

# docker-compose up -d
WARNING: The Docker Engine you're using is running in swarm mode.

Compose does not use swarm mode to deploy services to multiple nodes in a swarm. All containers will be scheduled on the current node.

To deploy your application across the swarm, use `docker stack deploy`.

Creating caddy ... done

Confirm it's using the same image f4fa0653fbdc I just built:

# docker inspect caddy | grep Image
        "Image": "sha256:f4fa0653fbdc4ed311356b8f1aa8658ed209dea4147f4cfd0efbcea2a2707ad9",
            "Image": "caddy-test",
#

I checked to make sure there were no errors in Caddy logs to this point, and used -f to follow the logs in this session while I ran my service in another:

# docker logs -f caddy
{"level":"info","ts":1703367233.8608558,"logger":"docker-proxy","msg":"Running caddy proxy server"}
{"level":"info","ts":1703367233.8640783,"logger":"admin","msg":"admin endpoint started","address":"localhost:2019","enforce_origin":false,"origins":["//localhost:2019","//[::1]:2019","//127.0.0.1:2019"]}
{"level":"info","ts":1703367233.864477,"msg":"autosaved config (load with --resume flag)","file":"/config/caddy/autosave.json"}
{"level":"info","ts":1703367233.8644946,"logger":"docker-proxy","msg":"Running caddy proxy controller"}
{"level":"info","ts":1703367233.8652968,"logger":"docker-proxy","msg":"Start","CaddyfilePath":"","EnvFile":"","LabelPrefix":"caddy","PollingInterval":30,"ProxyServiceTasks":true,"ProcessCaddyfile":true,"ScanStoppedContainers":true,"IngressNetworks":"[caddy]","DockerSockets":[""],"DockerCertsPath":[""],"DockerAPIsVersion":[""]}
{"level":"info","ts":1703367233.8664138,"logger":"docker-proxy","msg":"Connecting to docker events","DockerSocket":""}
{"level":"info","ts":1703367233.8681114,"logger":"docker-proxy","msg":"IngressNetworksMap","ingres":"map[5cvx90pblvvp4qjqctwr5dtgk:true caddy:true]"}
{"level":"info","ts":1703367233.883187,"logger":"docker-proxy","msg":"Swarm is available","new":true}
{"level":"info","ts":1703367233.8987343,"logger":"docker-proxy","msg":"New Caddyfile","caddyfile":"# Empty caddyfile"}
{"level":"warn","ts":1703367233.8991923,"logger":"docker-proxy","msg":"Caddyfile to json warning","warn":"[Caddyfile:1: Caddyfile input is not formatted; run 'caddy fmt --overwrite' to fix inconsistencies]"}
{"level":"info","ts":1703367233.8992112,"logger":"docker-proxy","msg":"New Config JSON","json":"{}"}
{"level":"info","ts":1703367233.8992536,"logger":"docker-proxy","msg":"Sending configuration to","server":"localhost"}
{"level":"info","ts":1703367233.9005232,"logger":"admin.api","msg":"received request","method":"POST","host":"localhost:2019","uri":"/load","remote_ip":"127.0.0.1","remote_port":"53548","headers":{"Accept-Encoding":["gzip"],"Content-Length":["41"],"Content-Type":["application/json"],"User-Agent":["Go-http-client/1.1"]}}
{"level":"info","ts":1703367233.9006011,"msg":"config is unchanged"}
{"level":"info","ts":1703367233.900609,"logger":"admin.api","msg":"load complete"}
{"level":"info","ts":1703367233.9007978,"logger":"docker-proxy","msg":"Successfully configured","server":"localhost"}

Now my service YAML, again just netcat listening on a port:

# cat nc.yaml 
version: "3.3"

services:

  nc:
    image: alpine
    command: nc -lvnp 8080
    labels:
      caddy_ingress_network: caddy
    networks:
      - caddy

networks:
  caddy:
    external: true

Note that the labels are directly on the service, not under deploy. I added caddy_ingress_network: caddy to the labels, and did not add a label telling Caddy to proxy the service, so I can test adding that afterwards.

Deploy:

# docker stack deploy -c nc.yaml nc
Creating service nc_nc
#

I confirmed that nothing new was added to the Caddy logs at this point. Then I added the label:

# docker service update --label-add "caddy.layer4.nc_test:8080.proxy.to={{upstreams 8080}}" nc_nc
nc_nc
overall progress: 1 out of 1 tasks 
1/1: running   [==================================================>] 
verify: Service converged 
#

Two new lines in Caddy logs:

{"level":"warn","ts":1703367587.6173735,"logger":"docker-proxy","msg":"Service is not in same network as caddy","service":"nc_nc","serviceId":"bxz5bax9er3gqp4vok2wwz333"}
{"level":"info","ts":1703367587.6175864,"logger":"docker-proxy","msg":"Process Caddyfile","logs":"[ERROR]  Removing invalid block: parsing caddyfile tokens for 'layer4': wrong argument count or unexpected line ending after 'to', at Caddyfile:5\n{\n\tlayer4 {\n\t\tnc_test:8080 {\n\t\t\tproxy {\n\t\t\t\tto\n\t\t\t}\n\t\t}\n\t}\n}\n\n"}

Again it can't match the destination and thinks the service is not in the same network. Here are the relevant lines from docker service inspect on the service:

# docker service inspect nc_nc
...
        "Spec": {
            "Name": "nc_nc",
            "Labels": {
                "caddy.layer4.nc_test:8080.proxy.to": "{{upstreams 8080}}",
                "com.docker.stack.image": "alpine",
                "com.docker.stack.namespace": "nc"
            },
            "TaskTemplate": {
                "ContainerSpec": {
                    "Image": "alpine:latest@sha256:51b67269f354137895d43f3b3d810bfacd3945438e94dc5ac55fdac340352f48",
                    "Labels": {
                        "caddy_ingress_network": "caddy",
                        "com.docker.stack.namespace": "nc"
                    },

Then I removed the stack for the service, and moved the caddy_ingress_network label under deploy in the stack's YAML:

# cat nc.yaml
version: "3.3"

services:

  nc:
    image: alpine
    command: nc -lvnp 8080
    deploy:
      labels:
        caddy_ingress_network: caddy
    networks:
      - caddy

networks:
  caddy:
    external: true

Redeploy:

# docker stack deploy -c nc.yaml nc
Creating service nc_nc
#

I confirmed no new entries appeared in the Caddy logs. Now add the label again:

# docker service update --label-add "caddy.layer4.nc_test:8080.proxy.to={{upstreams 8080}}" nc_nc
nc_nc
overall progress: 1 out of 1 tasks 
1/1: running   [==================================================>] 
verify: Service converged 
#

Same error in the Caddy logs:

{"level":"warn","ts":1703368181.808466,"logger":"docker-proxy","msg":"Service is not in same network as caddy","service":"nc_nc","serviceId":"dt8ahvxo4bvrr5taw26476zmd"}
{"level":"info","ts":1703368181.8087192,"logger":"docker-proxy","msg":"Process Caddyfile","logs":"[ERROR]  Removing invalid block: parsing caddyfile tokens for 'layer4': wrong argument count or unexpected line ending after 'to', at Caddyfile:5\n{\n\tlayer4 {\n\t\tnc_test:8080 {\n\t\t\tproxy {\n\t\t\t\tto\n\t\t\t}\n\t\t}\n\t}\n}\n\n"}

docker service inspect for this service shows the new label and the caddy_ingress_network one in the same place:

# docker service inspect nc_nc
[
    {
        "ID": "dt8ahvxo4bvrr5taw26476zmd",
        "Version": {
            "Index": 4585
        },
        "CreatedAt": "2023-12-23T21:46:19.059512875Z",
        "UpdatedAt": "2023-12-23T21:49:41.689948842Z",
        "Spec": {
            "Name": "nc_nc",
            "Labels": {
                "caddy.layer4.nc_test:8080.proxy.to": "{{upstreams 8080}}",
                "caddy_ingress_network": "caddy",
                "com.docker.stack.image": "alpine",
                "com.docker.stack.namespace": "nc"
            },
            "TaskTemplate": {
                "ContainerSpec": {
                    "Image": "alpine:latest@sha256:51b67269f354137895d43f3b3d810bfacd3945438e94dc5ac55fdac340352f48",
                    "Labels": {
                        "com.docker.stack.namespace": "nc"

              },
...

Finally, here's my caddy network:

# docker network inspect caddy
[
    {
        "Name": "caddy",
        "Id": "5cvx90pblvvp4qjqctwr5dtgk",
        "Created": "2023-12-23T15:46:19.227748634-06:00",
        "Scope": "swarm",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.21.0.0/16",
                    "Gateway": "172.21.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": true,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "c23cb1136a8c0b3a9bd967a26d7b7f3699491548b1a74da8e5d41a64cbba92a6": {
                "Name": "nc_nc.1.vy528koyx005z08rsavfnykez",
                "EndpointID": "b3864400ce734fb7b2dd9acaeea29a2d1f678be048632ede35bdaf663ce03931",
                "MacAddress": "02:42:ac:15:00:02",
                "IPv4Address": "172.21.0.2/16",
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {}
    }
]

I pasted everything exactly as I did it to make sure I am following your intent correctly. Let me know if there are any steps I should do differently, or if there are more logs/info you would like to see. Many thanks!

lucaslorentz commented 6 months ago

@smaccona Thanks for the detailed test and information.

I wasn't expecting that outcome 😄 . It would be great if you could provide more information to diagnose the issue; I will share a step-by-step below.

Basic checks

This log line shows all the configs passed to CDP; I can see there that you configured it to use caddy as the ingress network:

{"level":"info","ts":1703367233.8652968,"logger":"docker-proxy","msg":"Start","CaddyfilePath":"","EnvFile":"","LabelPrefix":"caddy","PollingInterval":30,"ProxyServiceTasks":true,"ProcessCaddyfile":true,"ScanStoppedContainers":true,"IngressNetworks":"[caddy]","DockerSockets":[""],"DockerCertsPath":[""],"DockerAPIsVersion":[""]}

This log shows all IDs and names of networks that CDP will use to connect to containers:

{"level":"info","ts":1703367233.8681114,"logger":"docker-proxy","msg":"IngressNetworksMap","ingres":"map[5cvx90pblvvp4qjqctwr5dtgk:true caddy:true]"}

It contains the caddy network name you chose, as well as its network ID, 5cvx90pblvvp4qjqctwr5dtgk. That ID matches your output from docker network inspect caddy, which is good: CDP correctly resolved the network ID from the name you provided.

What to check when adding labels at service level

Docker Swarm controls the creation of service containers via tasks; tasks are the Swarm concept of containers. CDP lists all running tasks of that service and gets their IPs on the ingress network.
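Roughly what that lookup amounts to with the Docker Go SDK (a simplified sketch, not CDP's actual code):

package main

import (
	"context"
	"fmt"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/api/types/filters"
	"github.com/docker/docker/client"
)

func main() {
	cli, err := client.NewClientWithOpts(client.FromEnv)
	if err != nil {
		panic(err)
	}
	// Equivalent of `docker service ps nc_nc`, restricted to running tasks.
	args := filters.NewArgs()
	args.Add("service", "nc_nc")
	args.Add("desired-state", "running")
	tasks, err := cli.TaskList(context.Background(), types.TaskListOptions{Filters: args})
	if err != nil {
		panic(err)
	}
	// Each attachment carries the network ID, the network name, and the
	// task's addresses; these are what get compared against the ingress set.
	for _, task := range tasks {
		for _, att := range task.NetworksAttachments {
			fmt.Println(att.Network.ID, att.Network.Spec.Name, att.Addresses)
		}
	}
}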

When testing this scenario, can you please do:

docker service ps nc_nc

This lists all tasks (containers) running for that service. You should have only one task; get its ID and then run:

docker inspect PUT_TASK_ID_HERE

This will give you the task details, and at the end, there will be the network attachment information, which CDP uses to find upstream IPs for that service.

[Screenshot: docker inspect output for a task, with the NetworksAttachments network ID marked as (1) and the network name marked as (2)]

  1. We have always supported comparing this information against the ingress network names and IDs CDP identified. But during my investigation, I noticed this ID never matches the actual network ID, so this ID matching is mostly useless. I'm keeping it in place, though, just in case this mismatch is not happening in all Docker versions.

  2. There we have the network name, and this PR changes CDP to check that as well. In your test, the value in this field should be "caddy"; can you please confirm that's the case and share this section with me?

What to check when adding labels at container level

At container level, you can just inspect the container with:

docker inspect PUT_CONTAINER_ID_HERE

This will give you the container details, and at the end, there will be the network information that CDP uses to find upstream IPs.

[Screenshot: docker inspect output for a container, with the network name marked as (1) and the network ID marked as (2)]

  1. That's the network name; in your case it should be caddy, which would match the network name you chose as ingress. Can you please double-check and share this section with me?

  2. Here, the network ID works correctly, and it should match your network ID and the ID CDP found previously for the ingress network.

ID matching here seems to be working well based on your comments on https://github.com/lucaslorentz/caddy-docker-proxy/issues/558, but I'm adding NAME matching to containers as well, just in case.
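If it's easier than scrolling through the full JSON, the same fields can be pulled with an inspect format template (the field names follow the Go struct names docker templates use):

docker inspect -f '{{range $name, $net := .NetworkSettings.Networks}}{{$name}} id={{$net.NetworkID}} ip={{$net.IPAddress}}{{println}}{{end}}' PUT_CONTAINER_ID_HERE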

Other remarks

You don't need the caddy_ingress_network label, but it should work with it as well.

smaccona commented 6 months ago

@lucaslorentz This is really well laid-out. Sure, happy to provide all of those! I tore it all down and redid it exactly as before for a clean setup:

# docker service ps nc_nc
ID             NAME      IMAGE           NODE      DESIRED STATE   CURRENT STATE           ERROR     PORTS
m32bwyqzi73c   nc_nc.1   alpine:latest   bare14    Running         Running 2 seconds ago 

Looking for network information for the task:

# docker inspect m32bwyqzi73c
...
        "NetworksAttachments": [
            {
                "Network": {
                    "ID": "hfrxhk354r8naxa0k9c7bozbk",
                    "Version": {
                        "Index": 4629
                    },
                    "CreatedAt": "2023-12-24T03:37:47.692631237Z",
                    "UpdatedAt": "2023-12-24T03:37:47.69432266Z",
                    "Spec": {
                        "Name": "nc_default",
                        "Labels": {
                            "com.docker.stack.namespace": "nc"
                        },
                        "DriverConfiguration": {
                            "Name": "overlay"
                        },
                        "Scope": "swarm"
                    },
                    "DriverState": {
                        "Name": "overlay",
                        "Options": {
                            "com.docker.network.driver.overlay.vxlanid_list": "4105"
                        }
                    },
                    "IPAMOptions": {
                        "Driver": {
                            "Name": "default"
                        },
                        "Configs": [
                            {
                                "Subnet": "10.0.8.0/24",
                                "Gateway": "10.0.8.1"
                            }
                        ]
                    }
                },
                "Addresses": [
                    "10.0.8.3/24"
                ]
            }
        ]

... we see a difference from your screenshots. I am using docker stack deploy, so this is a Swarm deployment, but my task is not running on the host network, and it's also not the caddy network; instead, it's running on its own network.

Note also that this doesn't match the serviceId in the Caddy logs (maybe this doesn't matter because you were using a different setup than I am):

{"level":"warn","ts":1703390435.3127742,"logger":"docker-proxy","msg":"Service is not in same network as caddy","service":"nc_nc","serviceId":"pt8l8kdy190s74f6x6i9tps0o"}

Instead, this matches one of the entries I see when I do docker service ls (again maybe this doesn't matter):

# docker service ls
ID             NAME                  MODE         REPLICAS   IMAGE                              PORTS
...
pt8l8kdy190s   nc_nc                 replicated   1/1        alpine:latest      
...

Caddy seems to be seeing the overall service ID instead of the task ID? Is that because this is a replicated service instead of a global service?

If instead I do docker inspect pt8l8kdy190s, there is no NetworksAttachments section at all, just a Spec.TaskTemplate.Networks section like this:

                "Networks": [
                    {
                        "Target": "hfrxhk354r8naxa0k9c7bozbk",
                        "Aliases": [
                            "nc"
                        ]
                    }
                ],

... which does seem to target the other network.

Finally, I like that you divided it into sections "What to check when adding labels at service level" and "What to check when adding labels at container level", but I think it's impossible to add labels at the container level after container launch - let me know if you know of a way to do it!

Thanks again.

smaccona commented 6 months ago

@lucaslorentz not to distract you from my previous comment (which includes logs that will hopefully be helpful), but one other point I wanted to make in case there is any confusion: I am not using netcat as a workaround or anything like that. I just switched to netcat instead of my actual Java server to provide the simplest possible TCP listener that can replicate the situation I am encountering:

  1. Create a netcat service that listens on a single TCP port, without upfront labels in the docker-compose file telling Caddy to reverse proxy that service using layer4
  2. Try to add a label to the service after it launches
  3. See if Caddy picks up the label correctly and is able to add a rule to its Caddyfile which starts reverse proxying the service correctly

smaccona commented 6 months ago

@lucaslorentz made some changes to my https://github.com/lucaslorentz/caddy-docker-proxy/pull/559#issuecomment-1868425388 comment - just to make sure you don't miss anything. Thank you!

lucaslorentz commented 6 months ago

... we see a difference from your screenshots. I am using docker stack deploy, so this is a Swarm deployment, but my task is not running on the host network, and it's also not the caddy network; instead, it's running on its own network.

Yeah, I did a different setup to take those screenshots, from some compose YAML I had ready. I don't expect your task to be in the host network, but I do expect it to be in the caddy network. It's strange that it is in the nc_default network and not in caddy; this is the root cause of all your problems.

You could get it working by using label:

caddy_ingress_network: nc_default

But I suppose it would be useful for you to understand why your service is not in the network you set.

Let me do a quick test deploying the same YAML that you did.

docker network create caddy -d overlay
docker stack deploy -c nc.yaml nc

I do get the right network in the task:

"NetworksAttachments": [
            {
                "Network": {
                    "ID": "n0qiewxzny8mxbezc92yrrywj",
                    "Version": {
                        "Index": 11424
                    },
                    "CreatedAt": "2023-12-24T10:39:14.759698304Z",
                    "UpdatedAt": "2023-12-24T10:39:14.763106263Z",
                    "Spec": {
                        "Name": "caddy",
                        "Labels": {},
                        "DriverConfiguration": {
                            "Name": "overlay"
                        },
                        "IPAMOptions": {
                            "Driver": {
                                "Name": "default"
                            }
                        },
                        "Scope": "swarm"
                    },
                    "DriverState": {
                        "Name": "overlay",
                        "Options": {
                            "com.docker.network.driver.overlay.vxlanid_list": "4104"
                        }
                    },
                    "IPAMOptions": {
                        "Driver": {
                            "Name": "default"
                        },
                        "Configs": [
                            {
                                "Subnet": "10.0.8.0/24",
                                "Gateway": "10.0.8.1"
                            }
                        ]
                    }
                },
                "Addresses": [
                    "10.0.8.3/24"
                ]
            }
        ]

So this is basically what you need to figure out: why Docker is creating a new network for your service instead of using the caddy network. It could be that you have an old version of Docker, some malformed YAML, or a special character in the YAML.
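One quick sanity check on the network itself, via an inspect format template (driver, scope, and attachability are the fields worth comparing against a freshly created overlay network):

docker network inspect caddy -f '{{.Name}} driver={{.Driver}} scope={{.Scope}} attachable={{.Attachable}}'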

Instead, this matches one of the entries I see when I do docker service ls (again maybe this doesn't matter):

Yeah, this is what it should match against: the service ID, not the task ID. Looks good.

If instead I do docker inspect pt8l8kdy190s, there is no NetworksAttachments section at all, just a Spec.TaskTemplate.Networks section like this:

That's right. Services are an abstract concept, a template for how to run containers and how many replicas you want; the service itself is not attached to a network, only the tasks/containers it generates are.

Finally, I like that you divided it into sections "What to check when adding labels at service level" and "What to check when adding labels at container level", but I think it's impossible to add labels at the container level after container launch - let me know if you know of a way to do it!

Yeah, no worries. We don't need to test that.

I am not using netcat as a workaround

Oh, got it. For some reason I thought nc was just relaying traffic to your actual container as a workaround. Ignore my comments about that then :-)

lucaslorentz commented 6 months ago

Turns out that in my latest test the network ID in tasks matched the real network ID. So, maybe we don't need this diff at all.

Network IDs in swarm tasks don't match the ID of the network, resulting in "Service is not in same network as caddy" and no upstreams. I changed it to match by network name as well.

This statement I made in the diff description was wrong.

smaccona commented 6 months ago

This is a pretty new bare metal server running Debian 11:

# docker -v
Docker version 20.10.5+dfsg1, build 55c4c88
# uname -a
Linux bare14 5.10.0-26-amd64 #1 SMP Debian 5.10.197-1 (2023-09-29) x86_64 GNU/Linux
# cat //etc/debian_version 
11.8

I'll have time to run some more tests in a couple of days.

smaccona commented 6 months ago

@lucaslorentz there must have been something weird with the network I was using, because when I created a new network to test with, Caddy was able to see the container's network correctly and direct traffic to it. But even though the mapping is now correct, Caddy is not able to contact the upstream ports, and gets an I/O timeout when trying.

Initially, I thought this was something weird with the setup on the specific server I was using to build out the test environment. So I provisioned a Debian 11 VM and replicated from scratch, and also provisioned a bare metal server at our cloud provider and replicated from scratch there too. I also built a very basic test case using containous/whoami, alpine (running netcat like I had above) and a custom build of Caddy adding in caddy-docker-proxy and caddy-ext/layer4. Instead of pasting YAML files inline here, I will drop them as attachments so you can see if there is anything weird about my YAML.

Here are the steps to reproduce.

First, provision a Debian 11 server, either a VM or a bare metal server; I was able to reproduce with both. To exclude potential VM issues, what I outline below is on the bare metal server. It has an internal IP address of 10.65.67.40, which we will use for testing below, and I also added an additional IP, 192.168.10.1/32, on that interface (see below).

Install Docker, docker-compose and nmap; disable and stop apache2 so Caddy can bind to ports 80 and 443 in host mode; add a secondary private IP address for testing services exposed directly on the host via host-mode networking; initialize Swarm mode; and create the network we will use for Caddy testing:

# apt update && apt install -y docker.io docker-compose nmap
# systemctl disable apache2
# systemctl stop apache2
# ip addr add 192.168.10.1/32 dev bond0
# docker swarm init --advertise-addr=10.65.67.40
# docker network create --driver overlay --attachable caddy_test

Next, we build the Caddy image we want. Here's the Dockerfile - I had to rename it to Dockerfile.txt because GitHub doesn't support files with no extension:

Dockerfile.txt

Build:

# docker build -t caddy-cdp-layer4 .

And here's the docker-compose file, which I am not running in Swarm mode but instead directly as services using docker-compose (again, I added a .txt extension because GitHub doesn't support attaching YAML files):

docker-compose.yaml.txt

Bring up the stack:

# docker-compose up -d

Checking the Caddy logs, I see that the network and mappings are created correctly:

...
{"level":"info","ts":1703909185.385052,"logger":"docker-proxy","msg":"Start","CaddyfilePath":"","EnvFile":"","LabelPrefix":"caddy","PollingInterval":30,"ProxyServiceTasks":true,"ProcessCaddyfile":true,"ScanStoppedContainers":true,"IngressNetworks":"[caddy_test]","DockerSockets":[""],"DockerCertsPath":[""],"DockerAPIsVersion":[""]}
...
{"level":"info","ts":1703909185.387031,"logger":"docker-proxy","msg":"IngressNetworksMap","ingres":"map[caddy_test:true gd04bq2d59eoaah6m5eu9jaz2:true]"}
...
{"level":"info","ts":1703908852.9633453,"logger":"docker-proxy","msg":"New Caddyfile","caddyfile":"{\n\tlayer4 {\n\t\tnc_test:8080 {\n\t\t\tproxy {\n\t\t\t\tto 10.0.1.13:8080\n\t\t\t}\n\t\t}\n\t}\n}\nwhoami.example.com {\n\treverse_proxy 10.0.1.11:80\n\ttls internal\n}\n"}
{"level":"info","ts":1703908852.9642181,"logger":"docker-proxy","msg":"New Config JSON","json":"{\"apps\":{\"http\":{\"servers\":{\"srv0\":{\"listen\":[\":443\"],\"routes\":[{\"match\":[{\"host\":[\"whoami.example.com\"]}],\"handle\":[{\"handler\":\"subroute\",\"routes\":[{\"handle\":[{\"handler\":\"reverse_proxy\",\"upstreams\":[{\"dial\":\"10.0.1.11:80\"}]}]}]}],\"terminal\":true}]}}},\"layer4\":{\"servers\":{\"srv0\":{\"listen\":[\"nc_test:8080\"],\"routes\":[{\"handle\":[{\"handler\":\"proxy\",\"upstreams\":[{\"dial\":[\"10.0.1.13:8080\"]}]}]}]}}},\"tls\":{\"automation\":{\"policies\":[{\"subjects\":[\"whoami.example.com\"],\"issuers\":[{\"module\":\"internal\"}]}]}}}}"}
{"level":"info","ts":1703908856.346587,"msg":"autosaved config (load with --resume flag)","file":"/config/caddy/autosave.json"}

Expanding the generated Caddyfile:

{
    layer4 {
        nc_test:8080 {
            proxy {
                to 10.0.1.14:8080
            }
        }
    }
}
whoami.example.com {
    reverse_proxy 10.0.1.16:80
    tls internal
}

Docker containers:

# docker ps -a
CONTAINER ID   IMAGE               COMMAND                  CREATED         STATUS         PORTS     NAMES
d75fe827c8df   alpine              "nc -lvnp 8080"          2 minutes ago   Up 2 minutes             basic_nc_1
cf52477418dc   caddy-cdp-layer4    "/bin/caddy docker-p…"   2 minutes ago   Up 2 minutes             caddy
54ee27bde496   containous/whoami   "/whoami"                2 minutes ago   Up 2 minutes   80/tcp    basic_whoami_1
# docker inspect basic_nc_1
...
"Networks": {
                "caddy_test": {
                    "IPAMConfig": {
                        "IPv4Address": "10.0.1.14"
                    },
                    "Links": null,
                    "Aliases": [
                        "nc",
                        "d75fe827c8df"
                    ],
                    "NetworkID": "gd04bq2d59eoaah6m5eu9jaz2",
                    "EndpointID": "c48ea43277e58f5f25e18b178d728a32dea49253f81736af4cd1caaaf2372f72",
                    "Gateway": "",
                    "IPAddress": "10.0.1.14",
                    "IPPrefixLen": 24,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:0a:00:01:0e",
                    "DriverOpts": null
                }
            }
...
# docker inspect basic_whoami_1
...
            "Networks": {
                "caddy_test": {
                    "IPAMConfig": {
                        "IPv4Address": "10.0.1.16"
                    },
                    "Links": null,
                    "Aliases": [
                        "54ee27bde496",
                        "whoami"
                    ],
                    "NetworkID": "gd04bq2d59eoaah6m5eu9jaz2",
                    "EndpointID": "f5dfe5b82f136e747c1fd52c950ad78ba7fd9020f451c4f5723ccabb88a90626",
                    "Gateway": "",
                    "IPAddress": "10.0.1.16",
                    "IPPrefixLen": 24,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:0a:00:01:10",
                    "DriverOpts": null
                }
            }
        }
...

nmap output:

# nmap 192.168.10.1 -p 8080
Starting Nmap 7.80 ( https://nmap.org ) at 2023-12-29 22:14 CST
Nmap scan report for 192.168.10.1
Host is up (0.000071s latency).

PORT     STATE SERVICE
8080/tcp open  http-proxy

Nmap done: 1 IP address (1 host up) scanned in 0.10 seconds
# nmap 10.65.67.40
Starting Nmap 7.80 ( https://nmap.org ) at 2023-12-29 22:15 CST
Nmap scan report for 10.65.67.40
Host is up (0.0000060s latency).
Not shown: 996 closed ports
PORT    STATE SERVICE
22/tcp  open  ssh
53/tcp  open  domain
80/tcp  open  http
443/tcp open  https

Nmap done: 1 IP address (1 host up) scanned in 0.13 seconds

Now the tests. First, the web service on whoami.example.com:

# time curl --show-error -s -k -f --resolve whoami.example.com:443:127.0.0.1 https://whoami.example.com
curl: (22) The requested URL returned error: 502 

real    0m3.034s
user    0m0.017s
sys 0m0.013s

And in the Caddy logs:

{"level":"error","ts":1703910004.0565722,"logger":"http.log.error","msg":"dial tcp 10.0.1.16:80: i/o timeout","request":{"remote_ip":"127.0.0.1","remote_port":"37780","client_ip":"127.0.0.1","proto":"HTTP/2.0","method":"GET","host":"whoami.example.com","uri":"/","headers":{"Accept":["*/*"],"User-Agent":["curl/7.74.0"]},"tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","server_name":"whoami.example.com"}},"duration":3.002708091,"status":502,"err_id":"gmwtimxa7","err_trace":"reverseproxy.statusError (reverseproxy.go:1267)"}

Now the netcat container. In one session, I follow the logs:

# docker logs -f basic_nc_1
listening on [::]:8080 ...

In a second session, I send data to port 8080 on 192.168.10.1:

# time echo -n 'Line of text' | nc 192.168.10.1 8080

real    2m9.823s
user    0m0.001s
sys 0m0.004s

In the first session, I observed no log activity in the basic_nc_1 container. And in the Caddy logs:

{"level":"error","ts":1703910352.5222254,"logger":"layer4","msg":"handling connection","error":"dial tcp 10.0.1.14:8080: connect: connection timed out"}

So it has the correct IP addresses for each service, but it times out connecting to the upstream services.

I'm not sure of the next steps to troubleshoot this. If you have any ideas that would be great. Thanks!

lucaslorentz commented 6 months ago

The first thing that comes to my mind is that an overlay network is a swarm network, but you're deploying things with docker-compose. I would try a bridge network driver instead of overlay.
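For example, recreating the network with the bridge driver (same name, so the compose files stay unchanged):

# docker network rm caddy_test
# docker network create --driver bridge caddy_test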

I will try to follow the same steps in a Debian VM.

lucaslorentz commented 6 months ago

@smaccona Just tried with the same steps you did. Got the same 502 error. Changed to bridge network and both whoami and nc worked.

Edit: You should be able to drop docker swarm init after switching to the bridge network. Closed this PR; let's move the discussion back to #558. Sorry for the back and forth.

lucaslorentz commented 6 months ago

Closing this as it is not needed. Maybe I will open a new PR later just to add support for HOST as an ingress network, using 127.0.0.1 to reach containers on the host network.