newsnowlabs / docker-ingress-routing-daemon

Docker swarm daemon that modifies ingress mesh routing to expose true client IPs to service containers
MIT License
161 stars, 31 forks

Totally not working... #34

Closed: cytown closed this issue 7 months ago

cytown commented 8 months ago

My ingress network is 10.0.0.0/24, and the gateway is 10.0.0.1.

Then I ran this command: docker-ingress-routing-daemon --ingress-gateway-ips 10.0.0.1 --install --services proxy_proxy

Next, I scaled the proxy_proxy service to 1 and checked, but found it still reports 10.0.0.2 in the access log...

If I run: docker-ingress-routing-daemon --ingress-gateway-ips 10.0.0.2 --install --services proxy_proxy

It freezes all Docker processes...

Am I doing something wrong, or is it an issue?

 docker version
Client: Docker Engine - Community
 Version:           23.0.1
 API version:       1.42
 Go version:        go1.19.5
 Git commit:        a5ee5b1
 Built:             Thu Feb  9 19:51:00 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          23.0.1
  API version:      1.42 (minimum version 1.12)
  Go version:       go1.19.5
  Git commit:       bc3805a
  Built:            Thu Feb  9 19:48:42 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.18
  GitCommit:        2456e983eb9e37e47538f59ea18f2043c9a73640
 runc:
  Version:          1.1.4
  GitCommit:        v1.1.4-0-g5fd4c4d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

struanb commented 8 months ago

Hi @cytown thanks for trying DIRD. Your first command is certainly not correct, as 10.0.0.1 is not an IP that Docker will choose for load balancing. Your second command may be correct, but to confirm this you should run ./docker-ingress-routing-daemon without any arguments on your load balancer node(s) and see what IP(s) are printed.
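As a quick illustration of why 10.0.0.1 is the wrong value, here is a minimal Python sketch that sanity-checks candidate --ingress-gateway-ips values against the ingress subnet and gateway reported in this thread. The function name is hypothetical, and passing the check does not prove an IP is correct; the authoritative values are the ones DIRD prints when run without arguments.

```python
import ipaddress


def sanity_check_gateway_ips(subnet: str, gateway: str, candidates: list[str]) -> list[str]:
    """Return a list of obvious problems with candidate --ingress-gateway-ips values.

    An empty list only means no obvious mistake was found; it does not prove the
    IPs are the ones Docker actually uses for ingress load balancing.
    """
    net = ipaddress.ip_network(subnet)
    gw = ipaddress.ip_address(gateway)
    problems = []
    for raw in candidates:
        ip = ipaddress.ip_address(raw)  # raises ValueError for e.g. "10.0.0.x"
        if ip not in net:
            problems.append(f"{raw} is outside the ingress subnet {subnet}")
        elif ip == gw:
            problems.append(f"{raw} is the ingress network gateway, not a node's ingress IP")
    return problems


if __name__ == "__main__":
    # Values from this thread: ingress subnet 10.0.0.0/24, gateway 10.0.0.1.
    print(sanity_check_gateway_ips("10.0.0.0/24", "10.0.0.1", ["10.0.0.1"]))  # one problem
    print(sanity_check_gateway_ips("10.0.0.0/24", "10.0.0.1", ["10.0.0.2"]))  # []
```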

Also, after that, if you do not use the --preexisting option you must scale down your service to zero and scale it up again after launching DIRD; otherwise you will experience a freeze on traffic to pre-existing containers.
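The scale-down/scale-up step can be sketched in Python using the standard `docker service scale SERVICE=REPLICAS` command. The helper names are hypothetical, and the replica count should be whatever the service normally runs.

```python
import subprocess


def rescale_commands(service: str, replicas: int) -> list[list[str]]:
    """Build the two `docker service scale` calls that take a service down to
    zero replicas and back up, so its containers are recreated after DIRD starts."""
    return [
        ["docker", "service", "scale", f"{service}=0"],
        ["docker", "service", "scale", f"{service}={replicas}"],
    ]


def rescale(service: str, replicas: int) -> None:
    """Run the commands in order; check=True stops on the first failure."""
    for cmd in rescale_commands(service, replicas):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only prints the commands; call rescale("proxy_proxy", 1) on a manager to run them.
    for cmd in rescale_commands("proxy_proxy", 1):
        print(" ".join(cmd))
```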

Please let me know if this helps.

cytown commented 8 months ago

Hi @struanb, for your reference, running the command without any arguments just returns:

Detected ingress subnet and node IP:
- Ingress subnet: 10.0.0.0/24
- This node's ingress network IP: 10.0.0.2

As I mentioned before, if I use 10.0.0.2 in the arguments, it freezes all Docker processes...
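When there are several nodes, it helps to collect each node's detected IP in one place. Below is a small Python sketch that parses the detection text in exactly the format shown above. The function name is hypothetical, and other DIRD versions may print something different.

```python
import re


def parse_dird_detection(output: str) -> dict[str, str]:
    """Extract the ingress subnet and this node's ingress IP from the text that
    docker-ingress-routing-daemon prints when run without arguments.

    Assumes the output format shown in this thread; other versions may differ.
    """
    result = {}
    subnet = re.search(r"Ingress subnet:\s*(\S+)", output)
    node_ip = re.search(r"ingress network IP:\s*(\S+)", output)
    if subnet:
        result["subnet"] = subnet.group(1)
    if node_ip:
        result["node_ip"] = node_ip.group(1)
    return result


SAMPLE = """Detected ingress subnet and node IP:
- Ingress subnet: 10.0.0.0/24
- This node's ingress network IP: 10.0.0.2
"""

if __name__ == "__main__":
    print(parse_dird_detection(SAMPLE))
```

Running this against the captured output from every load balancer node gives the full set of IPs that belongs in --ingress-gateway-ips.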

struanb commented 8 months ago

Ok thanks for confirming, but did you also try --preexisting?

cytown commented 8 months ago

Hi @struanb, yes, specifying 10.0.0.2 freezes everything in Docker, even with the --preexisting argument.

struanb commented 8 months ago

Ok thanks for confirming. I need to know more about your network now. Please forgive the large number of questions. They're essential to understanding your setup.

How many nodes are in your swarm? Please provide names (or pseudonyms) and ingress network IP for each, so we can refer to these nodes.

Which nodes are your service containers running on? Just the node with the 10.0.0.2 IP, or any others?

What services are you running apart from proxy_proxy? If any, do any of these services also publish on any ports?

Are you experiencing the freeze only on the proxy_proxy service or on any other services?

Please note, if you have more than one node, then you need to run DIRD on every node (at least those running your service containers).

Also, if you are accessing your service through more than one load balancer node, you need to run DIRD on all those nodes too, and the command line you run should list the IPs of all load balancer nodes (not just 10.0.0.2), identically on every node.
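One way to guarantee an identical command line on every node is to generate it once from the full list of node ingress IPs. Below is a minimal Python sketch that uses only the flags that appear in this thread. The node names and the function name are hypothetical, and --services is given a single service here because that is the only form shown.

```python
import ipaddress
import shlex
from typing import Optional


def build_dird_command(node_ingress_ips: dict[str, str], service: str,
                       tcp_ports: Optional[list[int]] = None,
                       preexisting: bool = False) -> list[str]:
    """Build one DIRD command line to run unchanged on every node.

    node_ingress_ips maps a node name (hypothetical) to the ingress IP that
    DIRD prints for that node when run without arguments.
    """
    # De-duplicate and sort numerically so every node gets an identical list;
    # ipaddress.ip_address also rejects typos such as "10.0.0.x".
    ips = sorted({ipaddress.ip_address(ip) for ip in node_ingress_ips.values()})
    cmd = [
        "docker-ingress-routing-daemon",
        "--ingress-gateway-ips", ",".join(str(ip) for ip in ips),
        "--install",
        "--services", service,
    ]
    if tcp_ports:
        cmd += ["--tcp-ports", ",".join(str(p) for p in tcp_ports)]
    if preexisting:
        cmd.append("--preexisting")
    return cmd


if __name__ == "__main__":
    nodes = {"node-a": "10.0.0.2", "node-b": "10.0.0.3", "node-c": "10.0.0.4",
             "node-d": "10.0.0.5", "node-e": "10.0.0.6"}
    print(shlex.join(build_dird_command(nodes, "gly-proxy_proxy", [80, 443], preexisting=True)))
```

Sorting numerically rather than as strings keeps, for example, 10.0.0.10 after 10.0.0.2, so the list does not depend on the order in which nodes were recorded.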

Looking forward to your response.

cytown commented 8 months ago

How many nodes are in your swarm? Please provide names (or pseudonyms) and ingress network IP for each, so we can refer to these nodes.

only 1 node:

# docker node ls
ID                            HOSTNAME   STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
2aywbykgaaou2cwgcg3talej1 *   xxxxx      Ready     Active         Leader           23.0.1

Which nodes are your service containers running on? Just the node with the 10.0.0.2 IP, or any others?

What services are you running apart from proxy_proxy? If any, do any of these services also publish on any ports?

I have Portainer (swarm) installed, versions portainer/portainer-ce:2.17.1 and portainer/agent:2.17.1, plus PostgreSQL, Redis and pgAdmin.

Are you experiencing the freeze only on the proxy_proxy service or on any other services?

Portainer and pgAdmin froze; I haven't tested the others.

Please check the above answers.

cytown commented 7 months ago

@struanb any progress???

struanb commented 7 months ago

Apologies @cytown, I didn't receive GitHub's alert for your earlier comment, only for this last one.

Can you try adding the --tcp-ports <ports> argument to the DIRD command line, replacing <ports> with the port(s) published by the proxy_proxy service?

If that doesn't help, please also supply the full list of ports published by your services, i.e. the output of docker service ls and docker ps, as this is the detail I'm still missing in understanding your setup.
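To find the value for --tcp-ports, you can read the published ports from `docker service inspect <service>`, which prints JSON by default. Below is a minimal Python sketch that keeps only TCP ports published in ingress mode, since host-mode ports bypass the routing mesh. The function name and the trimmed sample are illustrative. It follows the thread in using published ports; where a service maps a published port to a different target port, check the DIRD README for which one it expects.

```python
import json


def ingress_tcp_ports(inspect_json: str) -> str:
    """Return the TCP ports published through the ingress mesh, comma-separated
    for DIRD's --tcp-ports argument, from `docker service inspect` output."""
    ports = set()
    for service in json.loads(inspect_json):
        endpoint = service.get("Endpoint") or {}
        for port in endpoint.get("Ports") or []:
            # Host-mode ports bypass the ingress mesh, so only ingress TCP ports count.
            if (port.get("Protocol") == "tcp"
                    and port.get("PublishMode") == "ingress"
                    and "PublishedPort" in port):
                ports.add(port["PublishedPort"])
    return ",".join(str(p) for p in sorted(ports))


# Trimmed, illustrative sample; real output contains many more fields.
SAMPLE = json.dumps([{
    "Endpoint": {"Ports": [
        {"Protocol": "tcp", "TargetPort": 80, "PublishedPort": 80, "PublishMode": "ingress"},
        {"Protocol": "tcp", "TargetPort": 443, "PublishedPort": 443, "PublishMode": "ingress"},
        {"Protocol": "udp", "TargetPort": 53, "PublishedPort": 53, "PublishMode": "ingress"},
    ]}
}])

if __name__ == "__main__":
    print(ingress_tcp_ports(SAMPLE))  # 80,443
```

On a manager node, the input could come from `subprocess.run(["docker", "service", "inspect", "proxy_proxy"], capture_output=True, text=True, check=True).stdout`.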

Thanks.

cytown commented 7 months ago

@struanb Thank you so much for this, it really works!!! Adding the --tcp-ports argument sent the real IP to the service without affecting other services.

Thanks again for such a great project.

struanb commented 7 months ago

That's great news! I'm very glad we've been able to sort this.

It seems the documentation definitely needs an update to clarify when these extra arguments are required in heterogeneous setups like yours. I'm going to leave this issue open for now until that update is done.

struanb commented 7 months ago

I've updated the language in the README, which I hope you agree is clearer about the whitelisting options, and will now close this issue.

cytown commented 7 months ago

@struanb I found another issue :(

When I use more than one node, load balancing seems frozen or broken: only the node that traffic is pointed at works.

Specifically: 10.0.0.2-10.0.0.6 are the ingress IPs of the nodes, the proxy service runs on 10.0.0.2 and 10.0.0.3, and the firewall directs all HTTP requests to 10.0.0.3. On 10.0.0.2 and 10.0.0.3 I ran:

docker-ingress-routing-daemon --ingress-gateway-ips 10.0.0.x --install --services gly-proxy_proxy --tcp-ports 80,443 --preexisting

Then the ingress load balancer misbehaves: when I visit the URL, all traffic to 10.0.0.2 freezes, while 10.0.0.3 works fine...

If I visit 10.0.0.4:80, the load balancer works just fine, but the client IP shows as 10.0.0.4.

Any idea for this???

struanb commented 7 months ago

It looks like there's an x in your IP list.

Based on the ingress IPs you've listed, I think you should probably be running the following command, and please make sure you run it on every node:

docker-ingress-routing-daemon --ingress-gateway-ips 10.0.0.2,10.0.0.3,10.0.0.4,10.0.0.5,10.0.0.6 --install --services gly-proxy_proxy --tcp-ports 80,443 --preexisting

(You may need to run docker-ingress-routing-daemon --uninstall first).
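Two mistakes are easy to make here: leaving a placeholder such as "10.0.0.x" in the list, and giving different nodes different lists. Below is a minimal Python sketch that checks for both. The function names are hypothetical, and the first example in `__main__` reflects only one possible reading of "10.0.0.x", namely each node being given only its own IP.

```python
import ipaddress


def parse_ip_list(value: str) -> list[ipaddress.IPv4Address]:
    """Parse a comma-separated --ingress-gateway-ips value.

    Raises ValueError (ipaddress.AddressValueError) on placeholders such as
    "10.0.0.x" instead of passing them through.
    """
    return [ipaddress.IPv4Address(item.strip()) for item in value.split(",")]


def all_nodes_consistent(per_node_values: dict[str, str]) -> bool:
    """True if every node was given the same set of gateway IPs (order ignored)."""
    distinct = {frozenset(parse_ip_list(v)) for v in per_node_values.values()}
    return len(distinct) <= 1


if __name__ == "__main__":
    # Each node given only its own IP: inconsistent across nodes.
    print(all_nodes_consistent({"node-a": "10.0.0.2", "node-b": "10.0.0.3"}))  # False
    full = "10.0.0.2,10.0.0.3,10.0.0.4,10.0.0.5,10.0.0.6"
    print(all_nodes_consistent({n: full for n in ["node-a", "node-b", "node-c"]}))  # True
```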

cytown commented 7 months ago

I have tried this command on 10.0.0.2-10.0.0.4 and still see the same issue, and visiting 10.0.0.4 directly balances to 2 and 3 but with the client IP reported as 10.0.0.4.

I doubt running the command on 5 and 6 will help...

Anyway, I will try it later.

cytown commented 7 months ago

It works like a charm!!! Thank you.