Open phish108 opened 2 years ago
This hint did not lead anywhere.
A similar discussion was also a dead end.
The following calls work externally, but not internally:
docker service create --name web --publish mode=host,target=80,published=80 --mode=global nginx:alpine
This exposes nginx on all hosts in the swarm.
docker service create --name web --publish mode=host,target=80,published=80 nginx:alpine
This exposes only the instance running on one (random) node in the swarm.
In both cases the service is reachable through the published port(s). However, internally the service is still unavailable via the overlay network. This is the opposite of what the documentation suggests.
OK found a "solution" here. It boils down to a problem with the VIP mode for service detection.
The default endpoint mode (vip) does not work reliably on local networks in LXC. If the endpoint mode is set to dnsrr, everything works smoothly:
docker service create --name web \
--publish mode=host,target=80,published=80 \
--mode=global \
--network mynet \
--endpoint-mode dnsrr \
nginx:alpine
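To confirm which endpoint mode a service actually ended up with, the mode can be read back from the service spec on a manager node. This is only a minimal sketch: the helper name endpoint_mode is made up for illustration, and web is the service name from the command above.

```shell
# Hypothetical helper: print the endpoint mode (vip or dnsrr) recorded in the
# spec of a swarm service. Has to run on a manager node.
endpoint_mode() {
  docker service inspect --format '{{ .Spec.EndpointSpec.Mode }}' "$1"
}

# Usage (not executed here):
#   endpoint_mode web
```

Note that dnsrr only combines with host-mode publishing as used above; the ingress routing mesh requires a virtual IP.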
Any proxy service has to run as a global service. Alternatively, we may force a single instance of the service onto one node. This is of course suboptimal.
Found an interesting but not very technically enlightening article on vip vs dnsrr mode.
Found the truly important article on the virtual IP kernel modules, which put me on the right track. It helped me understand the long list of kernel modules I found without comment in an earlier post.
The iptable* and xt* modules, as well as all network terminology modules, are required for handling the external (incoming) traffic.
The ip_vs* modules are needed for handling virtual IPs in Docker Swarm.
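A quick way to see whether the modules are actually loaded is to compare the required names against /proc/modules. The following is a rough sketch, not something from the articles above: the helper name missing_modules is hypothetical, and it assumes the modules are built as loadable modules (anything compiled into the kernel does not show up in /proc/modules and would be reported as missing).

```shell
# Hypothetical helper: print every required module that is absent from a
# /proc/modules-style listing (module name in the first column).
# Usage: missing_modules <modules-file> <module>...
missing_modules() {
  listing="$1"
  shift
  for mod in "$@"; do
    # Exact match on the first column, so ip_vs does not satisfy ip_vs_rr.
    if ! awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$listing"; then
      echo "$mod"
    fi
  done
}

# Usage on the host (not executed here):
#   missing_modules /proc/modules ip_vs ip_vs_rr xt_conntrack br_netfilter overlay
```

An empty result means every listed module was found in the listing.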
I use the following profile:
- name: docker
  config:
    # the security settings are needed for docker
    security.nesting: true
    security.syscalls.intercept.mknod: true
    security.syscalls.intercept.setxattr: true
    linux.kernel_modules: bridge,ip_tables,ip6_tables,iptable_nat,iptable_mangle,netlink_diag,nf_nat,overlay,br_netfilter,bonding,ip_vs,ip_vs_dh,ip_vs_ftp,ip_vs_lblc,ip_vs_lblcr,ip_vs_lc,ip_vs_nq,ip_vs_rr,ip_vs_sed,ip_vs_sh,ip_vs_wlc,ip_vs_wrr,xfrm_user,xt_conntrack,xt_MASQUERADE
    # limit the memory and cpu resources
    limits.memory: 16GB
    limits.memory.swap: false
    limits.cpu: 2
  description: "Universal Docker Swarm Configuration"
  devices:
    eth0:
      nictype: bridged
      name: eth0
      parent: ovs0
      type: nic
    root:
      path: /
      pool: docker
      type: disk
With it the following command works as expected.
docker service create --name web -p 80:80 --network mynet nginx:alpine
I can access the service both internally and externally from all nodes.
New insights: https://remotephone.github.io/posts/Docker-Swarm-in-LXC_Part-1/
It seems Ubuntu's AppArmor config gets in the way. After updating, the swarm nodes no longer work.
Ensure a proper AppArmor configuration via Ansible!
Docker Swarm has notorious problems inside the container.
I did the following:
Still need to activate low level lxc options.
The overlay networking works between manager nodes of the swarm, but the networks are not exposed to worker nodes of the swarm.
I tested the setup using nginx:alpine
Things to test:
Links that helped:
Tests I ran
Preparations:
On all Nodes run:
This should show all swarm networks. Interestingly, on manager nodes this works. On worker nodes only local networks are shown. It seems that network information is not passed to worker nodes. If a node is promoted, it will report the networks.
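The command used for this check is not preserved above. As a sketch, assuming it was something along the lines of docker network ls, the swarm-scoped networks a node knows about can be listed as follows (the helper name swarm_networks is made up for illustration):

```shell
# Hypothetical helper: list the names of all swarm-scoped networks that the
# local Docker engine currently knows about.
swarm_networks() {
  docker network ls --filter scope=swarm --format '{{ .Name }}'
}

# Usage on each node (not executed here):
#   swarm_networks
```

One thing that may explain the difference between node roles: as far as I know, Docker only extends an overlay network to a worker node once a task attached to that network is scheduled there, so an empty list on an idle worker does not necessarily indicate a fault.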
access the internet from a basic container
expose container to the Internet
In node A
On some computer
Curl should show the nginx test page. The container on nodeA should report a request.
Repeat the trick in a different node. First ping the container.
And if this works then check the service.
Both will work.
create local service (on manager node)
Then on some other node:
Will not work.
Get the container name using
To get the node on which the service container runs.
On that node:
This gives the container name. Copy the name.
Then on some other node
Will work 🤨
Stop the service with
docker service rm webB
check exposed service
On some computer
Will cause a connection refused error.
All other checks will behave as before.