al-sabr closed this issue 7 years ago
I tried the setup again on my 5-node RPi 3 Model B cluster without any problems, but the RPi version of HypriotOS has all network-related features available; only some storage drivers are missing.
I think the Odroid version needs to enable CONFIG_IPVLAN, which is missing from @gdeverlant's listing.
As a reminder, this is what I did before:
sudo apt-get install -y bc curl gcc git libncurses5-dev lzop make
git clone --depth 1 --single-branch -b odroidc-3.10.y https://github.com/hardkernel/linux
cd linux
make odroidc_defconfig
sed -i -e 's/# CONFIG_VXLAN is not set/CONFIG_VXLAN=m/g' .config
make -j 4 uImage dtbs modules
sudo cp arch/arm/boot/uImage arch/arm/boot/dts/*.dtb /boot
sudo make modules_install
sudo make firmware_install
sudo make headers_install INSTALL_HDR_PATH=/usr
kver=`make kernelrelease`
sudo cp .config /boot/config-${kver}
cd /boot
sudo update-initramfs -c -k ${kver}
sudo mkimage -A arm -O linux -T ramdisk -a 0x0 -e 0x0 -n initrd.img-${kver} -d initrd.img-${kver} uInitrd-${kver}
sudo cp uInitrd-${kver} /boot/uInitrd
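One thing worth checking in the steps above: the sed call only rewrites the option if the exact line "# CONFIG_VXLAN is not set" exists in .config, and it says nothing when it doesn't match. Here is a minimal sketch of a helper that reports what actually happened. The name enable_module is made up for illustration, and it assumes GNU sed for the in-place edit.

```shell
#!/bin/sh
# enable_module FILE OPTION
# Hypothetical helper (not part of the kernel tree): turn OPTION into a module
# in a kernel config file and report which case applied.
enable_module() {
    file=$1
    option=$2
    if grep -q "^${option}=[ym]\$" "$file"; then
        # Already built in (=y) or built as a module (=m): nothing to do.
        echo "already enabled"
    elif grep -q "^# ${option} is not set\$" "$file"; then
        # The usual case: rewrite the "is not set" comment line.
        sed -i -e "s/^# ${option} is not set\$/${option}=m/" "$file"
        echo "enabled"
    else
        # No line to rewrite; a plain sed would silently do nothing here.
        echo "${option}=m" >> "$file"
        echo "appended"
    fi
}

# Usage, from the kernel source directory after "make odroidc_defconfig":
#   enable_module .config CONFIG_VXLAN
```

Keep in mind that an appended option can still be dropped when the build re-evaluates the configuration, for example if that kernel version does not know the symbol or its dependencies are not met, so it is worth looking at .config again after the build.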
I don't know if this is relevant to this problem; maybe you guys can have a look.
Ahhh crap, it didn't change anything :(
This is the furthest I was able to go!
This Docker feature requires Kernel Vendor libnetwork v0.7.0-dev.7 : Experimental MacVlan and IPVlan network drivers https://github.com/moby/moby/pull/21122
Warning: I'm not even sure that I understand my last claim correctly... or whether it is connected with this bug :D
I found this github repo from @umiddelb https://github.com/umiddelb/armhf/wiki/How-To-compile-a-custom-Linux-kernel-for-your-ARM-device#odroid-c1-mainline-experimental
Tried the following steps for ODROID-C1 mainline (experimental!)
$ curl -sSL https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.8.tar.xz | unxz | tar -xvf -
$ cd linux-4.8
$ make multi_v7_defconfig
I then compared the resulting config against my Odroid C1 using ./check-config.sh to see what is present and what is missing, and found the following results:
======================================= missing but should be there
CONFIG_NET_NS
CONFIG_PID_NS
CONFIG_IPC_NS
CONFIG_UTS_NS
CONFIG_BRIDGE_NETFILTER
CONFIG_NF_NAT_IPV4
CONFIG_IP_NF_FILTER
CONFIG_IP_NF_TARGET_MASQUERADE
CONFIG_NETFILTER_XT_MATCH_ADDRTYPE
CONFIG_NETFILTER_XT_MATCH_CONNTRACK
CONFIG_NETFILTER_XT_MATCH_IPVS
CONFIG_IP_NF_NAT
CONFIG_NF_NAT
CONFIG_NF_NAT_NEEDED
CONFIG_DEVPTS_MULTIPLE_INSTANCES
CONFIG_USER_NS
CONFIG_MEMCG_SWAP
CONFIG_MEMCG_SWAP_ENABLED
CONFIG_RESOURCE_COUNTERS
CONFIG_NET_CLS_CGROUP
CONFIG_CFS_BANDWIDTH
CONFIG_FAIR_GROUP_SCHED
CONFIG_RT_GROUP_SCHED
CONFIG_IP_VS
CONFIG_IP_VS_NFCT
CONFIG_IP_VS_RR
- "ftp,tftp client in container":
CONFIG_NF_NAT_FTP
CONFIG_NF_CONNTRACK_FTP
CONFIG_NF_NAT_TFTP
CONFIG_NF_CONNTRACK_TFTP
CONFIG_AUFS_FS
CONFIG_BTRFS_FS
CONFIG_BTRFS_FS_POSIX_ACL
CONFIG_BLK_DEV_DM
CONFIG_DM_THIN_PROVISIONING
======================================= added manually
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_SCHED=y
CONFIG_CPUSETS=y
CONFIG_MEMCG=y
CONFIG_VETH=y
CONFIG_BRIDGE=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BLK_CGROUP=y
CONFIG_CGROUP_PERF=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
CONFIG_IPVLAN=m
CONFIG_VXLAN=m
CONFIG_XFRM_USER=y
CONFIG_INET_ESP=y
======================================= added but not necessary
CONFIG_EXT4_ENCRYPTION=y
CONFIG_EXT4_DEBUG=y
CONFIG_DUMMY=m
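For anyone who wants to repeat this comparison on their own board without the full check-config.sh, here is a rough sketch that prints which options from a given list are not enabled in a config file. The function name missing_options is hypothetical, and the option list in the usage line is only a sample taken from the listing above, not a complete set of Docker requirements.

```shell
#!/bin/sh
# missing_options FILE OPTION...
# Hypothetical helper: print every OPTION that is neither =y nor =m in FILE.
missing_options() {
    file=$1
    shift
    for option in "$@"; do
        # An option counts as enabled when it is built in (=y) or a module (=m).
        if ! grep -q "^${option}=[ym]\$" "$file"; then
            echo "$option"
        fi
    done
}

# Usage, assuming the config was copied to /boot as in the earlier steps:
#   missing_options "/boot/config-$(uname -r)" CONFIG_VXLAN CONFIG_IPVLAN CONFIG_BRIDGE CONFIG_VETH
```

An empty result only means the listed options are present; it does not prove the overlay network will work.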
IMHO, you need to add drone-net to traefik; otherwise it can't route traffic there...
Sorry, I've updated the description to the latest correct version, my bad. They are in the same overlay network, traefik-net, as you can see. There is no drone-net :D
This is what is running right now in my cluster.
I feel this must be related to your config, since I am running traefik, consul, and node-red in a similar fashion on my C2s. Unless it's related to the rather old kernel for the C1...
Do you think you could simplify your setup to
?
I tried the whoami setup with 10 replicas on the 10 Odroid C2 nodes; Traefik saw the service, but Docker could not give Traefik access to the overlay network. OK, let me simplify it as you said.
Do you also have a cluster, or are you running everything on one node?
Can you also post the output of docker network inspect traefik-net?
The manager node is an Odroid C1
docker network inspect traefik-net
[
{
"Name": "traefik-net",
"Id": "fvbi2v9rit8yfj1ij113ks9om",
"Created": "2017-06-05T10:38:18.527963864+02:00",
"Scope": "swarm",
"Driver": "overlay",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "10.0.0.0/24",
"Gateway": "10.0.0.1"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"Containers": {
"9cac989d59cbcf32a641b8b08bda92599bb2c66a8f28fe07dbb22ca6f7debc0d": {
"Name": "traefik_traefik.1.lot8k11oxno48nwjsh8rgnbr6",
"EndpointID": "4aec29368b0c46b61fc01738d9c58fdb224b7de4e3e0b20f95febc5ed0924882",
"MacAddress": "02:42:0a:00:00:03",
"IPv4Address": "10.0.0.3/24",
"IPv6Address": ""
}
},
"Options": {
"com.docker.network.driver.overlay.vxlanid_list": "4097"
},
"Labels": {},
"Peers": [
{
"Name": "bambuserver1-5a1d9d8fe251",
"IP": "192.168.1.3"
}
]
}
]
yaml whoami
The whoami service is running on the Odroid C2 nodes.
version: "3"
networks:
  traefik-net:
    external: true
services:
  whoami:
    image: admiralobvious/whoami-aarch64
    networks:
      - traefik-net
    deploy:
      replicas: 5
      labels:
        traefik.port: "80"
        traefik.enable: "true"
        traefik.backend.loadbalancer.sticky: "true"
        traefik.backend.loadbalancer.method: "wrr"
        #traefik.backend.loadbalancer.swarm: "true"
        traefik.frontend.passHostHeader: "true"
        traefik.docker.network: "traefik-net"
        traefik.frontend.rule: "Host:whoami.cluster.publicvm.com"
      placement:
        constraints:
          - node.labels.arch==arm64
$ docker network inspect traefik-net
[
{
"Name": "traefik-net",
"Id": "fvbi2v9rit8yfj1ij113ks9om",
"Created": "2017-06-05T10:38:18.527963864+02:00",
"Scope": "swarm",
"Driver": "overlay",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "10.0.0.0/24",
"Gateway": "10.0.0.1"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"Containers": {
"9cac989d59cbcf32a641b8b08bda92599bb2c66a8f28fe07dbb22ca6f7debc0d": {
"Name": "traefik_traefik.1.lot8k11oxno48nwjsh8rgnbr6",
"EndpointID": "4aec29368b0c46b61fc01738d9c58fdb224b7de4e3e0b20f95febc5ed0924882",
"MacAddress": "02:42:0a:00:00:03",
"IPv4Address": "10.0.0.3/24",
"IPv6Address": ""
}
},
"Options": {
"com.docker.network.driver.overlay.vxlanid_list": "4097"
},
"Labels": {},
"Peers": [
{
"Name": "bambuserver1-5a1d9d8fe251",
"IP": "192.168.1.3"
}
]
}
]
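In the output above, the Peers section lists only bambuserver1, even though the whoami replicas are placed on other nodes. That may suggest the overlay is not spanning the cluster as seen from the manager, though I can't say that for certain. Here is a small sketch to pull the peer list out of the inspect output so it can be compared across nodes. The name list_peers is made up, and the awk parsing assumes the pretty-printed, one-key-per-line JSON that docker network inspect prints; it is not a general JSON parser.

```shell
#!/bin/sh
# list_peers: read "docker network inspect" output on stdin and print
# "name ip" for each entry in the "Peers" array.
# Hypothetical helper; assumes one JSON key per line as in the output above.
list_peers() {
    awk '
        /"Peers":/             { in_peers = 1; next }  # start of the Peers array
        in_peers && /^[ \t]*\]/ { in_peers = 0 }        # closing bracket ends it
        in_peers && /"Name":/  {
            name = $0
            sub(/.*"Name": *"/, "", name)   # strip everything up to the value
            sub(/".*/, "", name)            # strip the closing quote and the rest
        }
        in_peers && /"IP":/    {
            ip = $0
            sub(/.*"IP": *"/, "", ip)
            sub(/".*/, "", ip)
            print name, ip
        }
    '
}

# Usage on each node:
#   docker network inspect traefik-net | list_peers
```

Running this on the manager and on one of the worker nodes and comparing the two lists should show whether the nodes see each other on traefik-net.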
yaml Portainer
version: "3"
networks:
  traefik-net:
    external: true
services:
  portainer:
    depends_on: [ traefik ]
    image: portainer/portainer:linux-arm-1.13.2
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
      - portainer-datas:/data
    networks:
      - traefik-net
    deploy:
      placement:
        constraints:
          - node.role == manager
      restart_policy:
        condition: on-failure
      labels:
        traefik.docker.network: "traefik-net"
        traefik.port: "9000"
        traefik.frontend.rule: "Host:portainer.cluster.publicvm.com"
volumes:
  portainer-datas:
    driver: local-persist
    driver_opts:
      type: volume
      mountpoint: /mnt/virtual/docker/containers/portainer
$ docker network inspect traefik-net
[
{
"Name": "traefik-net",
"Id": "fvbi2v9rit8yfj1ij113ks9om",
"Created": "2017-06-05T10:38:18.527963864+02:00",
"Scope": "swarm",
"Driver": "overlay",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "10.0.0.0/24",
"Gateway": "10.0.0.1"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"Containers": {
"0de2015679c7376c41a99cd57bbf380ec8b4be41593bc8642c39bc2ee9f86f3a": {
"Name": "portainer_portainer.1.pwla0ki3quxaq3kz2guvjt2jq",
"EndpointID": "9ad492667b6f5e92c8424bcd9ea5910e99b7d8b06e7b85226b2cdcdb0e66ea89",
"MacAddress": "02:42:0a:00:00:0b",
"IPv4Address": "10.0.0.11/24",
"IPv6Address": ""
},
"9cac989d59cbcf32a641b8b08bda92599bb2c66a8f28fe07dbb22ca6f7debc0d": {
"Name": "traefik_traefik.1.lot8k11oxno48nwjsh8rgnbr6",
"EndpointID": "4aec29368b0c46b61fc01738d9c58fdb224b7de4e3e0b20f95febc5ed0924882",
"MacAddress": "02:42:0a:00:00:03",
"IPv4Address": "10.0.0.3/24",
"IPv6Address": ""
}
},
"Options": {
"com.docker.network.driver.overlay.vxlanid_list": "4097"
},
"Labels": {},
"Peers": [
{
"Name": "bambuserver1-5a1d9d8fe251",
"IP": "192.168.1.3"
}
]
}
]
As you can see, nothing outside the manager node is reachable by Traefik:
time="2017-06-05T09:18:00Z" level=info msg="Skipping same configuration for provider docker"
time="2017-06-05T09:18:02Z" level=warning msg="Error forwarding to http://10.0.0.9:80, err: dial tcp 10.0.0.9:80: getsockopt: no route to host"
time="2017-06-05T09:18:02Z" level=debug msg="Round trip: http://10.0.0.11:9000, code: 200, duration: 821.781684ms"
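Note that 10.0.0.9, the backend that fails, does not appear in the Containers section of the inspect output on the manager, while 10.0.0.11 (Portainer, which answers with code 200) does. To collect every backend that Traefik could not reach, something like the following sketch might help. The function name unreachable_backends is hypothetical, and it assumes the log lines keep the same "Error forwarding to http://..." wording as above.

```shell
#!/bin/sh
# unreachable_backends: read Traefik log lines on stdin and print each
# distinct backend address that failed with "no route to host".
# Hypothetical helper; relies on the log wording shown in this thread.
unreachable_backends() {
    grep 'no route to host' \
        | sed -n 's/.*Error forwarding to http:\/\/\([0-9.]*:[0-9]*\).*/\1/p' \
        | sort -u
}

# Usage, assuming the Traefik logs were saved to a file called traefik.log:
#   unreachable_backends < traefik.log
```

The addresses it prints can then be checked against the Containers section of docker network inspect traefik-net on each node to see which node hosts them.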
Weird. I'll have another look in the evening.
So, in your opinion, is my analysis correct that this is a bug?
I've filed a ticket in moby, and someone asked me to run some basic tests without Traefik in the equation.
You can read my process here: https://github.com/moby/moby/issues/33531
I've just tested with only 2 Odroid C2 devices.
It works without any problem. It seems that the bug is in the Odroid C1 build with Docker.
Closing this issue; the cause is the old kernel's missing features.
The problem I'm having might have the same cause and be related to the old kernel bug we found and fixed.
Further investigation of this ticket and its possible link with the old closed ticket https://github.com/hypriot/image-builder-odroid-c1/issues/38
@docbobo I would like to have your perspective on this problem.
This is a brief description of my setup.
Running Docker Swarm cluster
(2) Odroid C1 armhfv7(arm32) servers
(10) Odroid C2 aarch64(arm64) servers
Docker version arm32
Docker version arm64
docker info arm32
docker info arm64
Check config arm32
Check config arm64
==========================================
Steps to reproduce the problem...
docker network create -d overlay traefik-net
create a docker-compose.yml for initial stack
What is expected as a result: open http://drone.cluster.publicvm.com/ in your browser; all calls to that URL should go through Traefik and then be routed to the drone_server service on bambuserver2, which is a worker node. The Drone login page should be displayed.
What is actually happening:
Traefik is not able to reach beyond the Swarm manager and route the request to bambuserver2. Instead, this error message is shown in the logs:
I've talked with @emilevauge from Traefik, and he told me that this is not a Traefik bug but rather a Docker bug. Then @firecyberice tried my setup on the amd64 architecture via playwithdocker and was able to run my config without a problem. Based on his results and tests, I've come to the conclusion that some other disabled kernel options, like the CONFIG_VXLAN one that was missing in ticket https://github.com/hypriot/image-builder-odroid-c1/issues/38, are blocking the routing within Docker and HypriotOS.
I'm not 100% sure about this, but I would say I'm 90% close to the root of the problem, since I don't know which other kernel options are necessary to successfully route overlay network packets between nodes.
@docbobo @firecyberice and maybe others can help us on this bug.
Screenshots of the networks in my Swarm Manager
Screenshots of the networks in worker node 1 (bambuserver2)
Traefik successfully redirecting the 3 services in the initial stack (please look at the URLs and the Traefik tab)
Traefik showing error for the 4th service on bambuserver2
What is inside the log from Traefik