hypriot / image-builder-odroid-c1

Build SD card image for ODROID C1 and C1+
http://blog.hypriot.com/post/how-to-get-docker-working-on-your-favourite-arm-board-with-hypriotos/
MIT License
15 stars 6 forks source link

Traefik running as a service on the Swarm manager cannot route requests to services on worker nodes in the same overlay network #42

Closed al-sabr closed 7 years ago

al-sabr commented 7 years ago

The problem I'm having may have the same cause as, and be related to, the old kernel bug we found and fixed.

This ticket continues the investigation, including a possible link with the old closed ticket https://github.com/hypriot/image-builder-odroid-c1/issues/38

@docbobo I would like to have your perspective on this problem.

This is a brief description of my setup.

Running a Docker Swarm cluster:

(2) ODROID C1 armv7/armhf (arm32) servers

  1. Swarm manager (initial stack, docker-compose.yml)
  2. Worker node (drone stack, docker-compose.yml)

(10) ODROID C2 aarch64 (arm64) servers

Docker version arm32

Client:
 Version:      17.05.0-ce
 API version:  1.29
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May  4 22:28:23 2017
 OS/Arch:      linux/arm

Server:
 Version:      17.05.0-ce
 API version:  1.29 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May  4 22:28:23 2017
 OS/Arch:      linux/arm
 Experimental: false

$ uname -a
Linux bambuserver2 3.10.104 #1 SMP PREEMPT Sun Jun 4 07:54:52 UTC 2017 armv7l GNU/Linux

Docker version arm64

Client:
 Version:      17.05.0-ce
 API version:  1.29
 Go version:   go1.7.5
 Git commit:   89658bed6
 Built:        Tue May  9 07:22:23 2017
 OS/Arch:      linux/arm64

Server:
 Version:      17.05.0-ce
 API version:  1.29 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   89658bed6
 Built:        Tue May  9 07:22:23 2017
 OS/Arch:      linux/arm64
 Experimental: false

$ uname -a
Linux bambuserver12 3.14.79-109 #1 SMP PREEMPT Thu Mar 16 20:05:25 BRT 2017 aarch64 GNU/Linux

docker info arm32

Containers: 3
 Running: 3
 Paused: 0
 Stopped: 0
Images: 414
Server Version: 17.05.0-ce
Storage Driver: aufs
 Root Dir: /mnt/virtual/var/lib/docker/aufs
 Backing Filesystem: <unknown>
 Dirs: 561
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local local-persist
 Network: bridge host macvlan null overlay
Swarm: active
 NodeID: m7uvwoo1s1335vy20evjz9752
 Is Manager: true
 ClusterID: v1wra9jgbzas12b639h5oc5fm
 Managers: 1
 Nodes: 12
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 192.168.1.3
 Manager Addresses:
  192.168.1.3:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9048e5e50717ea4497b757314bad98ea3763c145
runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
init version: 949e6fa
Security Options:
 apparmor
Kernel Version: 3.10.104
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: armv7l
CPUs: 4
Total Memory: 940.9MiB
Name: bambuserver1
ID: 7GHE:CHRG:TDC4:UOTO:3JWM:2ZYU:CHBN:AMIE:W45Y:I5G7:AMSK:ETMY
Docker Root Dir: /mnt/virtual/var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No kernel memory limit support

docker info arm64

Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 2
Server Version: 17.05.0-ce
Storage Driver: aufs
 Root Dir: /mnt/virtual/var/lib/docker/aufs
 Backing Filesystem: <unknown>
 Dirs: 8
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local local-persist
 Network: bridge host macvlan null overlay
Swarm: active
 NodeID: apddjnzk1njxiqwlm5l8pan0h
 Is Manager: false
 Node Address: 192.168.1.14
 Manager Addresses:
  192.168.1.3:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9048e5e50717ea4497b757314bad98ea3763c145
runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 3.14.79-109
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: aarch64
CPUs: 4
Total Memory: 1.928GiB
Name: bambuserver12
ID: B64M:RE77:NE6P:CZWV:7XNT:DQ3B:SVYF:LDL5:JHVX:VY2V:GFEB:J7CV
Docker Root Dir: /mnt/virtual/var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Check config arm32

info: reading kernel config from /proc/config.gz ...

Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled (as module)
- CONFIG_BRIDGE: enabled
- CONFIG_BRIDGE_NETFILTER: enabled
- CONFIG_NF_NAT_IPV4: enabled (as module)
- CONFIG_IP_NF_FILTER: enabled (as module)
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled (as module)
- CONFIG_IP_NF_NAT: missing
- CONFIG_NF_NAT: enabled (as module)
- CONFIG_NF_NAT_NEEDED: enabled
- CONFIG_POSIX_MQUEUE: enabled
- CONFIG_DEVPTS_MULTIPLE_INSTANCES: enabled

Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: missing
- CONFIG_CGROUP_PIDS: missing
- CONFIG_MEMCG_SWAP: enabled
- CONFIG_MEMCG_SWAP_ENABLED: enabled
    (cgroup swap accounting is currently enabled)
- CONFIG_MEMCG_KMEM: missing
- CONFIG_RESOURCE_COUNTERS: enabled
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: missing
- CONFIG_IOSCHED_CFQ: enabled
- CONFIG_CFQ_GROUP_IOSCHED: missing
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: missing
- CONFIG_NET_CLS_CGROUP: enabled (as module)
- CONFIG_NETPRIO_CGROUP: missing
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: enabled
- CONFIG_IP_VS: enabled (as module)
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_RR: enabled (as module)
- CONFIG_EXT3_FS: missing
- CONFIG_EXT3_FS_XATTR: missing
- CONFIG_EXT3_FS_POSIX_ACL: missing
- CONFIG_EXT3_FS_SECURITY: missing
    (enable these ext3 configs if you are using ext3 as backing filesystem)
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: enabled (as module)
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled
      - CONFIG_CRYPTO_GCM: missing
      - CONFIG_CRYPTO_SEQIV: enabled
      - CONFIG_CRYPTO_GHASH: missing
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled (as module)
      - CONFIG_XFRM_ALGO: enabled
      - CONFIG_INET_ESP: enabled
      - CONFIG_INET_XFRM_MODE_TRANSPORT: enabled
  - "ipvlan":
    - CONFIG_IPVLAN: missing
  - "macvlan":
    - CONFIG_MACVLAN: enabled (as module)
    - CONFIG_DUMMY: missing
  - "ftp,tftp client in container":
    - CONFIG_NF_NAT_FTP: enabled (as module)
    - CONFIG_NF_CONNTRACK_FTP: enabled (as module)
    - CONFIG_NF_NAT_TFTP: enabled (as module)
    - CONFIG_NF_CONNTRACK_TFTP: enabled (as module)
- Storage Drivers:
  - "aufs":
    - CONFIG_AUFS_FS: enabled
  - "btrfs":
    - CONFIG_BTRFS_FS: enabled (as module)
    - CONFIG_BTRFS_FS_POSIX_ACL: enabled
  - "devicemapper":
    - CONFIG_BLK_DEV_DM: enabled
    - CONFIG_DM_THIN_PROVISIONING: enabled (as module)
  - "overlay":
    - CONFIG_OVERLAY_FS: missing
  - "zfs":
    - /dev/zfs: missing
    - zfs command: missing
    - zpool command: missing

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000

Check config arm64

info: reading kernel config from /proc/config.gz ...

Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- apparmor: enabled and tools installed
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled (as module)
- CONFIG_BRIDGE: enabled (as module)
- CONFIG_BRIDGE_NETFILTER: enabled
- CONFIG_NF_NAT_IPV4: enabled (as module)
- CONFIG_IP_NF_FILTER: enabled (as module)
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled (as module)
- CONFIG_IP_NF_NAT: missing
- CONFIG_NF_NAT: enabled (as module)
- CONFIG_NF_NAT_NEEDED: enabled
- CONFIG_POSIX_MQUEUE: enabled
- CONFIG_DEVPTS_MULTIPLE_INSTANCES: enabled

Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_CGROUP_PIDS: missing
- CONFIG_MEMCG_SWAP: enabled
- CONFIG_MEMCG_SWAP_ENABLED: enabled
    (cgroup swap accounting is currently enabled)
- CONFIG_MEMCG_KMEM: enabled
- CONFIG_RESOURCE_COUNTERS: enabled
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: enabled
- CONFIG_IOSCHED_CFQ: enabled
- CONFIG_CFQ_GROUP_IOSCHED: enabled
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: enabled
- CONFIG_NET_CLS_CGROUP: enabled (as module)
- CONFIG_CGROUP_NET_PRIO: enabled
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: enabled
- CONFIG_IP_VS: enabled (as module)
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_RR: enabled (as module)
- CONFIG_EXT3_FS: missing
- CONFIG_EXT3_FS_XATTR: missing
- CONFIG_EXT3_FS_POSIX_ACL: missing
- CONFIG_EXT3_FS_SECURITY: missing
    (enable these ext3 configs if you are using ext3 as backing filesystem)
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: enabled (as module)
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled
      - CONFIG_CRYPTO_GCM: missing
      - CONFIG_CRYPTO_SEQIV: enabled
      - CONFIG_CRYPTO_GHASH: missing
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled (as module)
      - CONFIG_XFRM_ALGO: enabled
      - CONFIG_INET_ESP: enabled
      - CONFIG_INET_XFRM_MODE_TRANSPORT: enabled
  - "ipvlan":
    - CONFIG_IPVLAN: missing
  - "macvlan":
    - CONFIG_MACVLAN: enabled (as module)
    - CONFIG_DUMMY: enabled (as module)
  - "ftp,tftp client in container":
    - CONFIG_NF_NAT_FTP: enabled (as module)
    - CONFIG_NF_CONNTRACK_FTP: enabled (as module)
    - CONFIG_NF_NAT_TFTP: enabled (as module)
    - CONFIG_NF_CONNTRACK_TFTP: enabled (as module)
- Storage Drivers:
  - "aufs":
    - CONFIG_AUFS_FS: enabled (as module)
  - "btrfs":
    - CONFIG_BTRFS_FS: enabled (as module)
    - CONFIG_BTRFS_FS_POSIX_ACL: enabled
  - "devicemapper":
    - CONFIG_BLK_DEV_DM: enabled (as module)
    - CONFIG_DM_THIN_PROVISIONING: enabled (as module)
  - "overlay":
    - CONFIG_OVERLAY_FS: enabled (as module)
  - "zfs":
    - /dev/zfs: missing
    - zfs command: missing
    - zpool command: missing

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000
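
Comparing these two listings by eye is error-prone. Below is a minimal Python sketch (not part of Docker's tooling) that parses saved check-config.sh output and lists the options enabled on the arm64 node but reported missing on the arm32 node. It assumes each listing was saved as plain text, without terminal colour codes, in the "- CONFIG_X: status" format shown above; parse_check_config and enabled_only_on are hypothetical helper names.

```python
import re
from typing import Dict, List

# Matches check-config.sh lines such as "- CONFIG_VXLAN: enabled (as module)"
# or "      - CONFIG_CRYPTO_GCM: missing" (any indentation).
LINE_RE = re.compile(r"^\s*-\s+(CONFIG_[A-Z0-9_]+):\s+(\w+)")


def parse_check_config(text: str) -> Dict[str, str]:
    """Map each CONFIG_* option to its first status word ("enabled" or "missing")."""
    result: Dict[str, str] = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            result[match.group(1)] = match.group(2)
    return result


def enabled_only_on(good: Dict[str, str], bad: Dict[str, str]) -> List[str]:
    """Options enabled in `good` but reported missing in `bad`, sorted by name."""
    return sorted(
        option
        for option, status in good.items()
        if status == "enabled" and bad.get(option) == "missing"
    )


if __name__ == "__main__":
    # Excerpts copied from the arm32 (C1) and arm64 (C2) listings above.
    arm32 = """- CONFIG_SECCOMP: missing
- CONFIG_MEMCG_KMEM: missing
    - CONFIG_DUMMY: missing
      - CONFIG_CRYPTO_GCM: missing"""
    arm64 = """- CONFIG_SECCOMP: enabled
- CONFIG_MEMCG_KMEM: enabled
    - CONFIG_DUMMY: enabled (as module)
      - CONFIG_CRYPTO_GCM: missing"""
    for option in enabled_only_on(parse_check_config(arm64), parse_check_config(arm32)):
        print(option)
```

Options that were renamed between kernel versions (for example CONFIG_NETPRIO_CGROUP on 3.10 versus CONFIG_CGROUP_NET_PRIO on 3.14 above) are not matched by name, so the result is a starting point rather than a complete diff.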

==========================================

Steps to reproduce the problem:

  1. docker network create -d overlay traefik-net

  2. Create a docker-compose.yml for the initial stack

version: "3"

networks:
  traefik-net:
    external: true 

services:
  traefik:
    image: hypriot/rpi-traefik
    ports:
      - "80:80"
      - "443:443"
      #- "8080:8080"
    command: --web --docker --docker.swarmmode=true --docker.watch=true --docker.domain=cluster.publicvm.com -l DEBUG 
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
    networks:
      - traefik-net
    deploy:
      placement:
        constraints:
          - node.role==manager
          #- node.hostname==bambuserver1
      restart_policy:
        condition: on-failure
      labels:
        traefik.docker: "true"
        traefik.docker.network: "traefik-net"
        traefik.port: 8080
        traefik.backend.loadbalancer.sticky: "true"
        traefik.backend.loadbalancer.method: "wrr"
        #traefik.backend.loadbalancer.swarm: "true"
        traefik.frontend.passHostHeader: "true"
        traefik.frontend.rule: "Host:traefik-admin.cluster.publicvm.com"

  portainer:
    depends_on: [ traefik ]
    image: portainer/portainer:linux-arm-1.13.1
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
      - portainer-datas:/data
    networks:
      - traefik-net
    #ports:
      #- 9000:9000
    deploy:
      placement:
        constraints:
          - node.role == manager
          #- node.hostname==bambuserver1
      restart_policy:
        condition: on-failure
      labels:
        traefik.enable: "true"
        traefik.docker.network: "traefik-net"
        traefik.port: "9000"
        traefik.backend.loadbalancer.sticky: "true"
        traefik.backend.loadbalancer.method: "wrr"
        #traefik.backend.loadbalancer.swarm: "true"
        traefik.frontend.passHostHeader: "true"
        traefik.frontend.rule: "Host:portainer.cluster.publicvm.com"      

  gitea:
    depends_on: [ portainer ]
    image: bambuserver1:5000/ergu/gitea-arm:1.1.1
    volumes:
      - gitea-datas:/data
    networks:
      - traefik-net
    ports:
      - 3022:22
      #- 3000:3000
    deploy:
      placement:
        constraints:
          - node.role==manager
          #- node.hostname == bambuserver2
      restart_policy:
        condition: on-failure
      labels:
        traefik.enable: "true"
        traefik.docker.network: "traefik-net"
        traefik.port: "3000"
        traefik.backend.loadbalancer.sticky: "true"
        traefik.backend.loadbalancer.method: "wrr"
        #traefik.backend.loadbalancer.swarm: "true"
        traefik.frontend.passHostHeader: "true"
        traefik.frontend.rule: "Host:gitea.cluster.publicvm.com"

volumes:
  portainer-datas:
    driver: local-persist
    driver_opts:
        type: volume 
        mountpoint: /mnt/virtual/docker/containers/portainer
  gitea-datas:
    driver: local-persist
    driver_opts:
        type: volume 
        mountpoint: /mnt/virtual/docker/containers/gitea    

  3. Create a docker-compose.yml for the drone stack

version: "3.2"

networks:
  traefik-net:
    external: true 

services:
    server:
        image: bambuserver1:5000/ergu/drone-arm32:0.7.1
        networks:
          - traefik-net
        deploy:
            labels:
                traefik.enable: "true"
                traefik.docker.network: "traefik-net"
                traefik.port: "8000"
                traefik.backend.loadbalancer.sticky: "true"
                traefik.backend.loadbalancer.method: "drr"
                traefik.backend.loadbalancer.swarm: "true"
                traefik.frontend.passHostHeader: "true"
                traefik.frontend.rule: "Host:drone.cluster.publicvm.com"
            placement:
                constraints:
                - node.hostname==bambuserver2
        ports:
          - 8000:8000

        volumes:
            - drone-datas:/var/lib/drone
        environment:
          - DRONE_ADMIN=administrator
          - DRONE_DEBUG=true
          - DRONE_OPEN=false
          - DRONE_HOST=http://server:8000/
          - DRONE_SECRET=${DRONE_SECRET}
          - DRONE_SERVER_PORT=${DRONE_SERVER_PORT}
          - DRONE_GOGS=true
          - DRONE_GOGS_URL=http://gitea.cluster.publicvm.com/
          - DRONE_GOGS_SKIP_VERIFY=true
          - DRONE_PLUGIN_PRIVILEGED=armhfplugins/docker,armhfplugins/drone-docker

    agent:
        depends_on: [ server ]
        image: bambuserver1:5000/ergu/drone-arm32:0.7.1
        networks:
          - traefik-net
        command: agent
        deploy:
            placement:
                constraints:
                - node.hostname==bambuserver2
        volumes: [ "/var/run/docker.sock:/var/run/docker.sock" ]
        environment:
          - DRONE_SERVER=ws://server:8000/ws/broker
          - DRONE_SECRET=${DRONE_SECRET}
          - DRONE_DEBUG=true

volumes:              
  drone-datas:
    driver: local-persist
    driver_opts:
        type: volume 
        mountpoint: /mnt/virtual/docker/volumes/drone/      

  4. Deploy the initial stack with docker stack deploy on the manager node
  5. Deploy the drone stack with docker stack deploy on the manager node
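
Before blaming the kernel, it helps to rule out label or network mistakes in the stack files. Below is a minimal Python sketch of such a check, assuming the services: section has already been loaded into plain dicts (a YAML parser is left out to keep it dependency-free) and that labels use the mapping form shown above. check_traefik_labels is a hypothetical helper, and its rules are my reading of Traefik 1.x swarm mode: routing labels go under deploy.labels, traefik.docker.network should name a network the service is attached to, and traefik.port tells Traefik which container port to forward to.

```python
from typing import Dict, List


def check_traefik_labels(services: Dict[str, dict]) -> List[str]:
    """Return human-readable problems with Traefik 1.x labels in swarm-mode services.

    `services` is the compose file's services: mapping, already loaded into dicts,
    with labels written as mappings (not "key=value" lists).
    """
    problems: List[str] = []
    for name, svc in services.items():
        service_labels = svc.get("deploy", {}).get("labels", {})
        container_labels = svc.get("labels", {})
        if not any(key.startswith("traefik.") for key in service_labels):
            # In swarm mode Traefik reads service labels (deploy.labels), not container labels.
            if any(key.startswith("traefik.") for key in container_labels):
                problems.append(f"{name}: traefik labels must be under deploy.labels in swarm mode")
            continue
        network = service_labels.get("traefik.docker.network")
        # `networks` may be a list or a mapping in compose files; `in` works for both.
        if network and network not in svc.get("networks", []):
            problems.append(f"{name}: not attached to {network}")
        if "traefik.port" not in service_labels:
            problems.append(f"{name}: traefik.port is not set")
    return problems


if __name__ == "__main__":
    # Simplified version of the drone stack above (only the fields the check reads).
    drone_stack = {
        "server": {
            "networks": ["traefik-net"],
            "deploy": {"labels": {
                "traefik.enable": "true",
                "traefik.docker.network": "traefik-net",
                "traefik.port": "8000",
            }},
        },
        "agent": {"networks": ["traefik-net"]},
    }
    print(check_traefik_labels(drone_stack) or "no label problems found")
```

Run against the drone stack, this reports no problems, which fits the later finding in this thread that the same stack works when only C2 nodes are involved.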

Expected result: when opening http://drone.cluster.publicvm.com/ in a browser, every request should go through Traefik and then be routed to the drone_server service on bambuserver2, which is a worker node. The Drone login page should be displayed.

What is actually happening:

Traefik cannot reach beyond the Swarm manager to route the request to bambuserver2. Instead, this error message appears in the logs:

time="2017-06-04T04:12:55Z" level=debug msg="Round trip: http://10.0.0.3:8080, code: 200, duration: 1.712028ms" 
time="2017-06-04T04:12:56Z" level=warning msg="Error forwarding to http://10.0.0.9:8000, err: dial tcp 10.0.0.9:8000: i/o timeout" 
time="2017-06-04T04:12:57Z" level=debug msg="Round trip: http://10.0.0.3:8080, code: 200, duration: 2.247036ms" 
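
The failing step is a plain TCP connect from the Traefik container to the task IP on the overlay (10.0.0.9:8000). The sketch below makes the same kind of connection with a timeout, so it can be run from another container on traefik-net to separate a Traefik problem from an overlay problem. It is an illustration in Python, not Traefik's Go code; probe is a hypothetical name, and reading "i/o timeout" as silently dropped packets and "no route to host" as EHOSTUNREACH is my interpretation of the Go error strings.

```python
import errno
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Open a TCP connection and classify the result.

    Returns "ok", "timeout" (no answer at all, similar to Go's "i/o timeout"),
    "refused", "no route to host" (EHOSTUNREACH), or the raw error text.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            return "no route to host"
        return str(exc)
```

For example, probe("10.0.0.9", 8000) from a container on the manager. The inspect output later in this thread shows traefik-net was created without --attachable, so attaching a test container may require recreating it with docker network create -d overlay --attachable traefik-net.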

I've talked with @emilevauge from Traefik, and he told me that this is not a Traefik bug but rather a Docker bug. @firecyberice then tried my setup on amd64 via Play with Docker and was able to run my config without a problem. Based on his results and tests, I've concluded that other disabled kernel options, like the "CONFIG_VXLAN: missing" from ticket https://github.com/hypriot/image-builder-odroid-c1/issues/38, are blocking routing within Docker on HypriotOS.

I'm not 100% sure about this, but I'd say I'm 90% of the way to the root of the problem. What I don't know is which other kernel options are needed to route overlay network packets between nodes successfully.
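
To see exactly which networking options the running kernel provides, /proc/config.gz (the file check-config.sh reads, per its "info:" line above) can be inspected directly. Below is a minimal Python sketch; the OVERLAY_OPTIONS list is an assumption, picked from the check-config output above as the entries that look relevant to overlay networking and the swarm routing mesh, not an authoritative list of requirements. load_kernel_config and classify are hypothetical helper names.

```python
import gzip
import os
from typing import Dict, Iterable

# Assumed subset of the check-config.sh entries above that relate to overlay
# networking and the swarm routing mesh; not an authoritative requirements list.
OVERLAY_OPTIONS = [
    "CONFIG_VXLAN",
    "CONFIG_BRIDGE",
    "CONFIG_VETH",
    "CONFIG_IP_VS",
    "CONFIG_IP_VS_RR",
    "CONFIG_NETFILTER_XT_MATCH_IPVS",
    "CONFIG_IPVLAN",
]


def load_kernel_config(path: str = "/proc/config.gz") -> Dict[str, str]:
    """Parse a kernel config (gzipped or plain) into {"CONFIG_X": "y" | "m" | ...}."""
    opener = gzip.open if path.endswith(".gz") else open
    config: Dict[str, str] = {}
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            # "# CONFIG_X is not set" lines are skipped, so those options count as missing.
            if line.startswith("CONFIG_") and "=" in line:
                key, value = line.split("=", 1)
                config[key] = value
    return config


def classify(config: Dict[str, str], options: Iterable[str]) -> Dict[str, str]:
    """Report each option as "built-in" (=y), "module" (=m), "missing", or its raw value."""
    result: Dict[str, str] = {}
    for option in options:
        value = config.get(option)
        if value is None:
            result[option] = "missing"
        elif value == "y":
            result[option] = "built-in"
        elif value == "m":
            result[option] = "module"
        else:
            result[option] = value
    return result


if __name__ == "__main__":
    if os.path.exists("/proc/config.gz"):
        for option, state in classify(load_kernel_config(), OVERLAY_OPTIONS).items():
            print(f"{option}: {state}")
```

A "module" result only means the option can be loaded; lsmod shows whether, for example, the vxlan module is actually loaded.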

@docbobo, @firecyberice, and maybe others: could you help us with this bug?

Screenshots:

- Networks on the Swarm manager: traefik-net (overlay)
- Networks on worker node 1 (bambuserver2): traefik-net (overlay)
- Traefik successfully routing the 3 services of the initial stack (see the URLs and the Traefik tab): portainer, traefik, gitea
- Traefik showing an error for the 4th service, on bambuserver2: drone error

What is inside the log from Traefik

time="2017-06-04T04:12:55Z" level=debug msg="Round trip: http://10.0.0.3:8080, code: 200, duration: 1.712028ms" 
time="2017-06-04T04:12:56Z" level=warning msg="Error forwarding to http://10.0.0.9:8000, err: dial tcp 10.0.0.9:8000: i/o timeout" 
time="2017-06-04T04:12:57Z" level=debug msg="Round trip: http://10.0.0.3:8080, code: 200, duration: 2.247036ms" 
firecyberice commented 7 years ago

I tried the setup again on my 5-node Raspberry Pi 3 Model B cluster without any problems, but the RPi version of HypriotOS has all network-related features available; only some storage drivers are missing. I think the ODROID version needs to enable CONFIG_IPVLAN, which is missing from @gdeverlant's listing.

al-sabr commented 7 years ago

As a reminder, this is what I did before:

sudo apt-get install -y bc curl gcc git libncurses5-dev lzop make
git clone --depth 1 --single-branch -b odroidc-3.10.y https://github.com/hardkernel/linux
cd linux
make odroidc_defconfig
sed -i -e 's/# CONFIG_VXLAN is not set/CONFIG_VXLAN=m/g' .config
make -j 4 uImage dtbs modules
sudo cp arch/arm/boot/uImage arch/arm/boot/dts/*.dtb /boot
sudo make modules_install
sudo make firmware_install
sudo make headers_install INSTALL_HDR_PATH=/usr
kver=`make kernelrelease`
sudo cp .config /boot/config-${kver}
cd /boot
sudo update-initramfs -c -k ${kver}
sudo mkimage -A arm -O linux -T ramdisk -a 0x0 -e 0x0 -n initrd.img-${kver} -d initrd.img-${kver} uInitrd-${kver}
sudo cp uInitrd-${kver} /boot/uInitrd
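
The sed line above flips a single option. To enable several at once, such as CONFIG_VXLAN plus the CONFIG_IPVLAN that @firecyberice suggests, here is a minimal Python sketch of the same text edit; set_options is a hypothetical helper and the option values are examples.

```python
import re
from typing import Dict

# Matches "CONFIG_X=..." and "# CONFIG_X is not set" lines in a kernel .config.
OPTION_RE = re.compile(r"^(?:# )?(CONFIG_[A-Za-z0-9_]+)(?:=.*| is not set)$")


def set_options(config_text: str, wanted: Dict[str, str]) -> str:
    """Return .config text with each option in `wanted` set to its value ("y" or "m").

    Existing entries (set or "is not set") are rewritten in place;
    options that do not appear at all are appended at the end.
    """
    lines = config_text.splitlines()
    seen = set()
    for index, line in enumerate(lines):
        match = OPTION_RE.match(line)
        if match and match.group(1) in wanted:
            option = match.group(1)
            lines[index] = f"{option}={wanted[option]}"
            seen.add(option)
    for option, value in wanted.items():
        if option not in seen:
            lines.append(f"{option}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "# CONFIG_VXLAN is not set\nCONFIG_MACVLAN=m\n"
    print(set_options(sample, {"CONFIG_VXLAN": "m", "CONFIG_IPVLAN": "m"}), end="")
```

The kernel tree also ships scripts/config (e.g. scripts/config --file .config --module VXLAN) for the same job. Either way, running make olddefconfig afterwards lets Kconfig resolve dependencies; an option whose dependencies are not met is dropped again, so it is worth re-checking .config before building.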
al-sabr commented 7 years ago

I don't know whether this is relevant to the problem; maybe you could have a look:

https://github.com/moby/moby/issues/27897

al-sabr commented 7 years ago

Ahhh crap, it didn't change anything :(

This is the furthest I was able to get:

https://github.com/mlinuxguy/odroid-c1-kernel-3.19

al-sabr commented 7 years ago

This Docker feature requires kernel support: "Vendor libnetwork v0.7.0-dev.7: Experimental MacVlan and IPVlan network drivers" https://github.com/moby/moby/pull/21122

Warning: I'm not even sure I understand my last claim correctly, or whether it is connected with this bug :D

al-sabr commented 7 years ago

I found this wiki page from @umiddelb: https://github.com/umiddelb/armhf/wiki/How-To-compile-a-custom-Linux-kernel-for-your-ARM-device#odroid-c1-mainline-experimental

al-sabr commented 7 years ago

I tried the following steps for ODROID-C1 mainline (experimental!):

$ curl -sSL https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.8.tar.xz | unxz | tar -xvf -
$ cd linux-4.8
$ make multi_v7_defconfig

I then compared the resulting config with the output of ./check-config.sh on my ODROID C1 and found the following:

======================================= missing but should be there

CONFIG_NET_NS
CONFIG_PID_NS
CONFIG_IPC_NS
CONFIG_UTS_NS
CONFIG_BRIDGE_NETFILTER
CONFIG_NF_NAT_IPV4
CONFIG_IP_NF_FILTER
CONFIG_IP_NF_TARGET_MASQUERADE
CONFIG_NETFILTER_XT_MATCH_ADDRTYPE
CONFIG_NETFILTER_XT_MATCH_CONNTRACK
CONFIG_NETFILTER_XT_MATCH_IPVS
CONFIG_IP_NF_NAT
CONFIG_NF_NAT
CONFIG_NF_NAT_NEEDED
CONFIG_DEVPTS_MULTIPLE_INSTANCES

CONFIG_USER_NS
CONFIG_MEMCG_SWAP
CONFIG_MEMCG_SWAP_ENABLED
CONFIG_RESOURCE_COUNTERS

CONFIG_NET_CLS_CGROUP
CONFIG_CFS_BANDWIDTH
CONFIG_FAIR_GROUP_SCHED
CONFIG_RT_GROUP_SCHED

CONFIG_IP_VS
CONFIG_IP_VS_NFCT
CONFIG_IP_VS_RR
- "ftp,tftp client in container":
CONFIG_NF_NAT_FTP
CONFIG_NF_CONNTRACK_FTP
CONFIG_NF_NAT_TFTP
CONFIG_NF_CONNTRACK_TFTP

CONFIG_AUFS_FS
CONFIG_BTRFS_FS
CONFIG_BTRFS_FS_POSIX_ACL

CONFIG_BLK_DEV_DM
CONFIG_DM_THIN_PROVISIONING

======================================= added manually

CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_SCHED=y
CONFIG_CPUSETS=y
CONFIG_MEMCG=y

CONFIG_VETH=y
CONFIG_BRIDGE=y

CONFIG_POSIX_MQUEUE=y

CONFIG_BLK_CGROUP=y
CONFIG_CGROUP_PERF=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y

CONFIG_IPVLAN=m
CONFIG_VXLAN=m

CONFIG_XFRM_USER=y
CONFIG_INET_ESP=y

======================================= added but not necessary

CONFIG_EXT4_ENCRYPTION=y
CONFIG_EXT4_DEBUG=y
CONFIG_DUMMY=m
docbobo commented 7 years ago

IMHO, you need to add drone-net to traefik, otherwise it can't route traffic there...

al-sabr commented 7 years ago

Sorry, my bad: I've updated the description to the latest correct version. As you can see, both are in the same overlay network, traefik-net. There is no drone-net :D

This is what is running right now in my cluster.

docbobo commented 7 years ago

I feel this must be related to your config, since I am running Traefik, Consul, and Node-RED in a similar fashion on my C2s, unless it's related to the rather old kernel for the C1...

Do you think you could simplify your setup to

?

al-sabr commented 7 years ago

I've tried the whoami setup with 10 replicas on the 10 ODROID C2 nodes; Traefik could see the service, but Docker could not give Traefik access to the overlay network. OK, let me simplify it as you suggested.

Do you also have a cluster or are you running everything on 1 node?

docbobo commented 7 years ago

Can you also post the output of docker network inspect traefik-net?

al-sabr commented 7 years ago

The manager node is an Odroid C1

image

docker network inspect traefik-net
[
    {
        "Name": "traefik-net",
        "Id": "fvbi2v9rit8yfj1ij113ks9om",
        "Created": "2017-06-05T10:38:18.527963864+02:00",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.0.0/24",
                    "Gateway": "10.0.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "Containers": {
            "9cac989d59cbcf32a641b8b08bda92599bb2c66a8f28fe07dbb22ca6f7debc0d": {
                "Name": "traefik_traefik.1.lot8k11oxno48nwjsh8rgnbr6",
                "EndpointID": "4aec29368b0c46b61fc01738d9c58fdb224b7de4e3e0b20f95febc5ed0924882",
                "MacAddress": "02:42:0a:00:00:03",
                "IPv4Address": "10.0.0.3/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4097"
        },
        "Labels": {},
        "Peers": [
            {
                "Name": "bambuserver1-5a1d9d8fe251",
                "IP": "192.168.1.3"
            }
        ]
    }
]
al-sabr commented 7 years ago

whoami stack (YAML)

The whoami service is running on the ODROID C2 nodes.

version: "3"

networks:
  traefik-net:
    external: true     

services:

  whoami: 
    image: admiralobvious/whoami-aarch64
    networks:
      - traefik-net
    deploy:
        replicas: 5
        labels:
            traefik.port: "80"
            traefik.enable: "true"
            traefik.backend.loadbalancer.sticky: "true"
            traefik.backend.loadbalancer.method: "wrr"
            #traefik.backend.loadbalancer.swarm: "true"
            traefik.frontend.passHostHeader: "true"
            traefik.docker.network: "traefik-net"
            traefik.frontend.rule: "Host:whoami.cluster.publicvm.com"
        placement:
            constraints:
                - node.labels.arch==arm64          
$ docker network inspect traefik-net
[
    {
        "Name": "traefik-net",
        "Id": "fvbi2v9rit8yfj1ij113ks9om",
        "Created": "2017-06-05T10:38:18.527963864+02:00",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.0.0/24",
                    "Gateway": "10.0.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "Containers": {
            "9cac989d59cbcf32a641b8b08bda92599bb2c66a8f28fe07dbb22ca6f7debc0d": {
                "Name": "traefik_traefik.1.lot8k11oxno48nwjsh8rgnbr6",
                "EndpointID": "4aec29368b0c46b61fc01738d9c58fdb224b7de4e3e0b20f95febc5ed0924882",
                "MacAddress": "02:42:0a:00:00:03",
                "IPv4Address": "10.0.0.3/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4097"
        },
        "Labels": {},
        "Peers": [
            {
                "Name": "bambuserver1-5a1d9d8fe251",
                "IP": "192.168.1.3"
            }
        ]
    }
]
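
One detail stands out in this output: even with whoami replicas scheduled on the C2 workers, "Peers" still lists only bambuserver1 (192.168.1.3), and "Containers" shows only the local Traefik task (docker network inspect lists only containers on the node where it runs). As far as I understand, nodes running tasks attached to an overlay network normally appear in its peer list, so a single peer hints that, from the manager's point of view, the workers never joined the overlay. Below is a small Python sketch to pull those fields out of the JSON for comparison across nodes; summarize is a hypothetical name.

```python
import json
from typing import Dict, List


def summarize(inspect_json: str) -> Dict[str, List[str]]:
    """Extract peer node IPs and local container IPs from `docker network inspect` output."""
    network = json.loads(inspect_json)[0]  # inspect prints a JSON array, one entry per network
    peers = [peer["IP"] for peer in (network.get("Peers") or [])]
    containers = [c["IPv4Address"] for c in (network.get("Containers") or {}).values()]
    return {"peers": peers, "containers": containers}


if __name__ == "__main__":
    # Trimmed copy of the traefik-net output above.
    sample = """[{"Name": "traefik-net",
      "Containers": {"9cac989d59cb": {"Name": "traefik_traefik.1", "IPv4Address": "10.0.0.3/24"}},
      "Peers": [{"Name": "bambuserver1-5a1d9d8fe251", "IP": "192.168.1.3"}]}]"""
    print(summarize(sample))
```

Saving docker network inspect traefik-net on each node and passing each file's contents to summarize makes it easy to see which nodes each side believes are peers.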

image image

al-sabr commented 7 years ago

Portainer stack (YAML)

version: "3"

networks:
  traefik-net:
    external: true 

services:  
  portainer:
    depends_on: [ traefik ]
    image: portainer/portainer:linux-arm-1.13.2
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
      - portainer-datas:/data
    networks:
      - traefik-net
    deploy:
      placement:
        constraints:
          - node.role == manager
      restart_policy:
        condition: on-failure
      labels:
        traefik.docker.network: "traefik-net"
        traefik.port: "9000"
        traefik.frontend.rule: "Host:portainer.cluster.publicvm.com"      

volumes:
  portainer-datas:
    driver: local-persist
    driver_opts:
        type: volume 
        mountpoint: /mnt/virtual/docker/containers/portainer
$ docker network inspect traefik-net
[
    {
        "Name": "traefik-net",
        "Id": "fvbi2v9rit8yfj1ij113ks9om",
        "Created": "2017-06-05T10:38:18.527963864+02:00",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.0.0/24",
                    "Gateway": "10.0.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "Containers": {
            "0de2015679c7376c41a99cd57bbf380ec8b4be41593bc8642c39bc2ee9f86f3a": {
                "Name": "portainer_portainer.1.pwla0ki3quxaq3kz2guvjt2jq",
                "EndpointID": "9ad492667b6f5e92c8424bcd9ea5910e99b7d8b06e7b85226b2cdcdb0e66ea89",
                "MacAddress": "02:42:0a:00:00:0b",
                "IPv4Address": "10.0.0.11/24",
                "IPv6Address": ""
            },
            "9cac989d59cbcf32a641b8b08bda92599bb2c66a8f28fe07dbb22ca6f7debc0d": {
                "Name": "traefik_traefik.1.lot8k11oxno48nwjsh8rgnbr6",
                "EndpointID": "4aec29368b0c46b61fc01738d9c58fdb224b7de4e3e0b20f95febc5ed0924882",
                "MacAddress": "02:42:0a:00:00:03",
                "IPv4Address": "10.0.0.3/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4097"
        },
        "Labels": {},
        "Peers": [
            {
                "Name": "bambuserver1-5a1d9d8fe251",
                "IP": "192.168.1.3"
            }
        ]
    }
]

image

image

al-sabr commented 7 years ago

As you can see, nothing outside the manager node is accessible to Traefik:

time="2017-06-05T09:18:00Z" level=info msg="Skipping same configuration for provider docker" 
time="2017-06-05T09:18:02Z" level=warning msg="Error forwarding to http://10.0.0.9:80, err: dial tcp 10.0.0.9:80: getsockopt: no route to host" 
time="2017-06-05T09:18:02Z" level=debug msg="Round trip: http://10.0.0.11:9000, code: 200, duration: 821.781684ms"
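
The log excerpts in this thread show a consistent pattern: successful round trips to tasks on the manager (10.0.0.3, 10.0.0.11) and failures for the task on a worker (10.0.0.9), first as "i/o timeout" and here as "no route to host". Below is a minimal Python sketch that tallies these per backend from a Traefik log; the regular expressions are written against the Traefik 1.x lines quoted in this thread and may not match other versions or log formats. backend_health is a hypothetical name.

```python
import re
from collections import Counter
from typing import Tuple

# Patterns written against the Traefik 1.x log lines quoted in this thread.
ROUND_TRIP = re.compile(r"Round trip: (http://[\d.]+:\d+), code: (\d+)")
FORWARD_ERR = re.compile(r'Error forwarding to (http://[\d.]+:\d+), err: (.*?)"')


def backend_health(log_text: str) -> Tuple[Counter, Counter]:
    """Count successful round trips per backend and failures per (backend, reason)."""
    ok: Counter = Counter()
    failed: Counter = Counter()
    for line in log_text.splitlines():
        match = ROUND_TRIP.search(line)
        if match:
            ok[match.group(1)] += 1
            continue
        match = FORWARD_ERR.search(line)
        if match:
            # Keep only the innermost error text, e.g. "no route to host".
            reason = match.group(2).rsplit(": ", 1)[-1]
            failed[(match.group(1), reason)] += 1
    return ok, failed


if __name__ == "__main__":
    sample = '''time="2017-06-05T09:18:02Z" level=warning msg="Error forwarding to http://10.0.0.9:80, err: dial tcp 10.0.0.9:80: getsockopt: no route to host"
time="2017-06-05T09:18:02Z" level=debug msg="Round trip: http://10.0.0.11:9000, code: 200, duration: 821.781684ms"'''
    ok, failed = backend_health(sample)
    print(dict(ok))
    print(dict(failed))
```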
docbobo commented 7 years ago

Weird. I'll have another look in the evening.

al-sabr commented 7 years ago

So, in your opinion, is my analysis correct that this is a bug?

al-sabr commented 7 years ago

I've filed a ticket in moby, and someone asked me to run some basic tests without Traefik in the equation.

You can read about my testing process here: https://github.com/moby/moby/issues/33531

al-sabr commented 7 years ago

I've just tested with only 2 ODROID C2 devices:

  1. Manager Node with Traefik + Portainer
  2. Second node with drone 0.7.1

It works without any problem, so the bug seems to be in the ODROID C1 build with Docker.

al-sabr commented 7 years ago

Closing this issue: the cause is features missing from the old kernel.