hypriot / image-builder-odroid-c1

Build SD card image for ODROID C1 and C1+
http://blog.hypriot.com/post/how-to-get-docker-working-on-your-favourite-arm-board-with-hypriotos/
MIT License

Traefik and Docker Swarm generating OVERLAY network error #38

Closed al-sabr closed 7 years ago

al-sabr commented 7 years ago

Related issue with Traefik team: https://github.com/containous/traefik/issues/1423

Hi guys, I'm stuck again with Traefik and Docker Swarm.

I tried to use this tutorial : https://docs.traefik.io/user-guide/swarm-mode/

On my 2 Odroid C1 boards I have :

The part where it doesn't work is at this step

docker service create \
    --name traefik \
    --constraint=node.role==manager \
    --publish 80:80 --publish 8080:8080 \
    --mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock \
    --network traefik-net \
    hypriot/rpi-traefik \
    --docker \
    --docker.swarmmode \
    --docker.domain=traefik \
    --docker.watch \
    --web

Screenshots of Portainer

[Images: portainer_service_traefik, portainer_containers_view, portainer_networks_view]

In Portainer I can see the error message for this container:

[Image: error]

The kernel I use is:

Linux server1 3.10.104-186 #1 SMP PREEMPT Mon Mar 20 11:48:07 UTC 2017 armv7l GNU/Linux

Containers: 16
 Running: 7
 Paused: 0
 Stopped: 9
Images: 236
Server Version: 17.04.0-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 327
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: active
 NodeID: uuohul0pha427ijm0323vzhyw
 Is Manager: true
 ClusterID: ae3sayj5fn5y55mem4xkcppq2
 Managers: 1
 Nodes: 2
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 192.168.1.3
 Manager Addresses:
  192.168.1.3:2377
Runtimes: runc
Default Runtime: runc
Init Binary:
containerd version: 422e31ce907fd9c3833a38d7b8fdd023e5a76e73
runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
init version: 949e6fa
Security Options:
 apparmor
Kernel Version: 3.10.104-186
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: armv7l
CPUs: 4
Total Memory: 940.9MiB
Name: server1
ID: 7GHE:CHRG:TDC4:UOTO:3JWM:2ZYU:CHBN:AMIE:W45Y:I5G7:AMSK:ETMY
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No kernel memory limit support

al-sabr commented 7 years ago

After searching a while I've stumbled upon this thread

https://github.com/docker/docker/issues/14145

I don't know if this is the reason why it is not working.

I've also installed the latest versions of lxc and cgroup-lite.

al-sabr commented 7 years ago

I also found this

https://github.com/docker/libnetwork/issues/329

al-sabr commented 7 years ago

And this

https://github.com/docker/libnetwork/issues/381

al-sabr commented 7 years ago

And this

https://github.com/docker/libnetwork/pull/821

al-sabr commented 7 years ago

When I tried to run Apache Ace Client on the SWARM CLUSTER with the following command there is no problem...

docker service create --name apache-ace-target -e ACE_HOST=apache-ace-server -e ACE_HOST_PORT=9080 --replicas=2 apache-ace-target-arm

[Images: portainer_containers_view_apache_ace, portainer_apache_ace_running_swarm]

al-sabr commented 7 years ago

It seems that when I remove both the --publish 80:80 --publish 8080:8080 flags and the --network traefik-net flag, the service runs without a problem, except that Traefik cannot be reached from the browser.

docker service create \
    --name traefik \
    --constraint=node.role==manager \
    --mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock \
    hypriot/rpi-traefik \
    --docker \
    --docker.swarmmode=true \
    --docker.watch=true \
    --docker.domain=docker.localhost \
    --docker.exposedbydefault=true \
    --web

[Images: portainer_traefik_running, portainer_containers_view_all]

The problem seems to be related to EXPOSING PORTS + LINKING NETWORK

docbobo commented 7 years ago

Okay, this is just a guess and also not really encouraging: HypriotOS builds for the Odroid C1 and Odroid C2 are quite similar these days. And I am running Traefik successfully on the C2. Based on that and the links to Docker issues you've provided, I am guessing that this is primarily a kernel issue - the official C1 kernel is based on 3.10.

If you have time, you could try to build a mainline kernel for the C1. This is described here. If that's successful, perhaps we can include a custom kernel instead of the official one.

al-sabr commented 7 years ago

I have never done what you suggest, and I have never gone that low-level into compiling a custom kernel. I'm afraid I don't have the background knowledge for this task right now, sorry :(

But I have found a thread on the Odroid C1 forum with information about kernel 4.x and links to various resources.

This page has pre-compiled kernels for the C1, but I don't know which features HypriotOS needs. Could you please guide me through the next steps?

https://kernelci.org/boot/meson8b-odroidc1//

Odroid C1 thread : http://forum.odroid.com/viewtopic.php?f=111&t=19292

Thank you for your help :)

al-sabr commented 7 years ago

@docbobo @StefanScherer to be honest, I have found this compiled mainline kernel, but there are so many variants that I really don't understand what they mean.

This one seems to be a mainline v4.11-rc5-133

https://kernelci.org/boot/meson8b-odroidc1/job/mainline/branch/master/kernel/v4.11-rc5-133-gea6b1720ce25/defconfig/multi_v7_defconfig+CONFIG_THUMB2_KERNEL=y+CONFIG_ARM_MODULE_PLTS=y/

This is the log file : Boot log: meson8b-odroidc1

Now that I have the zImage what should I do ?

zImage download

al-sabr commented 7 years ago

I have a question concerning maybe a link between the OVERLAY storage driver and the AUFS driver...

My docker info says

$ docker info
Containers: 9
 Running: 7
 Paused: 0
 Stopped: 2
Images: 232
Server Version: 17.04.0-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 312
 Dirperm1 Supported: true

Does the command docker network create --driver=overlay traefik-net have anything to do with the OVERLAY storage driver?

My question is actually: does the OVERLAY network depend on the OVERLAY storage driver?

Maybe that's the reason why docker is not able to connect the Traefik service in SWARM MODE to the OVERLAY network.

al-sabr commented 7 years ago

Browsing on Github I have found this Odroid C1 Gentoo Overlay repo but I'm not sure if it has to do with the C1 Overlay Storage Engine.

Can you check if I'm correct ?

https://github.com/nemunaire/odroidc1-overlay

al-sabr commented 7 years ago

@docbobo @StefanScherer is it enough that I go on this link and download the 2 files

https://kernelci.org/boot/id/58e609ea59b514863ab12d51/

  1. dtbs/meson8b-odroidc1.dtb (3 KB)
  2. zImage (6.61 MiB)

and upload them via ssh to the root folder of the microSD card and change boot.ini to load them instead?

setenv bootargs "console=ttyS0,115200n8 console=tty0 root=/dev/mmcblk0p1 rootwait rw no_console_suspend vdaccfg=0xa000 logo=osd1,loaded,0x7900000,720p,full dmfc=3 cvbsmode=576cvbs hdmimode=${m} m_bpp=${m_bpp} vout=${vout_mode} ${disableuhs} ${hdmi_hpd} ${hdmi_cec}"
ext4load mmc 0:1 0x21000000 /boot/zImage
ext4load mmc 0:1 0x21800000 /boot/meson8b_odroidc1.dtb
fdt addr 21800000

Or is it more complicated than that ?????

How do I integrate the last part to this tutorial on building the kernel for Odroid C1????

Official Odroid C1 kernel tutorial

firecyberice commented 7 years ago

The overlay network has nothing to do with the overlay storage driver.
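The two settings can also be read separately from docker info, which makes the distinction easier to see. A minimal sketch: the helper names storage_driver and network_plugins are made up for illustration, and both simply parse docker info text in the format posted earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Docker): each reads `docker info`
# text on stdin and prints one field.

# Storage driver: how image and container layers are stored on disk.
storage_driver() {
    sed -n 's/^ *Storage Driver: *//p'
}

# Network plugins: how containers are connected to each other.
network_plugins() {
    sed -n 's/^ *Network: *//p'
}
```

With the output from this thread, `docker info | storage_driver` would print aufs, while `docker info | network_plugins` would list overlay among the network drivers. The same word appears in both places, but they are independent features.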

al-sabr commented 7 years ago

@docbobo

Okay, this is just a guess and also not really encouraging: HypriotOS builds for the Odroid C1 and Odroid C2 are quite similar these days. And I am running Traefik successfully on the C2. Based on that and the links to Docker issues you've provided, I am guessing that this is primarily a kernel issue - the official C1 kernel is based on 3.10.

Don't get me wrong, Traefik works really well on my Odroid C1 when I'm not using Docker Swarm Mode, with regular containers created with docker run -d --name xyz etc.

The problems in this thread only concern Traefik failing to run as a service in Docker Swarm Mode.

al-sabr commented 7 years ago

Can someone help here ? I did a lot of research and I don't know yet where to go from here ...

Thanx

docbobo commented 7 years ago

@gdeverlant

Don't get me wrong, Traefik works really well on my Odroid C1 when I'm not using Docker Swarm Mode, with regular containers created with docker run -d --name xyz etc.

The problems in this thread only concern Traefik failing to run as a service in Docker Swarm Mode.

Yes, Traefik will work well without Docker Swarm Mode. This is because Docker Swarm attempts to create an overlay network for the service - which fails due to the old kernel.

I'll have a brief look at the mainline kernel you dug up, but to me it seems as if the modules are missing. So I am not too confident that this will work. The Gentoo Overlay Repository provides a few general packages, but not the kernel itself.

al-sabr commented 7 years ago

I have found this ticket on Docker's repo and it seems that they support VXLAN for Kernels lower than 3.16

can you check this out?

https://github.com/docker/libnetwork/pull/821

docbobo commented 7 years ago

Do you mind running the upgrade to docker 17.04.00 (see #39) to verify if the problem still persists?

al-sabr commented 7 years ago

I already thought about that and did so but nothing changes...

docker -v
Docker version 17.04.0-ce, build 4845c56

al-sabr commented 7 years ago

When I went through the Swarm Mode documentation on Docker's website I found this statement :

https://docs.docker.com/engine/swarm/swarm-tutorial/#open-protocols-and-ports-between-the-hosts

Open protocols and ports between the hosts

The following ports must be available. On some systems, these ports are open by default.

- TCP port 2377 for cluster management communications
- TCP and UDP port 7946 for communication among nodes
- UDP port 4789 for overlay network traffic

If you are planning on creating an overlay network with encryption (--opt encrypted), you will also need to ensure ip protocol 50 (ESP) traffic is allowed.

So I've searched for how to set my firewall with iptables and I found this link:

https://www.digitalocean.com/community/tutorials/how-to-configure-the-linux-firewall-for-docker-swarm-on-ubuntu-16-04

I've set up my firewall on the manager and the worker as suggested, and it didn't work either.
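For reference, the ports quoted from the Docker documentation map onto iptables rules roughly as follows. This is a sketch under assumptions, not the rule set from the linked tutorial: the function name swarm_firewall_rules is invented for illustration, and appending to the INPUT chain assumes no earlier rule already drops this traffic. The function only prints the rules so they can be reviewed before anything is applied.

```shell
#!/bin/sh
# Hypothetical helper: print (do not apply) iptables rules that accept
# the swarm ports listed in the Docker documentation quoted above.
swarm_firewall_rules() {
    # TCP 2377: cluster management communications
    echo "iptables -A INPUT -p tcp --dport 2377 -j ACCEPT"
    # TCP and UDP 7946: communication among nodes
    echo "iptables -A INPUT -p tcp --dport 7946 -j ACCEPT"
    echo "iptables -A INPUT -p udp --dport 7946 -j ACCEPT"
    # UDP 4789: overlay network (VXLAN) traffic
    echo "iptables -A INPUT -p udp --dport 4789 -j ACCEPT"
}
```

Running `swarm_firewall_rules` prints four lines; once they look right for the host, they could be applied with `swarm_firewall_rules | sudo sh` on every node.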

docbobo commented 7 years ago

Okay, things are perhaps easier than expected. This is the output of Docker's check-config.sh script:

info: reading kernel config from /proc/config.gz ...

Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled (as module)
- CONFIG_BRIDGE: enabled
- CONFIG_BRIDGE_NETFILTER: enabled
- CONFIG_NF_NAT_IPV4: enabled (as module)
- CONFIG_IP_NF_FILTER: enabled (as module)
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled (as module)
- CONFIG_IP_NF_NAT: missing
- CONFIG_NF_NAT: enabled (as module)
- CONFIG_NF_NAT_NEEDED: enabled
- CONFIG_POSIX_MQUEUE: enabled
- CONFIG_DEVPTS_MULTIPLE_INSTANCES: enabled

Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: missing
- CONFIG_CGROUP_PIDS: missing
- CONFIG_MEMCG_SWAP: enabled
- CONFIG_MEMCG_SWAP_ENABLED: enabled
    (cgroup swap accounting is currently enabled)
- CONFIG_MEMCG_KMEM: missing
- CONFIG_RESOURCE_COUNTERS: enabled
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: missing
- CONFIG_IOSCHED_CFQ: enabled
- CONFIG_CFQ_GROUP_IOSCHED: missing
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: missing
- CONFIG_NET_CLS_CGROUP: enabled (as module)
- CONFIG_NETPRIO_CGROUP: missing
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: enabled
- CONFIG_IP_VS: enabled (as module)
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_RR: enabled (as module)
- CONFIG_EXT3_FS: missing
- CONFIG_EXT3_FS_XATTR: missing
- CONFIG_EXT3_FS_POSIX_ACL: missing
- CONFIG_EXT3_FS_SECURITY: missing
    (enable these ext3 configs if you are using ext3 as backing filesystem)
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: missing
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled
      - CONFIG_CRYPTO_GCM: missing
      - CONFIG_CRYPTO_SEQIV: enabled
      - CONFIG_CRYPTO_GHASH: missing
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled (as module)
      - CONFIG_XFRM_ALGO: enabled
      - CONFIG_INET_ESP: enabled
      - CONFIG_INET_XFRM_MODE_TRANSPORT: enabled
  - "ipvlan":
    - CONFIG_IPVLAN: missing
  - "macvlan":
    - CONFIG_MACVLAN: enabled (as module)
    - CONFIG_DUMMY: missing
  - "ftp,tftp client in container":
    - CONFIG_NF_NAT_FTP: enabled (as module)
    - CONFIG_NF_CONNTRACK_FTP: enabled (as module)
    - CONFIG_NF_NAT_TFTP: enabled (as module)
    - CONFIG_NF_CONNTRACK_TFTP: enabled (as module)
- Storage Drivers:
  - "aufs":
    - CONFIG_AUFS_FS: enabled
  - "btrfs":
    - CONFIG_BTRFS_FS: enabled (as module)
    - CONFIG_BTRFS_FS_POSIX_ACL: enabled
  - "devicemapper":
    - CONFIG_BLK_DEV_DM: enabled
    - CONFIG_DM_THIN_PROVISIONING: enabled (as module)
  - "overlay":
    - CONFIG_OVERLAY_FS: missing
  - "zfs":
    - /dev/zfs: missing
    - zfs command: missing
    - zpool command: missing

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000

As you can see, it says CONFIG_VXLAN missing. So it might be sufficient to just rebuild the 3.10.x kernel, which will be somewhat easier than moving to mainline. Let me give it a try.
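The same check can be narrowed to a single option without running the whole script. A minimal sketch, assuming the kernel exposes its configuration at /proc/config.gz as in the output above; the function name kernel_option_state is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read a kernel config on stdin and report how the
# option named in $1 is set. Prints "built-in", "module" or "missing".
kernel_option_state() {
    option="$1"
    # A set option looks like CONFIG_FOO=y or CONFIG_FOO=m; an unset one
    # only appears as the comment "# CONFIG_FOO is not set".
    value=$(sed -n "s/^${option}=//p" | head -n 1)
    case "$value" in
        y) echo "built-in" ;;
        m) echo "module" ;;
        *) echo "missing" ;;
    esac
}
```

On the C1 kernel discussed here, `zcat /proc/config.gz | kernel_option_state CONFIG_VXLAN` should report missing, matching the check-config.sh output.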

al-sabr commented 7 years ago

No wayyyy is that so simple ??????

Can you also activate CONFIG_OVERLAY_FS ??? I hope it is possible :)

I don't want to reflash my cluster so please just tell me which files I need to replace on the microSD cards to make the update.

Thanx

al-sabr commented 7 years ago

Can you make a video tutorial or blog post about how to do this kernel compilation? I'm a visual person and learn better with screencasts or screenshots.

I have never compiled the Linux kernel for the Odroid C1, and I don't understand the concept of modules or the differences between kernels.

I would like to be independent in the future, so that I can contribute to this repo when such bugs or tickets come up.

al-sabr commented 7 years ago

Do you know when you will release this ?

docbobo commented 7 years ago

First of all, I need to build the kernel and verify that the module is there. Then, I will likely create a pull request for the hardkernel image repository, so that they can update their default configuration.

If you need to get this running anytime soon, you'll probably be better off building your own kernel - if it works as expected. I can try to give you some instructions, shouldn't be too complicated.

docbobo commented 7 years ago

Bad timing, sorry. However, if you follow this step by step, you should be able to build your own kernel with the VXLAN module. Make sure to have a backup though.

$ sudo apt-get install -y bc curl gcc git libncurses5-dev lzop make
$ git clone --depth 1 --single-branch -b odroidc-3.10.y https://github.com/hardkernel/linux
$ cd linux
$ make odroidc_defconfig
$ sed -i -e 's/# CONFIG_VXLAN is not set/CONFIG_VXLAN=m/g' .config
$ make -j 4 uImage dtbs modules
$ sudo cp arch/arm/boot/uImage arch/arm/boot/dts/*.dtb /boot
$ sudo make modules_install
$ sudo make firmware_install
$ sudo make headers_install INSTALL_HDR_PATH=/usr
$ kver=`make kernelrelease`
$ sudo cp .config /boot/config-${kver}
$ cd /boot
$ sudo update-initramfs -c -k ${kver}
$ sudo mkimage -A arm -O linux -T ramdisk -a 0x0 -e 0x0 -n initrd.img-${kver} -d initrd.img-${kver} uInitrd-${kver}
$ sudo cp uInitrd-${kver} /boot/uInitrd

This is based on the instructions mentioned above, but adapted for HypriotOS.
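The sed step in that list is the only change to the default configuration, so it is worth confirming that it actually took effect before starting a long build. A minimal sketch, assuming GNU sed (as shipped with Debian); the function name enable_as_module is invented for illustration. It only edits the text of the config file and cannot tell whether the kernel tree really provides the option.

```shell
#!/bin/sh
# Hypothetical helper: turn "# CONFIG_X is not set" into "CONFIG_X=m" in
# a kernel config file (default: .config in the current directory).
# Returns non-zero if the option is not set to "m" afterwards.
enable_as_module() {
    option="$1"
    config="${2:-.config}"
    # Same substitution as the sed step above, anchored to the whole line.
    sed -i -e "s/^# ${option} is not set\$/${option}=m/" "$config"
    # Verify the result instead of trusting the substitution blindly.
    grep -q "^${option}=m\$" "$config"
}
```

Run from the kernel source directory after make odroidc_defconfig, `enable_as_module CONFIG_VXLAN || echo "CONFIG_VXLAN not enabled"` would flag a config in which the expected comment line was not found.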

docbobo commented 7 years ago

I ran this on the odroid itself. Not sure if it will work in VirtualBox

al-sabr commented 7 years ago

YAY IT IS WORKINGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG !!!!!!!!!!!!!!!!!!!

You are a champion !!!!!!

Thank you

al-sabr commented 7 years ago

I tried to change my hostname with :

$ sudo device-init hostname set traefik
panic: runtime error: index out of range

goroutine 1 [running]:
github.com/hypriot/device-init/cmd.activeInterfaces(0x0, 0x0, 0x0)
        /opt/gopath/src/github.com/hypriot/device-init/cmd/hostname_set.go:123 +0x31c
github.com/hypriot/device-init/cmd.setHostname(0x108a5dfc, 0x1, 0x1)
        /opt/gopath/src/github.com/hypriot/device-init/cmd/hostname_set.go:86 +0x50c
github.com/hypriot/device-init/cmd.glob.func5(0x670c30, 0x108206d0, 0x1, 0x1)
        /opt/gopath/src/github.com/hypriot/device-init/cmd/hostname_set.go:39 +0xa4
github.com/spf13/cobra.(*Command).execute(0x670c30, 0x10820688, 0x1, 0x1, 0x0, 0x0)
        /opt/gopath/src/github.com/hypriot/device-init/Godeps/_workspace/src/github.com/spf13/cobra/command.go:569 +0x664
github.com/spf13/cobra.(*Command).ExecuteC(0x670d28, 0x670c30, 0x0, 0x0)
        /opt/gopath/src/github.com/hypriot/device-init/Godeps/_workspace/src/github.com/spf13/cobra/command.go:656 +0x440
github.com/spf13/cobra.(*Command).Execute(0x670d28, 0x0, 0x0)
        /opt/gopath/src/github.com/hypriot/device-init/Godeps/_workspace/src/github.com/spf13/cobra/command.go:615 +0x28
github.com/hypriot/device-init/cmd.Execute()
        /opt/gopath/src/github.com/hypriot/device-init/cmd/root.go:53 +0x20
main.main()
        /opt/gopath/src/github.com/hypriot/device-init/main.go:28 +0x14