networkop / cx

Containerised Cumulus VX

3.7.0 SSH fails to start #5

Open networkop opened 2 years ago

networkop commented 2 years ago

Reported by @GrigoriyMikhalkin. Looking at what happens there, it looks like ifupdown gets stuck trying to DHCP eth0.

root@a1271dcfa4b135ae:/etc/network# ip -br add  show dev eth0
eth0             UP             fe80::783f:a7ff:fed7:97df/64
root@a1271dcfa4b135ae:/etc/network# ifreload -a
error: cmd '/usr/bin/vrf task exec mgmt /sbin/dhclient -pf /run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases eth0' failed: returned 1

Cgroup for managing VRF context does not exist.
Has l3mdev cgroup patch been applied to kernel?
If so has it been enabled?

It looks like this VRF support did not exist in the upstream kernel and was added via a patch.

I don't have access to a 3.7.x system. @GrigoriyMikhalkin, maybe you can compare the kernel version and loaded modules? This is how it looks inside ignite:

root@a1271dcfa4b135ae:/etc/network# lsmod
Module                  Size  Used by
vrf                    24576  0
bonding               159744  0
bridge                176128  0
stp                    16384  1 bridge
llc                    16384  2 bridge,stp
kvm_intel             212992  0
kvm                   565248  1 kvm_intel
irqbypass              16384  1 kvm
crc32c_intel           24576  0
aesni_intel           200704  0
aes_x86_64             20480  1 aesni_intel
crypto_simd            16384  1 aesni_intel
virtio_net             45056  0
net_failover           20480  1 virtio_net
failover               16384  1 net_failover
cryptd                 20480  2 crypto_simd,aesni_intel
glue_helper            16384  1 aesni_intel
input_leds             16384  0
led_class              16384  1 input_leds
autofs4                36864  2
root@a1271dcfa4b135ae:/etc/network# uname -r
4.19.0-cl-1-amd64

The reason for this is most likely that the 3.7.0 code is running on a kernel built from 4.3.0.

Building a kernel image is a process I haven't really documented anywhere; it's probably worth adding a guide for it later. It's fairly simple: unpack /boot/vmlinuz into a vmlinux and get a copy of /lib/modules from an existing CL image. Then package both in a container image, with vmlinux under /boot/ and the modules under /lib/modules.
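The packaging step described above could be sketched roughly like this. It is only an illustration in Python, not tooling from this repo: the function name `stage_kernel_context`, the `FROM scratch` base and the Dockerfile layout are assumptions, and it presumes vmlinux has already been extracted from /boot/vmlinuz (for example with the `scripts/extract-vmlinux` helper shipped in the Linux kernel source tree).

```python
import shutil
from pathlib import Path

# Assumed Dockerfile: an empty base image that only carries the kernel files.
DOCKERFILE = """\
FROM scratch
COPY boot /boot
COPY lib /lib
"""


def stage_kernel_context(vmlinux: Path, modules: Path, context: Path) -> Path:
    """Lay out a Docker build context with boot/vmlinux and lib/modules/."""
    boot = context / "boot"
    boot.mkdir(parents=True, exist_ok=True)
    # The uncompressed kernel goes to /boot/vmlinux inside the image.
    shutil.copy2(vmlinux, boot / "vmlinux")
    # Copy the module tree as-is; keep symlinks instead of following them.
    shutil.copytree(modules, context / "lib" / "modules", symlinks=True)
    (context / "Dockerfile").write_text(DOCKERFILE)
    return context
```

The resulting directory could then be passed to `docker build` and the image referenced as the kernel for the node.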

The final step would be to update https://github.com/srl-labs/containerlab/blob/master/nodes/cvx/cvx.go to use a different kernel image depending on the version, similar to how it's done with RAM today.

Do you think you can do this, @GrigoriyMikhalkin? I'll be happy to write up some documentation and makefiles for this.
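As a rough illustration of the version-to-kernel mapping idea: cvx.go is Go, so the sketch below only shows the selection logic in Python and is not the containerlab implementation. The image references under `example.org` are hypothetical placeholders, the function names are made up, and the 4.1.0 / 4.19.0 pairing is an assumption based on the `uname -r` outputs in this thread.

```python
from typing import Optional

# Hypothetical kernel images keyed by CL major version (placeholders only).
KERNEL_BY_CL_MAJOR = {
    "3": "example.org/kernel:4.1.0",
    "4": "example.org/kernel:4.19.0",
}
DEFAULT_KERNEL = KERNEL_BY_CL_MAJOR["4"]


def image_tag(image: str) -> str:
    """Return the tag of an image reference such as 'networkop/cx:3.7.0'."""
    _, sep, tag = image.rpartition(":")
    # No colon at all, or the colon belongs to a registry port (host:5000/img).
    if not sep or "/" in tag:
        return ""
    return tag


def kernel_for(image: str, override: Optional[str] = None) -> str:
    """Pick a kernel image: an explicit override wins, else map by major version."""
    if override:
        return override
    major = image_tag(image).split(".", 1)[0]
    return KERNEL_BY_CL_MAJOR.get(major, DEFAULT_KERNEL)
```

An explicit `kernel:` setting in the topology would correspond to the `override` argument here, so users could still pin a kernel themselves.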

GrigoriyMikhalkin commented 2 years ago

@networkop I guess the 3.7.0 kernel is based on 4.1.0. So indeed, the lack of VRF support (and, looking at the loaded modules, probably a bunch of other stuff) must be the reason. Here's the list of loaded modules and the kernel version:

cumulus@vagrant-leaf01:mgmt-vrf:~$ lsmod
Module                  Size  Used by
nf_conntrack_netlink    36864  0 
nfnetlink              16384  2 nf_conntrack_netlink
xfrm_user              36864  1 
xt_conntrack           16384  1 
ipt_MASQUERADE         16384  2 
nf_nat_masquerade_ipv4    16384  1 ipt_MASQUERADE
iptable_nat            16384  1 
nf_conntrack_ipv4      20480  2 
nf_defrag_ipv4         16384  1 nf_conntrack_ipv4
nf_nat_ipv4            16384  1 iptable_nat
nf_nat                 24576  2 nf_nat_ipv4,nf_nat_masquerade_ipv4
nf_conntrack           94208  6 nf_nat,nf_nat_ipv4,xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_netlink,nf_conntrack_ipv4
8021q                  28672  0 
garp                   16384  1 8021q
mrp                    20480  1 8021q
vxlan                  45056  0 
ip6_udp_tunnel         16384  1 vxlan
udp_tunnel             16384  1 vxlan
tcp_diag               16384  0 
inet_diag              20480  1 tcp_diag
vrf                    24576  0 
ebt_police             16384  8 
ebt_setclass           16384  7 
xt_POLICE              16384  41 
xt_SETCLASS            16384  19 
ebtable_filter         16384  1 
ebtables               36864  1 ebtable_filter
ip6table_raw           16384  0 
ip6table_mangle        16384  0 
ip6table_filter        16384  1 
ip6_tables             28672  3 ip6table_filter,ip6table_mangle,ip6table_raw
kvm_intel             155648  0 
kvm                   438272  1 kvm_intel
aesni_intel           172032  0 
aes_x86_64             20480  1 aesni_intel
glue_helper            16384  1 aesni_intel
lrw                    16384  1 aesni_intel
gf128mul               16384  1 lrw
ablk_helper            16384  1 aesni_intel
cryptd                 20480  2 aesni_intel,ablk_helper
virtio_rng             16384  0 
rng_core               16384  1 virtio_rng
acpi_cpufreq           20480  0 
softdog                16384  1 
cumulus_vx_platform    20480  0 
hwmon                  16384  1 cumulus_vx_platform
mpls_iptunnel          16384  0 
mpls_router            28672  1 mpls_iptunnel
at24                   16384  0 
eeprom_class           16384  2 at24,cumulus_vx_platform
tun                    28672  0 
br_netfilter           24576  0 
bridge                147456  1 br_netfilter
stp                    16384  2 garp,bridge
llc                    16384  3 stp,garp,bridge
bonding               139264  0 
loop                   28672  0 
autofs4                36864  2 
btrfs                 937984  1 
xor                    24576  1 btrfs
raid6_pq              106496  1 btrfs
dm_mod                102400  0 
crc32c_intel           24576  1 
virtio_net             32768  0 
i2c_piix4              24576  0 
i2c_core               49152  2 at24,i2c_piix4
uhci_hcd               45056  0 
cumulus@vagrant-leaf01:mgmt-vrf:~$ uname -r
4.1.0-cl-7-amd64

Regarding packaging the kernel into a container image, I will try :) I'll let you know when I have some results.
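Comparing two `lsmod` listings by eye is error-prone, so a small script can help. This is a minimal sketch: the function name `parse_lsmod` is made up for illustration, and the two sample inputs are short excerpts of the listings in this thread, not the full output.

```python
from typing import Set


def parse_lsmod(text: str) -> Set[str]:
    """Return the module names from `lsmod` output, skipping the header row."""
    names = set()
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] != "Module":
            names.add(fields[0])
    return names


# Excerpt from the ignite sandbox listing.
IGNITE = """\
Module                  Size  Used by
vrf                    24576  0
bonding               159744  0
bridge                176128  0
"""

# Excerpt from the vagrant 3.7 listing.
VAGRANT = """\
Module                  Size  Used by
vxlan                  45056  0
vrf                    24576  0
bridge                147456  1 br_netfilter
br_netfilter           24576  0
"""

# Modules loaded on the vagrant box but not inside ignite.
missing = sorted(parse_lsmod(VAGRANT) - parse_lsmod(IGNITE))
print(missing)
```

Feeding it the full listings from both systems would give the complete set of modules that are loaded in one environment but not the other.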

GrigoriyMikhalkin commented 2 years ago

Here is the containerized kernel: grigoriymikh/kernel:4.1.0. I tried to use it with the 3.7.0 image:

root@1af163f5aef476b0:mgmt-vrf:~# ip -br add  show dev eth0
eth0             UNKNOWN        fe80::70a7:4bff:fe90:8453/64

No IPv4 address is assigned to the eth0 interface.

Kernel version and modules:

root@1af163f5aef476b0:mgmt-vrf:~# lsmod
Module                  Size  Used by
vrf                    24576  0 
bonding               139264  0 
bridge                147456  0 
stp                    16384  1 bridge
llc                    16384  2 stp,bridge
kvm_intel             155648  0 
kvm                   438272  1 kvm_intel
crc32c_intel           24576  0 
aesni_intel           172032  0 
virtio_net             32768  0 
aes_x86_64             20480  1 aesni_intel
glue_helper            16384  1 aesni_intel
lrw                    16384  1 aesni_intel
gf128mul               16384  1 lrw
ablk_helper            16384  1 aesni_intel
cryptd                 20480  2 aesni_intel,ablk_helper
autofs4                36864  2 
root@1af163f5aef476b0:mgmt-vrf:~# uname -r
4.1.0-cl-7-amd64

journalctl shows:

Sep 14 11:55:23 584d98949cb12672 systemd[1]: Failed to start Cumulus Linux System Monitoring Daemon.
Sep 14 11:55:23 584d98949cb12672 systemd[1]: Dependency failed for Cumulus Linux LED Manager Daemon.

networkop commented 2 years ago

@GrigoriyMikhalkin was there any error when you tried to do ifreload -a?

GrigoriyMikhalkin commented 2 years ago

> @GrigoriyMikhalkin was there any error when you tried to do ifreload -a?

Nope. Didn't show anything.

GrigoriyMikhalkin commented 2 years ago

Maybe related (because right after that I see the "Failed to start Cumulus Linux" message): the smond service is constantly restarting, with these logs:

Sep 14 12:43:56 584d98949cb12672 [1836]: /usr/sbin/smond : : Fan2(Fan Tray 1, Fan 2): state changed from UNKNOWN to ABSENT
Sep 14 12:43:56 584d98949cb12672 [1836]: /usr/sbin/smond : : Fan2(Fan Tray 1, Fan 2):  Unable to find cpld path: /sys/class/hwmon/hwmon0
                                         Fan2(Fan Tray 1, Fan 2):  Unable to read device
Sep 14 12:44:05 584d98949cb12672 [1836]: /usr/sbin/smond : : Fan3(Fan Tray 2, Fan 1): state changed from UNKNOWN to ABSENT
Sep 14 12:44:05 584d98949cb12672 [1836]: /usr/sbin/smond : : Fan3(Fan Tray 2, Fan 1):  Unable to find cpld path: /sys/class/hwmon/hwmon0
                                         Fan3(Fan Tray 2, Fan 1):  Unable to read device
Sep 14 12:44:14 584d98949cb12672 [1836]: /usr/sbin/smond : : Fan4(Fan Tray 2, Fan 2): state changed from UNKNOWN to ABSENT
Sep 14 12:44:14 584d98949cb12672 [1836]: /usr/sbin/smond : : Fan4(Fan Tray 2, Fan 2):  Unable to find cpld path: /sys/class/hwmon/hwmon0
                                         Fan4(Fan Tray 2, Fan 2):  Unable to read device
Sep 14 12:44:23 584d98949cb12672 [1836]: /usr/sbin/smond : : Fan5(Fan Tray 3, Fan 1): state changed from UNKNOWN to ABSENT
Sep 14 12:44:23 584d98949cb12672 [1836]: /usr/sbin/smond : : Fan5(Fan Tray 3, Fan 1):  Unable to find cpld path: /sys/class/hwmon/hwmon0
                                         Fan5(Fan Tray 3, Fan 1):  Unable to read device
Sep 14 12:44:32 584d98949cb12672 [1836]: /usr/sbin/smond : : Fan6(Fan Tray 3, Fan 2): state changed from UNKNOWN to ABSENT
Sep 14 12:44:32 584d98949cb12672 [1836]: /usr/sbin/smond : : Fan6(Fan Tray 3, Fan 2):  Unable to find cpld path: /sys/class/hwmon/hwmon0
                                         Fan6(Fan Tray 3, Fan 2):  Unable to read device

networkop commented 2 years ago

I think smond and ledmgr are a red herring. These are services controlling physical devices, and they normally don't affect control/data plane operations. This is becoming interesting. Can you try removing the mgmt VRF from e/n/i completely and do another ifreload to see if it works?

GrigoriyMikhalkin commented 2 years ago

Sorry for the silly question, but what does e/n/i stand for?

I removed the mgmt VRF (net del vrf mgmt; I hope that was sufficient) and tried ifreload -a one more time, but I don't see any changes. eth0 still has no assigned address.

networkop commented 2 years ago

Sorry, /e/n/i is short for /etc/network/interfaces, the place where all interface configuration goes. I think by default it'll have the contents of https://github.com/networkop/cx/blob/main/hacks/interfaces, so my suggestion is to change it to this and run ifreload -a:

# interfaces(5) file used by ifup(8) and ifdown(8)
auto lo
iface lo inet loopback

auto eth0
iface eth0  inet dhcp

source /etc/network/interfaces.d/*

GrigoriyMikhalkin commented 2 years ago

Thanks for the clarification :) I just checked: after removing the mgmt VRF, it looks exactly like that.

networkop commented 2 years ago

OK, do you have the forked containerlab code available somewhere? I want to try it locally.

GrigoriyMikhalkin commented 2 years ago

I use the latest master branch build with this topology:

name: cl-test

mgmt:
  network: bridge

topology:
  kinds:
    cvx:
      image: networkop/cx:3.7.0
      kernel: docker.io/grigoriymikh/kernel:4.1.0

  nodes:
    leaf1:
      kind: cvx
    leaf2:
      kind: cvx
    frr:
      kind: linux
      image: frrouting/frr:v7.5.1

  links:
    - endpoints: ["leaf1:swp1", "frr:eth1"]
    - endpoints: ["leaf2:swp1", "frr:eth2"]

networkop commented 2 years ago

Oh, right. I forgot you could override the default kernel version. Cool, I'll try this later today.

GrigoriyMikhalkin commented 2 years ago

Also noticed that FRR is failing to start:

Sep 15 09:00:21 432cfa107f3b2645 frr[813]: Starting Frr daemons (prio:10):.
Sep 15 09:00:21 432cfa107f3b2645 frr[813]: Exiting: failed to connect to any daemons.
Sep 15 09:00:21 432cfa107f3b2645 frr[813]: Exiting from the script
Sep 15 09:00:21 432cfa107f3b2645 frr[840]: Stopping Frr monitor daemon: (watchfrr).
Sep 15 09:00:21 432cfa107f3b2645 frr[840]: Stopping Frr daemons (prio:0): (zebra) (bgpd) (ripd) (ripngd) (ospfd) (ospf6d) (isisd) (babeld) (pimd) (ldpd) (nhrpd) (eigrpd) (sharpd) (pbrd) (vrrpd).
Sep 15 09:00:21 432cfa107f3b2645 frr[840]: Stopping other frr daemons..
Sep 15 09:00:21 432cfa107f3b2645 frr[840]: Removing remaining .vty files.
Sep 15 09:00:21 432cfa107f3b2645 frr[840]: Removing all routes made by FRR.
Sep 15 09:00:21 432cfa107f3b2645 frr[840]: Exiting from the script
Sep 15 09:00:21 432cfa107f3b2645 systemd[1]: Started FRRouting.

UPDATE: Never mind, it was an FRR config problem.

networkop commented 2 years ago

Looking at it now. First, FRR fails to start, which is expected. You'd need to mount some non-default FRR config files, otherwise all your daemons (bgpd, zebra) are disabled.

But there's still a problem. It seems like the DHCP replies coming back from the sandbox container are ignored by Cumulus, but when I set a static IP, I get connectivity on eth0. So, for example:

root@5557ffdad223d394:mgmt-vrf:~# cat /etc/network/interfaces
# interfaces(5) file used by ifup(8) and ifdown(8)
auto lo
iface lo inet loopback

auto mgmt
iface mgmt
    vrf-table auto

auto eth0
iface eth0  inet static
        address 192.168.223.5/24
        vrf mgmt

source /etc/network/interfaces.d/*
root@5557ffdad223d394:mgmt-vrf:~# ping 192.168.223.1 -I eth0
PING 192.168.223.1 (192.168.223.1) from 192.168.223.5 eth0: 56(84) bytes of data.
64 bytes from 192.168.223.1: icmp_seq=1 ttl=64 time=0.540 ms
^C
--- 192.168.223.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms

but

root@5557ffdad223d394:mgmt-vrf:~# ifreload -a
10:28:29.299427 8a:6f:ef:f3:0d:04 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 342: 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 8a:6f:ef:f3:0d:04, length 300
10:28:29.300767 f6:39:dd:a9:cb:f3 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 334: 0.0.0.0.67 > 255.255.255.255.68: BOOTP/DHCP, Reply, length 292
10:28:33.422855 8a:6f:ef:f3:0d:04 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 342: 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 8a:6f:ef:f3:0d:04, length 300
10:28:33.447270 f6:39:dd:a9:cb:f3 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 334: 0.0.0.0.67 > 255.255.255.255.68: BOOTP/DHCP, Reply, length 292
^C
root@5557ffdad223d394:mgmt-vrf:~# cat /etc/network/interfaces
# interfaces(5) file used by ifup(8) and ifdown(8)
auto lo
iface lo inet loopback

auto mgmt
iface mgmt
    vrf-table auto

auto eth0
iface eth0  inet dhcp
        vrf mgmt

source /etc/network/interfaces.d/*

Looking at the logs, I see:

Sep 15 10:41:54 5557ffdad223d394 netd[840]: WARNING:  get_lldp: The output of /usr/sbin/lldpctl is not in the expected format.  LLDP output might be incomplete.
Sep 15 10:41:58 5557ffdad223d394 dhclient[3259]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 10
Sep 15 10:42:08 5557ffdad223d394 dhclient[3259]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 11
Sep 15 10:42:19 5557ffdad223d394 dhclient[3259]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 11
Sep 15 10:42:30 5557ffdad223d394 dhclient[3259]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 8
Sep 15 10:42:38 5557ffdad223d394 dhclient[3259]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 8
Sep 15 10:42:38 5557ffdad223d394 dhclient[3259]: 5 bad udp checksums in 5 packets
Sep 15 10:42:46 5557ffdad223d394 dhclient[3259]: No DHCPOFFERS received.

So it looks like the UDP checksum set by the ignite sandbox is incorrect. Let me confirm with the pcap.
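To make the "bad udp checksums" message concrete: the UDP checksum covers an IPv4 pseudo-header (source address, destination address, protocol 17, UDP length) plus the UDP header and payload. Below is a minimal sketch of that check, with illustrative function names. Treating a zero checksum as "not computed" follows the UDP specification (RFC 768); it has not been checked against dhclient's source here.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def udp_checksum(src_ip: str, dst_ip: str, udp_segment: bytes) -> int:
    """Compute the UDP checksum for a segment (header + payload) over IPv4."""
    pseudo = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, 17, len(udp_segment))
    )
    # The checksum field (bytes 6-7) is treated as zero while summing.
    zeroed = udp_segment[:6] + b"\x00\x00" + udp_segment[8:]
    csum = ~ones_complement_sum(pseudo + zeroed) & 0xFFFF
    # A computed value of zero is transmitted as all ones.
    return csum or 0xFFFF


def checksum_ok(src_ip: str, dst_ip: str, udp_segment: bytes) -> bool:
    """Return True if the stored checksum is absent (zero) or matches."""
    (stored,) = struct.unpack("!H", udp_segment[6:8])
    if stored == 0:
        return True
    return stored == udp_checksum(src_ip, dst_ip, udp_segment)
```

Running `checksum_ok` over the UDP segments of the DHCP replies in a capture would show whether the value written by the sender matches what a receiver computes.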

networkop commented 2 years ago

[image]

networkop commented 2 years ago

Yep, can confirm: running ethtool --offload vm_eth0 tx off inside the ignite sandbox fixes the problem.

GrigoriyMikhalkin commented 2 years ago

@networkop Wow, that was cool :) I was just able to make it work locally too.

What would you suggest doing next regarding CL 3.7 support in containerlab? I suppose we should wait for a fix in ignite and then add the kernel mapping to containerlab?

networkop commented 2 years ago

Yeah, those two things should cover it. If you could run some additional tests on it in the meantime, to make sure no other functionality is affected, that would be great.