nmstate / kubernetes-nmstate

Declarative node network configuration driven through Kubernetes API.
GNU General Public License v2.0

Probe fails to find default gw with ubuntu `NETLINK_GET_STRICT_CHK`, loops for a while before continuing #1174

Closed k8scoder192 closed 1 year ago

k8scoder192 commented 1 year ago

What happened: Hi Enrique

I find that the merge of https://github.com/nmstate/kubernetes-nmstate/pull/1153 into v0.77 broke the ping probe (PR 1153 is "Probes: choose the default gw from the main routing table").

ip route

default via xxx.yyy.aaa.bb dev mynet0 proto static metric 426
.
.
.

cat /etc/iproute2/rt_tables

#
# reserved values
#
255     local
254     main
253     default
0       unspec
#
# local
#
#1      inr.ruhep

Logs of nmstate-handler

{"level":"info","ts":"2023-04-12T17:23:08.335Z","logger":"probe","msg":"default gw missing","path":"routes.running.next-hop-address","table-id":254}
{"level":"error","ts":"2023-04-12T17:23:08.335Z","logger":"probe","msg":"failed to retrieve default gw","error":"default gw missing","errorVerbose":"default gw missing\ngithub.com/nmstate/kubernetes-nmstate/pkg/probe.defaultGw\n\t/workdir/pkg/probe/probes.go:160\ngithub.com/nmstate/kubernetes-nmstate/pkg/probe.runPing\n\t/workdir/pkg/probe/probes.go:177\ngithub.com/nmstate/kubernetes-nmstate/pkg/probe.pingCondition.func1\n\t/workdir/pkg/probe/probes.go:167\nk8s.io/apimachinery/pkg/util/wait.ConditionFunc.WithContext.func1\n\t/workdir/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:220\nk8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext\n\t/workdir/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:233\nk8s.io/apimachinery/pkg/util/wait.WaitForWithContext\n\t/workdir/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:660\nk8s.io/apimachinery/pkg/util/wait.poll\n\t/workdir/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:594\nk8s.io/apimachinery/pkg/util/wait.PollImmediateWithContext\n\t/workdir/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:526\nk8s.io/apimachinery/pkg/util/wait.PollImmediate\n\t/workdir/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:512\ngithub.com/nmstate/kubernetes-nmstate/pkg/probe.Select\n\t/workdir/pkg/probe/probes.go:261\ngithub.com/nmstate/kubernetes-nmstate/pkg/client.ApplyDesiredState\n\t/workdir/pkg/client/client.go:167\ngithub.com/nmstate/kubernetes-nmstate/controllers/handler.(*NodeNetworkConfigurationPolicyReconciler).Reconcile\n\t/workdir/controllers/handler/nodenetworkconfigurationpolicy_controller.go:219\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/workdir/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/workdir/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*C
ontroller).processNextWorkItem\n\t/workdir/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/workdir/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1571","stacktrace":"github.com/nmstate/kubernetes-nmstate/pkg/probe.runPing\n\t/workdir/pkg/probe/probes.go:179\ngithub.com/nmstate/kubernetes-nmstate/pkg/probe.pingCondition.func1\n\t/workdir/pkg/probe/probes.go:167\nk8s.io/apimachinery/pkg/util/wait.ConditionFunc.WithContext.func1\n\t/workdir/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:220\nk8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext\n\t/workdir/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:233\nk8s.io/apimachinery/pkg/util/wait.WaitForWithContext\n\t/workdir/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:660\nk8s.io/apimachinery/pkg/util/wait.poll\n\t/workdir/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:594\nk8s.io/apimachinery/pkg/util/wait.PollImmediateWithContext\n\t/workdir/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:526\nk8s.io/apimachinery/pkg/util/wait.PollImmediate\n\t/workdir/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:512\ngithub.com/nmstate/kubernetes-nmstate/pkg/probe.Select\n\t/workdir/pkg/probe/probes.go:261\ngithub.com/nmstate/kubernetes-nmstate/pkg/client.ApplyDesiredState\n\t/workdir/pkg/client/client.go:167\ngithub.com/nmstate/kubernetes-nmstate/controllers/handler.(*NodeNetworkConfigurationPolicyReconciler).Reconcile\n\t/workdir/controllers/handler/nodenetworkconfigurationpolicy_controller.go:219\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/workdir/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/workdir/vendor/sigs.k8s.io/contro
ller-runtime/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/workdir/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/workdir/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234"}
{"level":"info","ts":"2023-04-12T17:23:09.321Z","logger":"probe","msg":"default gw missing","path":"routes.running.next-hop-address","table-id":254}

What you expected to happen: The probe should not fail the gateway lookup.

How to reproduce it (as minimally and precisely as possible): Use v0.77 and try to create an interface (a vlan, for example), then tail the nmstate-handler logs.

Anything else we need to know?:

Environment:

qinqon commented 1 year ago

Looks like there are no routes:

 routes:
      config: []
      running: []
qinqon commented 1 year ago

Can you run `kubectl exec -n nmstate [nmstate-handler pod] -- nmstatectl show` and dump the result here? Somehow nmstatectl is not dumping routes.

k8scoder192 commented 1 year ago

@qinqon

kubectl exec -n nmstate nmstate-handler-4txyz -- nmstatectl show

[2023-04-13T12:38:15Z WARN  nmstate::nispor::route] Failed to retrieve Boot route via nispor: Failed to set socket option NETLINK_GET_STRICT_CHK: error Protocol not available (os error 92)
[2023-04-13T12:38:15Z WARN  nmstate::nispor::route] Failed to retrieve Static route via nispor: Failed to set socket option NETLINK_GET_STRICT_CHK: error Protocol not available (os error 92)
[2023-04-13T12:38:15Z WARN  nmstate::nispor::route] Failed to retrieve Ra route via nispor: Failed to set socket option NETLINK_GET_STRICT_CHK: error Protocol not available (os error 92)
[2023-04-13T12:38:15Z WARN  nmstate::nispor::route] Failed to retrieve Dhcp route via nispor: Failed to set socket option NETLINK_GET_STRICT_CHK: error Protocol not available (os error 92)
[2023-04-13T12:38:15Z WARN  nmstate::nispor::route] Failed to retrieve Mrouted route via nispor: Failed to set socket option NETLINK_GET_STRICT_CHK: error Protocol not available (os error 92)
[2023-04-13T12:38:15Z WARN  nmstate::nispor::route] Failed to retrieve KeepAlived route via nispor: Failed to set socket option NETLINK_GET_STRICT_CHK: error Protocol not available (os error 92)
[2023-04-13T12:38:15Z WARN  nmstate::nispor::route] Failed to retrieve Babel route via nispor: Failed to set socket option NETLINK_GET_STRICT_CHK: error Protocol not available (os error 92)
[2023-04-13T12:38:16Z INFO  nmstate::nm::show] Got unsupported interface type generic: lo, ignoring
[2023-04-13T12:38:16Z WARN  nmstate::nm::show] Failed to find applied NmConnection for interface ens8 802-3-ethernet
[2023-04-13T12:38:16Z WARN  nmstate::nm::show] Failed to find applied NmConnection for interface ens3 802-3-ethernet
hostname:
  running: worker2
  config: worker2
dns-resolver:
  running:
    server:
    - 8.8.8.8
    search:
    - <redacted>
  config:
    server:
    - 8.8.8.8
    search:
    - <redacted>
route-rules:
  config:
  - family: ipv6
    route-table: 255
  - family: ipv6
    priority: 1000
  - family: ipv6
    priority: 32766
    route-table: 254
  - family: ipv4
    priority: 9
    route-table: 2004
    fwmark: '0x200'
    fwmask: '0xf00'
  - family: ipv4
    priority: 10
    route-table: 2005
    fwmark: '0xa00'
    fwmask: '0xf00'
  - family: ipv4
    priority: 100
    route-table: 255
  - family: ipv4
    priority: 1000
  - family: ipv4
    priority: 32766
    route-table: 254
  - family: ipv4
    priority: 32767
    route-table: 253
routes:
  running: []
  config: []
interfaces:
- name: backend-br9
  type: linux-bridge
  state: up
  mac-address: <redacted>
  mtu: 1500
  wait-ip: any
  ipv4:
    enabled: false
    dhcp: false
  ipv6:
    enabled: false
    dhcp: false
    autoconf: false
  accept-all-mac-addresses: false
  lldp:
    enabled: false
  bridge:
    options:
      gc-timer: 23256
      group-addr: 01:80:C2:00:00:00
      group-forward-mask: 0
      group-fwd-mask: 0
      hash-max: 4096
      hello-timer: 0
      mac-ageing-time: 300
      multicast-last-member-count: 2
      multicast-last-member-interval: 100
      multicast-membership-interval: 26000
      multicast-querier: false
      multicast-querier-interval: 25500
      multicast-query-interval: 12500
      multicast-query-response-interval: 1000
      multicast-query-use-ifaddr: false
      multicast-router: auto
      multicast-snooping: true
      multicast-startup-query-count: 2
      multicast-startup-query-interval: 3124
      stp:
        enabled: false
        forward-delay: 15
        hello-time: 2
        max-age: 20
        priority: 32768
      vlan-protocol: 802.1q
    port:
    - name: backend-v9
      stp-hairpin-mode: false
      stp-path-cost: 100
      stp-priority: 32
      vlan:
        enable-native: false
        mode: trunk
        trunk-tags:
        - id-range:
            min: 2
            max: 4094
- name: backend-v9
  type: vlan
  state: up
  mac-address: <redacted>
  mtu: 1500
  ipv4:
    enabled: false
  ipv6:
    enabled: false
  controller: backend-br9
  accept-all-mac-addresses: false
  lldp:
    enabled: false
  vlan:
    base-iface: bond1
    id: 9
    protocol: 802.1q
- name: bond0
  type: bond
  state: up
  mac-address: <redacted>
  mtu: 1500
  wait-ip: any
  ipv4:
    enabled: false
    dhcp: false
  ipv6:
    enabled: true
    dhcp: false
    autoconf: false
    address:
    - ip: <redacted>
      prefix-length: 64
    addr-gen-mode: stable-privacy
  accept-all-mac-addresses: false
  lldp:
    enabled: false
  link-aggregation:
    mode: 802.3ad
    options:
      ad_actor_sys_prio: 65535
      ad_actor_system: 00:00:00:00:00:00
      ad_select: stable
      ad_user_port_key: 0
      all_slaves_active: dropped
      arp_all_targets: any
      arp_interval: 0
      arp_validate: none
      downdelay: 0
      lacp_rate: slow
      lp_interval: 1
      miimon: 100
      min_links: 0
      primary_reselect: always
      updelay: 0
      use_carrier: true
      xmit_hash_policy: layer2
    port:
    - veth0
- name: bond1
  type: bond
  state: up
  mac-address: <redacted>
  mtu: 1500
  wait-ip: any
  ipv4:
    enabled: false
    dhcp: false
  ipv6:
    enabled: true
    dhcp: false
    autoconf: false
    address:
    - ip: <redacted>
      prefix-length: 64
    addr-gen-mode: stable-privacy
  accept-all-mac-addresses: false
  lldp:
    enabled: false
  link-aggregation:
    mode: 802.3ad
    options:
      ad_actor_sys_prio: 65535
      ad_actor_system: 00:00:00:00:00:00
      ad_select: stable
      ad_user_port_key: 0
      all_slaves_active: dropped
      arp_all_targets: any
      arp_interval: 0
      arp_validate: none
      downdelay: 0
      lacp_rate: slow
      lp_interval: 1
      miimon: 100
      min_links: 0
      primary_reselect: always
      updelay: 0
      use_carrier: true
      xmit_hash_policy: layer2
    port:
    - veth2
- name: cilium_host
  type: veth
  state: ignore
  mac-address: <redacted>
  mtu: 1500
  ipv4:
    enabled: true
    address:
    - ip: <redacted>
      prefix-length: 32
  ipv6:
    enabled: true
    address:
    - ip: <redacted>
      prefix-length: 64
  mptcp:
    address-flags: []
  accept-all-mac-addresses: false
  veth:
    peer: cilium_net
- name: cilium_net
  type: veth
  state: ignore
  mac-address: <redacted>
  mtu: 1500
  ipv4:
    enabled: false
  ipv6:
    enabled: true
    address:
    - ip: <redacted>
      prefix-length: 64
  accept-all-mac-addresses: false
  veth:
    peer: cilium_host
- name: cilium_vxlan
  type: vxlan
  state: ignore
  mac-address: <redacted>
  mtu: 1500
  ipv4:
    enabled: false
  ipv6:
    enabled: true
    address:
    - ip: <redacted>
      prefix-length: 64
  accept-all-mac-addresses: false
  vxlan:
    id: 0
    learning: false
    destination-port: 8472
- name: docker0
  type: linux-bridge
  state: up
  mac-address: <redacted>
  mtu: 1500
  ipv4:
    enabled: true
    address:
    - ip: <redacted>
      prefix-length: 16
  ipv6:
    enabled: false
  mptcp:
    address-flags: []
  accept-all-mac-addresses: false
  bridge:
    options:
      gc-timer: 5234
      group-addr: 01:80:C2:00:00:00
      group-forward-mask: 0
      group-fwd-mask: 0
      hash-max: 512
      hello-timer: 0
      mac-ageing-time: 300
      multicast-last-member-count: 2
      multicast-last-member-interval: 100
      multicast-membership-interval: 26000
      multicast-querier: false
      multicast-querier-interval: 25500
      multicast-query-interval: 12500
      multicast-query-response-interval: 1000
      multicast-query-use-ifaddr: false
      multicast-router: auto
      multicast-snooping: true
      multicast-startup-query-count: 2
      multicast-startup-query-interval: 3124
      stp:
        enabled: false
        forward-delay: 15
        hello-time: 2
        max-age: 20
        priority: 32768
      vlan-protocol: 802.1q
    port: []
- name: ens3
  type: ethernet
  state: up
  mac-address: <redacted>
  mtu: 1500
  ipv4:
    enabled: false
  ipv6:
    enabled: false
  controller: mynet0
  accept-all-mac-addresses: false
  ethernet: {}
- name: ens8
  type: ethernet
  state: up
  mac-address: <redacted>
  mtu: 1500
  ipv4:
    enabled: true
    address:
    - ip: <redacted>
      prefix-length: 16
    - ip: <redacted>
      prefix-length: 16
  ipv6:
    enabled: true
    address:
    - ip: <redacted>
      prefix-length: 64
  mptcp:
    address-flags: []
  accept-all-mac-addresses: false
  ethernet: {}
- name: ens9
  type: ethernet
  state: down
  mac-address: <redacted>
  mtu: 1500
  ipv4:
    enabled: false
  ipv6:
    enabled: false
  accept-all-mac-addresses: false
  ethernet: {}
- name: lo
  type: loopback
  state: ignore
  mac-address: <redacted>
  mtu: 65536
  ipv4:
    enabled: true
    address:
    - ip: <redacted>
      prefix-length: 8
  ipv6:
    enabled: true
    address:
    - ip: <redacted>
      prefix-length: 128
  accept-all-mac-addresses: false
- name: lxc51f5775fd945
  type: veth
  state: ignore
  mac-address: <redacted>
  mtu: 1500
  ipv4:
    enabled: false
  ipv6:
    enabled: true
    address:
    - ip: <redacted>
      prefix-length: 64
  accept-all-mac-addresses: false
- name: lxc56c7ef5af2d7
  type: veth
  state: ignore
  mac-address: <redacted>
  mtu: 1500
  ipv4:
    enabled: false
  ipv6:
    enabled: true
    address:
    - ip: <redacted>
      prefix-length: 64
  accept-all-mac-addresses: false
- name: lxc5f572043bec2
  type: veth
  state: ignore
  mac-address: <redacted>
  mtu: 1500
  ipv4:
    enabled: false
  ipv6:
    enabled: true
    address:
    - ip: <redacted>
      prefix-length: 64
  accept-all-mac-addresses: false
- name: lxc63e8e33860c2
  type: veth
  state: ignore
  mac-address: <redacted>
  mtu: 1500
  ipv4:
    enabled: false
  ipv6:
    enabled: true
    address:
    - ip: <redacted>
      prefix-length: 64
  accept-all-mac-addresses: false
- name: lxc7b53b16cebff
  type: veth
  state: ignore
  mac-address: <redacted>
  mtu: 1500
  ipv4:
    enabled: false
  ipv6:
    enabled: true
    address:
    - ip: <redacted>
      prefix-length: 64
  accept-all-mac-addresses: false
- name: lxc7d751137a205
  type: veth
  state: ignore
  mac-address: <redacted>
  mtu: 1500
  ipv4:
    enabled: false
  ipv6:
    enabled: true
    address:
    - ip: <redacted>
      prefix-length: 64
  accept-all-mac-addresses: false
- name: lxc8e419722ce5f
  type: veth
  state: ignore
  mac-address: <redacted>
  mtu: 1500
  ipv4:
    enabled: false
  ipv6:
    enabled: true
    address:
    - ip: <redacted>
      prefix-length: 64
  accept-all-mac-addresses: false
- name: lxc9a97324c63bf
  type: veth
  state: ignore
  mac-address: <redacted>
  mtu: 1500
  ipv4:
    enabled: false
  ipv6:
    enabled: true
    address:
    - ip: <redacted>
      prefix-length: 64
  accept-all-mac-addresses: false
- name: lxc_health
  type: veth
  state: ignore
  mac-address: <redacted>
  mtu: 1500
  ipv4:
    enabled: false
  ipv6:
    enabled: true
    address:
    - ip: <redacted>
      prefix-length: 64
  accept-all-mac-addresses: false
- name: lxca5b9a4074678
  type: veth
  state: ignore
  mac-address: <redacted>
  mtu: 1500
  ipv4:
    enabled: false
  ipv6:
    enabled: true
    address:
    - ip: <redacted>
      prefix-length: 64
  accept-all-mac-addresses: false
- name: lxcd2072f516a96
  type: veth
  state: ignore
  mac-address: <redacted>
  mtu: 1500
  ipv4:
    enabled: false
  ipv6:
    enabled: true
    address:
    - ip: <redacted>
      prefix-length: 64
  accept-all-mac-addresses: false
- name: lxce41c657bbe14
  type: veth
  state: ignore
  mac-address: <redacted>
  mtu: 1500
  ipv4:
    enabled: false
  ipv6:
    enabled: true
    address:
    - ip: <redacted>
      prefix-length: 64
  accept-all-mac-addresses: false
- name: lxce8a7a3a30d0a
  type: veth
  state: ignore
  mac-address: <redacted>
  mtu: 1500
  ipv4:
    enabled: false
  ipv6:
    enabled: true
    address:
    - ip: <redacted>
      prefix-length: 64
  accept-all-mac-addresses: false
- name: mynet0
  type: linux-bridge
  state: up
  mac-address: <redacted>
  mtu: 1500
  wait-ip: any
  ipv4:
    enabled: true
    dhcp: false
    address:
    - ip: <redacted>
      prefix-length: 23
  ipv6:
    enabled: true
    dhcp: false
    autoconf: false
    address:
    - ip: <redacted>
      prefix-length: 64
    addr-gen-mode: eui64
  mptcp:
    address-flags: []
  accept-all-mac-addresses: false
  lldp:
    enabled: false
  bridge:
    options:
      gc-timer: 87
      group-addr: 01:80:C2:00:00:00
      group-forward-mask: 0
      group-fwd-mask: 0
      hash-max: 4096
      hello-timer: 0
      mac-ageing-time: 300
      multicast-last-member-count: 2
      multicast-last-member-interval: 100
      multicast-membership-interval: 26000
      multicast-querier: false
      multicast-querier-interval: 25500
      multicast-query-interval: 12500
      multicast-query-response-interval: 1000
      multicast-query-use-ifaddr: false
      multicast-router: auto
      multicast-snooping: true
      multicast-startup-query-count: 2
      multicast-startup-query-interval: 3124
      stp:
        enabled: false
        forward-delay: 15
        hello-time: 2
        max-age: 20
        priority: 32768
      vlan-protocol: 802.1q
    port:
    - name: ens3
      stp-hairpin-mode: false
      stp-path-cost: 100
      stp-priority: 32
- name: usb-int-br200
  type: linux-bridge
  state: up
  mac-address: <redacted>
  mtu: 1500
  wait-ip: any
  ipv4:
    enabled: false
    dhcp: false
  ipv6:
    enabled: false
    dhcp: false
    autoconf: false
  accept-all-mac-addresses: false
  lldp:
    enabled: false
  bridge:
    options:
      gc-timer: 21618
      group-addr: 01:80:C2:00:00:00
      group-forward-mask: 0
      group-fwd-mask: 0
      hash-max: 4096
      hello-timer: 0
      mac-ageing-time: 300
      multicast-last-member-count: 2
      multicast-last-member-interval: 100
      multicast-membership-interval: 26000
      multicast-querier: false
      multicast-querier-interval: 25500
      multicast-query-interval: 12500
      multicast-query-response-interval: 1000
      multicast-query-use-ifaddr: false
      multicast-router: auto
      multicast-snooping: true
      multicast-startup-query-count: 2
      multicast-startup-query-interval: 3124
      stp:
        enabled: false
        forward-delay: 15
        hello-time: 2
        max-age: 20
        priority: 32768
      vlan-protocol: 802.1q
    port:
    - name: usb-int-v200
      stp-hairpin-mode: false
      stp-path-cost: 100
      stp-priority: 32
      vlan:
        enable-native: false
        mode: trunk
        trunk-tags:
        - id-range:
            min: 2
            max: 4094
- name: usb-int-v200
  type: vlan
  state: up
  mac-address: <redacted>
  mtu: 1500
  ipv4:
    enabled: false
  ipv6:
    enabled: false
  controller: usb-int-br200
  accept-all-mac-addresses: false
  lldp:
    enabled: false
  vlan:
    base-iface: bond0
    id: 200
    protocol: 802.1q
- name: veth0
  type: veth
  state: up
  mac-address: <redacted>
  mtu: 1500
  ipv4:
    enabled: false
  ipv6:
    enabled: false
  controller: bond0
  accept-all-mac-addresses: false
  lldp:
    enabled: false
  veth:
    peer: veth1
- name: veth1
  type: veth
  state: up
  mac-address: <redacted>
  mtu: 1500
  wait-ip: any
  ipv4:
    enabled: false
    dhcp: false
  ipv6:
    enabled: false
    dhcp: false
    autoconf: false
  accept-all-mac-addresses: false
  lldp:
    enabled: false
  veth:
    peer: veth0
- name: veth2
  type: veth
  state: up
  mac-address: <redacted>
  mtu: 1500
  ipv4:
    enabled: false
  ipv6:
    enabled: false
  controller: bond1
  accept-all-mac-addresses: false
  lldp:
    enabled: false
  veth:
    peer: veth3
- name: veth3
  type: veth
  state: up
  mac-address: <redacted>
  mtu: 1500
  wait-ip: any
  ipv4:
    enabled: false
    dhcp: false
  ipv6:
    enabled: false
    dhcp: false
    autoconf: false
  accept-all-mac-addresses: false
  lldp:
    enabled: false
  veth:
    peer: veth2
- name: vpc-mgmt-br100
  type: linux-bridge
  state: up
  mac-address: <redacted>
  mtu: 1500
  wait-ip: any
  ipv4:
    enabled: false
    dhcp: false
  ipv6:
    enabled: false
    dhcp: false
    autoconf: false
  accept-all-mac-addresses: false
  lldp:
    enabled: false
  bridge:
    options:
      gc-timer: 8511
      group-addr: 01:80:C2:00:00:00
      group-forward-mask: 0
      group-fwd-mask: 0
      hash-max: 4096
      hello-timer: 0
      mac-ageing-time: 300
      multicast-last-member-count: 2
      multicast-last-member-interval: 100
      multicast-membership-interval: 26000
      multicast-querier: false
      multicast-querier-interval: 25500
      multicast-query-interval: 12500
      multicast-query-response-interval: 1000
      multicast-query-use-ifaddr: false
      multicast-router: auto
      multicast-snooping: true
      multicast-startup-query-count: 2
      multicast-startup-query-interval: 3124
      stp:
        enabled: false
        forward-delay: 15
        hello-time: 2
        max-age: 20
        priority: 32768
      vlan-protocol: 802.1q
    port:
    - name: vpc-mgmt-v100
      stp-hairpin-mode: false
      stp-path-cost: 100
      stp-priority: 32
      vlan:
        enable-native: false
        mode: trunk
        trunk-tags:
        - id-range:
            min: 2
            max: 4094
- name: vpc-mgmt-v100
  type: vlan
  state: up
  mac-address: <redacted>
  mtu: 1500
  ipv4:
    enabled: false
  ipv6:
    enabled: false
  controller: vpc-mgmt-br100
  accept-all-mac-addresses: false
  lldp:
    enabled: false
  vlan:
    base-iface: bond0
    id: 100
    protocol: 802.1q
ovs-db: {}

Kernel IP routing table

Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.87.254    0.0.0.0         UG    426    0        0 mynet0
10.0.0.0        10.0.2.183      255.255.255.0   UG    0      0        0 cilium_host
10.0.1.0        10.0.2.183      255.255.255.0   UG    0      0        0 cilium_host
10.0.2.0        10.0.2.183      255.255.255.0   UG    0      0        0 cilium_host
10.0.2.183      0.0.0.0         255.255.255.255 UH    0      0        0 cilium_host
10.0.3.0        10.0.2.183      255.255.255.0   UG    0      0        0 cilium_host
11.12.0.0       0.0.0.0         255.255.0.0     U     100    0        0 ens8
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0

Links I'm using

ip link show usb-int-br200
172: usb-int-br200: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000

ip link show master usb-int-br200
171: usb-int-v200@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master usb-int-br200 state UP mode DEFAULT group default qlen 1000

ip link show master bond0
52: veth0@veth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond0 state UP mode DEFAULT group default qlen 1000

What I'm doing: I created a veth pair (veth0/veth1) and a bond interface (bond0). The veth pair is a dummy, just for creating the bond for testing. Then I used nmstate to create a vlan on bond0 (the vlan interface is usb-int-v200), and then used nmstate to create a bridge attached to the vlan (the bridge interface is usb-int-br200). None of these interfaces have an IP or routes, since they are L2 only.
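For context, the steps above roughly correspond to a NodeNetworkConfigurationPolicy like the following. This is a hedged sketch to illustrate the shape of the desired state; the interface names match the dump above, but the exact manifest is an assumption, not the reporter's actual policy:

```yaml
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: usb-int-v200   # illustrative name
spec:
  desiredState:
    interfaces:
    - name: usb-int-v200
      type: vlan
      state: up
      vlan:
        base-iface: bond0
        id: 200
    - name: usb-int-br200
      type: linux-bridge
      state: up
      bridge:
        port:
        - name: usb-int-v200
```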

When I create the vlan, I see the issue with the handler pod failing the gateway probe. As you know, it loops and retries multiple times before moving on to the next probe. The wait time before an interface finally gets configured is very long because of this, and we can't go to production with it.

qinqon commented 1 year ago

But is the default gw present before you apply the state? If it's not, the default gw probe is deactivated.

k8scoder192 commented 1 year ago

@qinqon yes the default gw is present prior to creation of vlan or bridge interface

qinqon commented 1 year ago

@qinqon yes the default gw is present prior to creation of vlan or bridge interface

But in the NodeNetworkState that you showed me there are no running routes:

routes:
  running: []
  config: []

In that case the default gw probe is deactivated, since there is no route before the apply.

It is not a regression of https://github.com/nmstate/kubernetes-nmstate/pull/1153.

Somehow `nmstatectl show` does not show the default route on your system, so the default gw probe does some retries to check whether it has to run.

We have to discover why `routes` is empty. I see that you have a bridge on top of ens3, and there are also some weird nmstatectl errors:

[2023-04-13T12:38:15Z WARN  nmstate::nispor::route] Failed to retrieve Boot route via nispor: Failed to set socket option NETLINK_GET_STRICT_CHK: error Protocol not available (os error 92)
[2023-04-13T12:38:15Z WARN  nmstate::nispor::route] Failed to retrieve Static route via nispor: Failed to set socket option NETLINK_GET_STRICT_CHK: error Protocol not available (os error 92)
[2023-04-13T12:38:15Z WARN  nmstate::nispor::route] Failed to retrieve Ra route via nispor: Failed to set socket option NETLINK_GET_STRICT_CHK: error Protocol not available (os error 92)
[2023-04-13T12:38:15Z WARN  nmstate::nispor::route] Failed to retrieve Dhcp route via nispor: Failed to set socket option NETLINK_GET_STRICT_CHK: error Protocol not available (os error 92)
[2023-04-13T12:38:15Z WARN  nmstate::nispor::route] Failed to retrieve Mrouted route via nispor: Failed to set socket option NETLINK_GET_STRICT_CHK: error Protocol not available (os error 92)
[2023-04-13T12:38:15Z WARN  nmstate::nispor::route] Failed to retrieve KeepAlived route via nispor: Failed to set socket option NETLINK_GET_STRICT_CHK: error Protocol not available (os error 92)
[2023-04-13T12:38:15Z WARN  nmstate::nispor::route] Failed to retrieve Babel route via nispor: Failed to set socket option NETLINK_GET_STRICT_CHK: error Protocol not available (os error 92)
[2023-04-13T12:38:16Z INFO  nmstate::nm::show] Got unsupported interface type generic: lo, ignoring
[2023-04-13T12:38:16Z WARN  nmstate::nm::show] Failed to find applied NmConnection for interface ens8 802-3-ethernet
[2023-04-13T12:38:16Z WARN  nmstate::nm::show] Failed to find applied NmConnection for interface ens3 802-3-
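For reference, the gateway selection described above (choosing the default gw from the main routing table, per PR 1153) amounts to something like this hedged Go sketch. The names are illustrative, not the actual kubernetes-nmstate probe code:

```go
// Hedged sketch: pick the default gateway from the main routing table
// (table-id 254, "main" in /etc/iproute2/rt_tables). With routes.running
// empty, as in the NodeNetworkState reported here, the lookup fails with
// the "default gw missing" error seen in the handler logs, and the probe
// retries before giving up.
package main

import (
	"errors"
	"fmt"
)

const mainTableID = 254

// route mirrors the fields the probe reads from routes.running.
type route struct {
	Destination string
	NextHop     string
	TableID     int
}

// defaultGw returns the next hop of the default route in the main table.
func defaultGw(running []route) (string, error) {
	for _, r := range running {
		if r.TableID == mainTableID && (r.Destination == "0.0.0.0/0" || r.Destination == "::/0") {
			return r.NextHop, nil
		}
	}
	return "", errors.New("default gw missing")
}

func main() {
	// Empty running routes, exactly what nmstatectl reported on this node.
	if _, err := defaultGw(nil); err != nil {
		fmt.Println(err)
	}
	// With the route table populated, the gateway is found.
	gw, _ := defaultGw([]route{{Destination: "0.0.0.0/0", NextHop: "192.168.87.254", TableID: mainTableID}})
	fmt.Println(gw)
}
```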
qinqon commented 1 year ago

@k8scoder192 can you share the Linux kernel version?

k8scoder192 commented 1 year ago

@qinqon 4.15.0-177-generic

qinqon commented 1 year ago

The nmstate team is working on a solution; we will keep you posted.

cathay4t commented 1 year ago

Thanks for the bug report!

Looks like the Ubuntu kernel has some problem with NETLINK_GET_STRICT_CHK. I will patch nispor to fall back to user-space route filtering when NETLINK_GET_STRICT_CHK fails; this is also the method used by iproute2.

I have created https://github.com/nispor/nispor/issues/226 to trace the effort there.

I don't have the time or expertise to investigate why NETLINK_GET_STRICT_CHK fails in the Ubuntu kernel; feel free to help.

On my Arch Linux machine, I got:

[fge@Gris-NUC12 lib]$ strace ip route show scope link 2>&1|grep NETLINK_GET_STRICT_CHK
setsockopt(3, SOL_NETLINK, NETLINK_GET_STRICT_CHK, [1], 4) = 0

The `= 0` return value means NETLINK_GET_STRICT_CHK works well there.
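The fallback described above can be sketched as follows: when setsockopt(SOL_NETLINK, NETLINK_GET_STRICT_CHK, 1) fails with ENOPROTOOPT ("Protocol not available", os error 92), request an unfiltered route dump and filter by table in user space instead of asking the kernel to do it. This hedged Go sketch models only the user-space filtering step with illustrative types; real code would decode RTM_NEWROUTE messages from a full RTM_GETROUTE dump, which works regardless of strict-check support:

```go
// Hedged sketch of user-space route filtering, the fallback for kernels
// where NETLINK_GET_STRICT_CHK is unavailable. Types are illustrative.
package main

import "fmt"

// rtMsg stands in for a decoded RTM_NEWROUTE message from an
// unfiltered RTM_GETROUTE dump.
type rtMsg struct {
	Table int
	Dst   string
	Gw    string
}

// filterByTable replicates in user space what NETLINK_GET_STRICT_CHK
// would have let the kernel do: keep only routes from one table.
func filterByTable(all []rtMsg, table int) []rtMsg {
	var out []rtMsg
	for _, m := range all {
		if m.Table == table {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	// Pretend this came from an unfiltered dump on the Ubuntu node.
	dump := []rtMsg{
		{Table: 254, Dst: "0.0.0.0/0", Gw: "192.168.87.254"},
		{Table: 254, Dst: "172.17.0.0/16"},
		{Table: 255, Dst: "127.0.0.0/8"}, // local table, filtered out
	}
	for _, m := range filterByTable(dump, 254) {
		fmt.Println(m.Table, m.Dst, m.Gw)
	}
}
```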

qinqon commented 1 year ago

@k8scoder192, I'm going to change the title of the issue since it's not related to the PR, OK?

k8scoder192 commented 1 year ago

Sure!

Thank you


k8scoder192 commented 1 year ago

@cathay4t is this fixed? It's hampering deployment; it takes a very long time to apply, and with a large cluster it's not feasible for production. cc: @qinqon

cathay4t commented 1 year ago

@k8scoder192 It is only fixed in nispor upstream. Let's tag a new release of nmstate next week to consume it.

k8scoder192 commented 1 year ago

@qinqon @cathay4t any updates on when a new release will be cut with the nispor fix?

qinqon commented 1 year ago

@k8scoder192 I see a release with the fix already: https://github.com/nispor/nispor/releases/tag/v1.2.11

I don't know when it will reach nmstate; @cathay4t, maybe you know more?

cathay4t commented 1 year ago

It has already reached nmstate 2.2.11. You may compile it or just download the precompiled binary: https://github.com/nmstate/nmstate/releases/tag/v2.2.11

qinqon commented 1 year ago

@cathay4t is 2.2.11 already in CentOS 9 Stream?

cathay4t commented 1 year ago

nmstate-2.2.11-1.el9 is in CentOS 9 Stream now.

qinqon commented 1 year ago

Closing, since the fix has landed in CentOS 9 Stream.

k8scoder192 commented 1 year ago

@cathay4t @qinqon I'm still seeing this issue; please see https://github.com/nmstate/kubernetes-nmstate/issues/1196