nmstate / kubernetes-nmstate

Declarative node network configuration driven through Kubernetes API.
GNU General Public License v2.0

failed to retrieve default gw at runProbes timed out waiting for the condition when applying a Linux bridge NNCP #1063

Open wdrdres3qew5ts21 opened 2 years ago

wdrdres3qew5ts21 commented 2 years ago

What happened: In my dev/test environment, when I apply a NodeNetworkConfigurationPolicy, the NodeNetworkConfigurationEnactment starts applying, but it fails with the error "failed to retrieve default gw at runProbes timed out waiting for the condition" while applying the Linux bridge NNCP. I installed KubeVirt, NetworkAddOns and NMState manually myself; I am not using any HyperConverged operator.

Edit 1: It also seems to break my cluster network: after the command succeeds, I can no longer reach my cluster or the OpenShift console. Edit 2: I suspect this is either a problem with my configuration or an underlying issue in the operating system itself. Is there a compatibility matrix between kubernetes-nmstate and the NetworkAddOns-related components? My NetworkManager already meets the requirement (nmcli tool, version 1.30.0-13.el8_4), but it still doesn't work. https://bugzilla.redhat.com/show_bug.cgi?id=2037411

[screenshot]

Here is my NNCP configuration:

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: brse-ens5
spec:
  maxUnavailable: 1
  desiredState:
    interfaces:
    - name: brse
      description: Linux bridge with ens5 as a port
      type: linux-bridge
      state: up
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: ens5

My Kubernetes (OpenShift) nodes have an interface named ens5, and pinging another node's ens5 IPv4 address succeeds.

[screenshot]

NNCE status from the command line:

# oc get nnce
NAME                                                        STATUS
ip-10-0-135-114.ap-southeast-1.compute.internal.brse-ens5   Failing
ip-10-0-215-42.ap-southeast-1.compute.internal.brse-ens5    Aborted

Error details from the NNCE ip-10-0-135-114.ap-southeast-1.compute.internal.brse-ens5:

[screenshot]

Details from the NNCE ip-10-0-215-42.ap-southeast-1.compute.internal.brse-ens5:

[screenshot]

What you expected to happen: The NNCE should be configured successfully.

Anything else we need to know?: Is it possible to skip the KubeVirt HyperConverged operator and just apply the KubeVirt, NetworkAddOns and NMState manifests manually? Is there a compatibility matrix between these components? My NetworkManager already meets the NMState requirement (version 1.30.0-13.el8_4).

Environment:

quay.io/nmstate/kubernetes-nmstate-handler:v0.64.13

wdrdres3qew5ts21 commented 2 years ago

After experimenting with the configuration for a while, it seems the newer version requires the ipv4 section to be configured (https://nmstate.io/kubernetes-nmstate/examples.html#linux-bridge). I changed my NNCP accordingly, which solved "failed to retrieve default gw at runProbes timed out waiting for the condition". However, it then took my OpenShift cluster down: I can no longer access my node, and the enactment stays in Progressing forever. Normally, when I use Multus CNI to create a bridge on a Linux host, pods attached to the same bridge can automatically reach each other. When using ClusterNetworkAddOns, do I need to add something like this manually?

Edit: It also seems to break my cluster network: after the command succeeds, I can no longer reach my cluster or the OpenShift console! Note: I suspect this is either a problem with my configuration or an underlying issue in the operating system itself. Is there a compatibility matrix between kubernetes-nmstate and the NetworkAddOns-related components? My NetworkManager already meets the requirement (nmcli tool, version 1.30.0-13.el8_4), but it still doesn't work. https://bugzilla.redhat.com/show_bug.cgi?id=2037411

[screenshots]

Here is my CR:

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: brse-ens5
spec:
  maxUnavailable: 1
  desiredState:
    interfaces:
    - name: brse
      description: Linux bridge with ens5 as a port
      type: linux-bridge
      state: up
      ipv4:
        dhcp: true
        enabled: true
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: ens5

It stays stuck like this forever, the cluster network is down, and I can no longer reach my OpenShift console.

# oc get nnce
NAME                                                        STATUS
ip-10-0-135-114.ap-southeast-1.compute.internal.brse-ens5   Progressing
ip-10-0-215-42.ap-southeast-1.compute.internal.brse-ens5    Available
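
While experimenting with a policy that bridges a node's primary interface, it may help to scope the NNCP to a single node so that only that node is affected if connectivity breaks. The sketch below reuses the CR above and adds spec.nodeSelector. The policy name brse-ens5-single-node is hypothetical, and the selector assumes the node's kubernetes.io/hostname label equals its node name (check with oc get node <name> --show-labels). It would be applied instead of brse-ens5, not alongside it, since both describe the same bridge.

```yaml
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: brse-ens5-single-node   # hypothetical name for a trial policy
spec:
  # Only nodes matching this label are targeted. Assumption: the
  # kubernetes.io/hostname label carries the full node name.
  nodeSelector:
    kubernetes.io/hostname: ip-10-0-135-114.ap-southeast-1.compute.internal
  desiredState:
    interfaces:
    - name: brse
      description: Linux bridge with ens5 as a port
      type: linux-bridge
      state: up
      # Same ipv4 settings as the CR above: the bridge requests the address via DHCP.
      ipv4:
        dhcp: true
        enabled: true
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: ens5
```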

Here is a snippet of the NNCE:

apiVersion: nmstate.io/v1beta1
kind: NodeNetworkConfigurationEnactment
metadata:
  creationTimestamp: "2022-04-24T16:14:35Z"
  generation: 1
  labels:
    nmstate.io/node: ip-10-0-135-114.ap-southeast-1.compute.internal
    nmstate.io/policy: brse-ens5
  name: ip-10-0-135-114.ap-southeast-1.compute.internal.brse-ens5
  ownerReferences:
  - apiVersion: v1
    kind: Node
    name: ip-10-0-135-114.ap-southeast-1.compute.internal
    uid: 8cf6351a-8ab1-47ec-8c27-5d897f2c4336
  resourceVersion: "263283"
  uid: a59f8e6e-d559-4f58-93d1-69f679d64ae6
status:
  conditions:
  - lastHearbeatTime: "2022-04-24T17:55:12Z"
    lastTransitionTime: "2022-04-24T17:55:12Z"
    message: Applying desired state
    reason: ConfigurationProgressing
    status: "True"
    type: Progressing
  - lastHearbeatTime: "2022-04-24T17:55:12Z"
    lastTransitionTime: "2022-04-24T17:55:12Z"
    reason: ConfigurationProgressing
    status: Unknown
    type: Failing
  - lastHearbeatTime: "2022-04-24T17:55:12Z"
    lastTransitionTime: "2022-04-24T17:55:12Z"
    reason: ConfigurationProgressing
    status: Unknown
    type: Available
  - lastHearbeatTime: "2022-04-24T17:55:12Z"
    lastTransitionTime: "2022-04-24T17:55:12Z"
    reason: ConfigurationProgressing
    status: "False"
    type: Pending
  - lastHearbeatTime: "2022-04-24T17:55:12Z"
    lastTransitionTime: "2022-04-24T17:55:12Z"
    reason: ConfigurationProgressing
    status: "False"
    type: Aborted
  desiredState:
    interfaces:
    - bridge:
        options:
          stp:
            enabled: false
        port:
        - name: ens5
          vlan:
            mode: trunk
            trunk-tags:
            - id-range:
                max: 4094
                min: 2
      description: Linux bridge with ens5 as a port
      ipv4:
        dhcp: true
        enabled: true
      name: brse
      state: up
      type: linux-bridge
  desiredStateMetaInfo:
    time: "2022-04-24T17:55:12Z"
    version: "0"
  policyGeneration: 6

And here is the whole NodeNetworkState of ip-10-0-135-114.ap-southeast-1.compute.internal, which is stuck at Progressing forever:

apiVersion: nmstate.io/v1beta1
kind: NodeNetworkState
metadata:
  creationTimestamp: "2022-04-24T16:11:49Z"
  generation: 1
  name: ip-10-0-135-114.ap-southeast-1.compute.internal
  ownerReferences:
  - apiVersion: v1
    kind: Node
    name: ip-10-0-135-114.ap-southeast-1.compute.internal
    uid: 8cf6351a-8ab1-47ec-8c27-5d897f2c4336
  resourceVersion: "222380"
  uid: 416b115d-8633-40a1-80f8-9442de5ebecb
status:
  currentState:
    dns-resolver:
      config:
        search: null
        server: null
      running:
        search:
        - ap-southeast-1.compute.internal
        server:
        - 10.0.0.2
    interfaces:
    - accept-all-mac-addresses: false
      ethtool:
        feature:
          highdma: true
          rx-gro: true
          rx-gro-list: false
          tx-checksum-ip-generic: true
          tx-generic-segmentation: true
          tx-gre-csum-segmentation: true
          tx-gre-segmentation: true
          tx-ipxip4-segmentation: true
          tx-ipxip6-segmentation: true
          tx-nocache-copy: false
          tx-scatter-gather-fraglist: true
          tx-sctp-segmentation: true
          tx-tcp-ecn-segmentation: true
          tx-tcp-mangleid-segmentation: true
          tx-tcp-segmentation: true
          tx-tcp6-segmentation: true
          tx-udp_tnl-csum-segmentation: true
          tx-udp_tnl-segmentation: true
          tx-vlan-hw-insert: true
          tx-vlan-stag-hw-insert: true
      ipv4:
        address: []
        enabled: false
      ipv6:
        address: []
        enabled: false
      mac-address: 1E:79:45:2C:32:44
      mtu: 8951
      name: br0
      state: down
      type: ovs-interface
    - accept-all-mac-addresses: false
      ethtool:
        coalesce:
          adaptive-rx: false
          rx-usecs: 0
          tx-usecs: 64
        feature:
          highdma: true
          rx-checksum: true
          rx-gro: true
          rx-gro-list: false
          rx-hashing: true
          tx-checksum-ipv4: true
          tx-generic-segmentation: true
          tx-nocache-copy: false
        ring:
          rx: 1024
          tx: 1024
      ipv4:
        address:
        - ip: 10.0.135.114
          prefix-length: 17
        auto-dns: true
        auto-gateway: true
        auto-route-table-id: 0
        auto-routes: true
        dhcp: true
        enabled: true
      ipv6:
        address:
        - ip: fe80::f29c:426a:51f7:fdbe
          prefix-length: 64
        auto-dns: true
        auto-gateway: true
        auto-route-table-id: 0
        auto-routes: true
        autoconf: true
        dhcp: true
        enabled: true
      lldp:
        enabled: false
      mac-address: 0A:FC:40:AF:29:62
      mtu: 9001
      name: ens5
      state: up
      type: ethernet
    - accept-all-mac-addresses: false
      ethtool:
        feature:
          rx-gro: true
          rx-gro-list: false
          tx-generic-segmentation: true
          tx-sctp-segmentation: true
          tx-tcp-ecn-segmentation: true
          tx-tcp-mangleid-segmentation: true
          tx-tcp-segmentation: true
          tx-tcp6-segmentation: true
      ipv4:
        address:
        - ip: 127.0.0.1
          prefix-length: 8
        enabled: true
      ipv6:
        address:
        - ip: ::1
          prefix-length: 128
        enabled: true
      mac-address: "00:00:00:00:00:00"
      mtu: 65536
      name: lo
      state: up
      type: unknown
    - accept-all-mac-addresses: false
      ethtool:
        feature:
          highdma: true
          rx-gro: true
          rx-gro-list: false
          tx-checksum-ip-generic: true
          tx-generic-segmentation: true
          tx-gre-csum-segmentation: true
          tx-gre-segmentation: true
          tx-ipxip4-segmentation: true
          tx-ipxip6-segmentation: true
          tx-nocache-copy: false
          tx-scatter-gather-fraglist: true
          tx-sctp-segmentation: true
          tx-tcp-ecn-segmentation: true
          tx-tcp-mangleid-segmentation: true
          tx-tcp-segmentation: true
          tx-tcp6-segmentation: true
          tx-udp_tnl-csum-segmentation: true
          tx-udp_tnl-segmentation: true
          tx-vlan-hw-insert: true
          tx-vlan-stag-hw-insert: true
      ipv4:
        address:
        - ip: 10.129.0.1
          prefix-length: 23
        enabled: true
      ipv6:
        address:
        - ip: fe80::c44a:2aff:fe80:199e
          prefix-length: 64
        enabled: true
      mac-address: C6:4A:2A:80:19:9E
      mtu: 8951
      name: tun0
      state: up
      type: ovs-interface
    - accept-all-mac-addresses: false
      ethtool:
        feature:
          rx-checksum: true
          rx-gro: true
          rx-gro-list: false
          tx-checksum-ip-generic: true
          tx-generic-segmentation: true
          tx-nocache-copy: false
          tx-sctp-segmentation: true
          tx-tcp-ecn-segmentation: true
          tx-tcp-mangleid-segmentation: true
          tx-tcp-segmentation: true
          tx-tcp6-segmentation: true
      ipv4:
        address: []
        enabled: false
      ipv6:
        address:
        - ip: fe80::9069:53ff:fe96:84d9
          prefix-length: 64
        enabled: true
      lldp:
        enabled: false
      mac-address: 92:69:53:96:84:D9
      mtu: 65000
      name: vxlan_sys_4789
      state: down
      type: vxlan
      vxlan:
        base-iface: ""
        destination-port: 4789
        id: 0
        remote: ""
    routes:
      config:
      - destination: 10.128.0.0/14
        metric: 0
        next-hop-address: 0.0.0.0
        next-hop-interface: tun0
        table-id: 254
      - destination: 172.30.0.0/16
        metric: 0
        next-hop-address: 0.0.0.0
        next-hop-interface: tun0
        table-id: 254
      running:
      - destination: fe80::/64
        metric: 100
        next-hop-address: '::'
        next-hop-interface: ens5
        table-id: 254
      - destination: fe80::/64
        metric: 256
        next-hop-address: '::'
        next-hop-interface: vxlan_sys_4789
        table-id: 254
      - destination: fe80::/64
        metric: 256
        next-hop-address: '::'
        next-hop-interface: tun0
        table-id: 254
      - destination: 0.0.0.0/0
        metric: 100
        next-hop-address: 10.0.128.1
        next-hop-interface: ens5
        table-id: 254
      - destination: 10.0.128.0/17
        metric: 100
        next-hop-address: 0.0.0.0
        next-hop-interface: ens5
        table-id: 254
      - destination: 10.128.0.0/14
        metric: 0
        next-hop-address: 0.0.0.0
        next-hop-interface: tun0
        table-id: 254
      - destination: 172.30.0.0/16
        metric: 0
        next-hop-address: 0.0.0.0
        next-hop-interface: tun0
        table-id: 254
  handlerNetworkManagerVersion: 1.39.0-1.el8
  handlerNmstateVersion: 1.2.1
  hostNetworkManagerVersion: 1.30.0
  lastSuccessfulUpdateTime: "2022-04-24T17:20:49Z"

The other node, ip-10-0-215-42.ap-southeast-1.compute.internal, is waiting for the first node to configure successfully, but it is also stuck at Available forever because the first node never finishes configuring, which looks like a deadlock.

wdrdres3qew5ts21 commented 2 years ago

Update: I found a cluster whose nodes run Red Hat 8.3, and there the NNCP works right away without any problem; I can now use the Linux bridge to route between nodes. I used the same configuration and changed nothing except the host OS, and it works:

sh-4.4# nmcli -v
nmcli tool, version 1.26.0-13.1.rhaos4.7.el8
sh-4.4# cat /etc/os-release 
NAME="Red Hat Enterprise Linux CoreOS"
VERSION="47.83.202103041352-0"
VERSION_ID="4.7"
OPENSHIFT_VERSION="4.7"
RHEL_VERSION="8.3"
PRETTY_NAME="Red Hat Enterprise Linux CoreOS 47.83.202103041352-0 (Ootpa)"
ID="rhcos"
ID_LIKE="rhel fedora"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::coreos"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.7"
REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
REDHAT_SUPPORT_PRODUCT_VERSION="4.7"
OSTREE_VERSION='47.83.202103041352-0'
sh-4.4# uname -a
Linux worker-2.ocp.example.com 4.18.0-240.15.1.el8_3.x86_64 #1 SMP Wed Feb 3 03:12:15 EST 2021 x86_64 x86_64 x86_64 GNU/Linux

Now ens5 is attached as a port of the brse bridge configured by NMState, and my Multus CNI networks that use the brse bridge can also route traffic across Kubernetes nodes.

[screenshot]
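
For reference, a Multus NetworkAttachmentDefinition that attaches pods to the brse bridge could look roughly like the sketch below. It assumes the reference CNI bridge plugin is installed on the nodes (the linux-bridge component of cluster-network-addons is one way to get it). The name brse-net is hypothetical, and the empty ipam object means the plugin only provides a layer-2 attachment without assigning an IP address.

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: brse-net   # hypothetical name
spec:
  # CNI configuration handed to Multus. "type": "bridge" selects the reference
  # bridge plugin, which attaches pod interfaces to the existing brse bridge
  # created by the NNCP. Empty "ipam" = L2 only, no IP assigned by the plugin.
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "brse-net",
      "type": "bridge",
      "bridge": "brse",
      "ipam": {}
    }
```

A pod would then request this network with the annotation k8s.v1.cni.cncf.io/networks: brse-net.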
qinqon commented 2 years ago

@wdrdres3qew5ts21 is this still an issue ?

wdrdres3qew5ts21 commented 2 years ago

@qinqon If I use Red Hat 8.3 instead of 8.4, it works without any problem for now.