stormshift / support

This repo should serve as a central source for reporting issues with stormshift
GNU General Public License v3.0

Adjust Network Configuration for better OpenShift-Virt usability #116

Closed rbo closed 10 months ago

rbo commented 1 year ago

Current network configuration of the nodes inf7 and inf8:

Inf7

sh-4.4# hostname
inf7
sh-4.4# ip add show dev enp4s0f1
3: enp4s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:1e:67:51:5e:9b brd ff:ff:ff:ff:ff:ff
    inet 10.32.111.137/20 brd 10.32.111.255 scope global dynamic noprefixroute enp4s0f1
       valid_lft 7240sec preferred_lft 7240sec
    inet6 fe80::d080:e172:d294:2f2a/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
sh-4.4# ip add show dev enp4s0f0 
2: enp4s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:1e:67:51:5e:9a brd ff:ff:ff:ff:ff:ff
    inet 10.32.96.7/20 brd 10.32.111.255 scope global dynamic noprefixroute enp4s0f0
       valid_lft 9015sec preferred_lft 9015sec
    inet6 fe80::51d8:6b87:af34:e96f/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
sh-4.4# cat /sys/class/net/enp4s0f?/device/sriov_totalvfs 
63
63

Inf8

sh-4.4# hostname
inf8
sh-4.4#  ip add show dev enp4s0f1
3: enp4s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:1e:67:51:5e:a0 brd ff:ff:ff:ff:ff:ff
    inet 10.32.111.138/20 brd 10.32.111.255 scope global dynamic noprefixroute enp4s0f1
       valid_lft 5406sec preferred_lft 5406sec
    inet6 fe80::f9fa:137c:fb13:e29/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
sh-4.4#  ip add show dev enp4s0f0 
2: enp4s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:1e:67:51:5e:9f brd ff:ff:ff:ff:ff:ff
    inet 10.32.96.8/20 brd 10.32.111.255 scope global dynamic noprefixroute enp4s0f0
       valid_lft 10226sec preferred_lft 10226sec
    inet 10.32.111.127/32 scope global enp4s0f0
       valid_lft forever preferred_lft forever
    inet6 fe80::136a:c007:7555:b6d3/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
sh-4.4# cat /sys/class/net/enp4s0f?/device/sriov_totalvfs 
63
63
sh-4.4# 
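
The `sriov_totalvfs` check above can be generalized to every NIC on a node. This is a minimal sketch, assuming the standard sysfs layout; `list_sriov_nics` is a hypothetical helper name, and the optional root argument exists only so the function can be tried against a test directory.

```shell
#!/bin/sh
# Sketch: list every NIC that exposes sriov_totalvfs, with its VF count.
# list_sriov_nics is a hypothetical helper, not an existing tool.
list_sriov_nics() {
  root="${1:-/sys/class/net}"   # sysfs network class directory
  for f in "$root"/*/device/sriov_totalvfs; do
    # Skip when the glob did not match or the file is unreadable.
    [ -r "$f" ] || continue
    nic="${f#"$root"/}"         # strip the leading root path
    nic="${nic%%/*}"            # keep only the interface name
    printf '%s %s\n' "$nic" "$(cat "$f")"
  done
}

# Usage on a node (for example inside `oc debug node/inf7` after `chroot /host`):
#   list_sriov_nics
```

Each output line has the form `<interface> <total VFs>`; interfaces without SR-IOV support are not listed.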

Interfaces

enp4s0f0 - Primary

Primary IP / Interface for OpenShift SDN

We can enable SR-IOV on these interfaces @dmoessne

enp4s0f1 - Secondary

Create a Linux Bridge for Virtual Machines
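
One way to create such a bridge is a NodeNetworkConfigurationPolicy (NNCP). The sketch below renders a policy per NIC name, using the bridge name `coe-bridge`, the policy naming scheme, and the node label that appear later in this thread. `render_bridge_nncp` is a hypothetical helper name, and applying the result assumes the NMState operator is installed on the cluster.

```shell
#!/bin/sh
# Sketch: print an NNCP that puts the given NIC into a Linux bridge
# named coe-bridge, with IPv4 disabled on the bridge.
# render_bridge_nncp is a hypothetical helper, not part of oc or nmstate.
render_bridge_nncp() {
  nic="$1"
  cat <<EOF
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: coe-bridge-via-${nic}
spec:
  nodeSelector:
    coe.muc.redhat.com/second-nic: ${nic}
  desiredState:
    interfaces:
    - name: coe-bridge
      type: linux-bridge
      state: up
      ipv4:
        enabled: false
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: ${nic}
EOF
}

# Usage (requires cluster access):
#   render_bridge_nncp enp4s0f1 | oc apply -f -
```

The node selector means one policy per distinct NIC name is needed, since the secondary interface is not named the same on every node.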

rbo commented 1 year ago

$ oc label node/inf7 coe.muc.redhat.com/second-nic=enp4s0f1
$ oc label node/inf8 coe.muc.redhat.com/second-nic=enp4s0f1

$ oc get nodes -L coe.muc.redhat.com/second-nic -L kubevirt.io/schedulable
NAME                      STATUS   ROLES           AGE    VERSION           SECOND-NIC   SCHEDULABLE
inf4.coe.muc.redhat.com   Ready    master,worker   268d   v1.24.6+5157800                true
inf44                     Ready    worker          30d    v1.24.6+5157800                true
inf5.coe.muc.redhat.com   Ready    master,worker   268d   v1.24.6+5157800                true
inf6.coe.muc.redhat.com   Ready    master,worker   268d   v1.24.6+5157800                true
inf7                      Ready    worker          161d   v1.24.6+5157800   enp4s0f1     true
inf8                      Ready    worker          17d    v1.24.6+5157800   enp4s0f1     true
sf1                       Ready    sriov,worker    161d   v1.24.6+5157800                true
sf2                       Ready    sriov,worker    160d   v1.24.6+5157800                true
sf3                       Ready    sriov,worker    161d   v1.24.6+5157800                true
sf4                       Ready    sriov,worker    161d   v1.24.6+5157800                true
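
The labeling above could be scripted for several nodes at once. This is a sketch only: `label_second_nics` is a hypothetical helper, the `OC` variable is an assumption added so the commands can be previewed with `OC=echo`, and `--overwrite` is added here so re-running after a reinstall does not fail on an existing label.

```shell
#!/bin/sh
# Sketch: label nodes with the name of their secondary NIC.
# Set OC=echo to print the commands instead of running them.
OC="${OC:-oc}"

# label_second_nics is a hypothetical helper; arguments are node=nic pairs.
label_second_nics() {
  for pair in "$@"; do
    node="${pair%%=*}"   # text before the first '='
    nic="${pair#*=}"     # text after the first '='
    $OC label --overwrite "node/${node}" "coe.muc.redhat.com/second-nic=${nic}"
  done
}

# Usage (requires cluster access):
#   label_second_nics inf7=enp4s0f1 inf8=enp4s0f1
```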

rbo commented 10 months ago

The cluster has been reinstalled several times. Most worker nodes have a second interface:

$ oc get nodes -L coe.muc.redhat.com/second-nic -L kubevirt.io/schedulable
NAME                 STATUS                        ROLES                  AGE     VERSION           SECOND-NIC   SCHEDULABLE
inf4                 Ready                         control-plane,master   5d21h   v1.27.6+f67aeb3                
inf44                Ready                         storage-node,worker    5d19h   v1.27.6+f67aeb3   eno2         true
inf5                 Ready                         control-plane,master   5d20h   v1.27.6+f67aeb3                
inf6                 Ready                         control-plane,master   5d21h   v1.27.6+f67aeb3                
inf7                 Ready                         storage-node,worker    5d12h   v1.27.6+f67aeb3   enp4s0f1     true
inf8                 Ready                         storage-node,worker    5d15h   v1.27.6+f67aeb3   enp4s0f1     true
ucs-blade-server-1   NotReady,SchedulingDisabled   worker                 5d20h   v1.27.6+f67aeb3                true
ucs-blade-server-3   Ready                         worker                 5d17h   v1.27.6+f67aeb3                true
ucs56                Ready                         worker                 5d17h   v1.27.6+f67aeb3   enp79s0f1    true

The idea is:

rbo commented 10 months ago

Check nodes & DHCPd

inf44

oc debug node/inf44 -- ip link show dev eno2
Starting pod/inf44-debug ...
To use host binaries, run `chroot /host`
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master coe-bridge state DOWN mode DEFAULT group default qlen 1000
    link/ether ac:16:2d:ad:21:a4 brd ff:ff:ff:ff:ff:ff
    altname enp3s0f1

Removing debug pod ...

inf7

 oc debug node/inf7 -- ip link show dev enp4s0f1
Starting pod/inf7-debug ...
To use host binaries, run `chroot /host`
3: enp4s0f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master coe-bridge state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:1e:67:51:5e:9b brd ff:ff:ff:ff:ff:ff

Removing debug pod ...

inf8

 oc debug node/inf8 -- ip link show dev enp4s0f1
Starting pod/inf8-debug ...
To use host binaries, run `chroot /host`
3: enp4s0f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master coe-bridge state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:1e:67:51:5e:a0 brd ff:ff:ff:ff:ff:ff

Removing debug pod ...
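
All three outputs above show `NO-CARRIER` and `state DOWN`. A small filter can turn that check into a yes/no answer per node. This is a sketch under the assumption that the input is plain `ip link show dev <nic>` output; `link_has_carrier` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: read `ip link show dev <nic>` output on stdin and succeed only
# if the interface flags contain LOWER_UP and do not contain NO-CARRIER.
# link_has_carrier is a hypothetical helper, not an existing tool.
link_has_carrier() {
  # Extract the flag list between '<' and '>' from the first interface line.
  flags="$(sed -n 's/^[0-9][0-9]*: [^:]*: <\([^>]*\)>.*/\1/p' | head -n 1)"
  case ",${flags}," in
    *,NO-CARRIER,*) return 1 ;;
    *,LOWER_UP,*)   return 0 ;;
    *)              return 1 ;;
  esac
}

# Usage (requires cluster access):
#   oc debug node/inf7 -- ip link show dev enp4s0f1 2>/dev/null | link_has_carrier \
#     && echo "inf7: link up" || echo "inf7: no carrier"
```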

ucs56

Check arista

arista-rj45#show interface  Et12,Et7-8 status
Port       Name              Status       Vlan        Duplex  Speed Type        
Et7        inf8-secondary    disabled     trunk         auto   auto 10GBASE-T   
Et8        inf7-secondary    disabled     trunk         auto   auto 10GBASE-T   
Et12       ucs56 secondary   disabled     trunk         auto   auto 10GBASE-T   

arista-sfp#show interfaces Et6/1 status
Port       Name                 Status       Vlan     Duplex Speed  Type         Flags Encapsulation
Et6/1      inf44 secondary eno2 disabled     trunk    full   10G    40GBASE-CR4                    

Enable interfaces:

arista-rj45>enable 
arista-rj45#configure interface Et12,Et7-8 
% Invalid input
arista-rj45#configure 
arista-rj45(config)#interface Et12,Et7-8 
arista-rj45(config-if-Et7-8,12)#no shutdown
arista-rj45(config-if-Et7-8,12)#show interface  Et12,Et7-8 status
Port       Name              Status       Vlan        Duplex  Speed Type        
Et7        inf8-secondary    connected    trunk       a-full  a-10G 10GBASE-T   
Et8        inf7-secondary    notconnect   trunk         auto   auto 10GBASE-T   
Et12       ucs56 secondary   notconnect   trunk       a-full  a-10G 10GBASE-T   

arista-rj45(config-if-Et7-8,12)#show interface  Et12,Et7-8 status
Port       Name              Status       Vlan        Duplex  Speed Type        
Et7        inf8-secondary    connected    trunk       a-full  a-10G 10GBASE-T   
Et8        inf7-secondary    connected    trunk       a-full  a-10G 10GBASE-T   
Et12       ucs56 secondary   connected    trunk       a-full  a-10G 10GBASE-T   

arista-rj45(config-if-Et7-8,12)#write
Copy completed successfully.
arista-rj45(config-if-Et7-8,12)#copy running-config startup-config
Copy completed successfully.
arista-rj45(config-if-Et7-8,12)#
arista-rj45#

arista-sfp#configure 
arista-sfp(config)#interface  Et6/1
arista-sfp(config-if-Et6/1)#no shutdown 
arista-sfp(config-if-Et6/1)#write 
Copy completed successfully.
arista-sfp(config-if-Et6/1)#copy running-config startup-config 
Copy completed successfully.
arista-sfp(config-if-Et6/1)#show interfaces Et6/1 status
Port       Name                 Status       Vlan     Duplex Speed  Type         Flags Encapsulation
Et6/1      inf44 secondary eno2 connected    trunk    full   10G    40GBASE-CR4                    

arista-sfp(config-if-Et6/1)#

DHCP reject did not work:

UCS56:

9: enp80s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 80:e0:1d:36:ff:ae brd ff:ff:ff:ff:ff:ff
    inet 10.32.111.101/20 brd 10.32.111.255 scope global dynamic noprefixroute enp80s0f1
       valid_lft 10708sec preferred_lft 10708sec
    inet6 fe80::d310:1000:bb2b:c87/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
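
To spot this situation on other nodes, the `ip addr` output can be scanned for a DHCP-assigned IPv4 address, which `ip` marks with the `dynamic` flag as in the output above. This is a rough sketch; `has_dynamic_ipv4` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: read `ip addr show dev <nic>` output on stdin and succeed if any
# IPv4 address is flagged "dynamic" (i.e. it was obtained via DHCP).
# has_dynamic_ipv4 is a hypothetical helper, not an existing tool.
has_dynamic_ipv4() {
  grep -Eq '^[[:space:]]*inet [0-9.]+/[0-9]+ (.* )?dynamic( |$)'
}

# Usage (requires cluster access):
#   oc debug node/ucs56 -- ip addr show dev enp80s0f1 2>/dev/null | has_dynamic_ipv4 \
#     && echo "ucs56: secondary NIC still has a DHCP lease"
```

Static addresses and IPv6 link-local addresses do not match, so only interfaces that actually received a DHCP lease are reported.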

rbo commented 10 months ago

The NNCP prevents IPv4 on the other nodes:

$ oc get nncp
NAME                       STATUS      REASON
coe-bridge-via-eno2        Available   SuccessfullyConfigured
coe-bridge-via-enp4s0f1    Available   SuccessfullyConfigured
coe-bridge-via-enp79s0f1   Degraded    FailedToConfigure

$ oc get nncp/coe-bridge-via-enp4s0f1 -o yaml | grep -A20 'spec:'
spec:
  desiredState:
    interfaces:
    - bridge:
        options:
          stp:
            enabled: false
        port:
        - name: enp4s0f1
      description: Linux Brige info COE Network via enp4s0f1
      ipv4:
        enabled: false
      name: coe-bridge
      state: up
      type: linux-bridge
  nodeSelector:
    coe.muc.redhat.com/second-nic: enp4s0f1
status:
  conditions:
  - lastHeartbeatTime: "2023-11-21T08:09:23Z"