Closed tumido closed 2 years ago
Hi @tumido @naved001 I got pointed at this issue to see if we could assist.
A few quick thoughts:
- eno1 is part of a bond? Changing the parent/base-iface to the bond (ex. bond0) instead of a tail in that pairing.
- eno1 is not managed by NetworkManager but via ip link? Bugzilla-2058292

Happy to dig in further if you want to hit me up on CoreOS Slack/RH Gchat. Cheers!
@nerdalert Thank you for reaching out! :) That seems to be it!
At first, I've checked if eno1 is not managed by NM. nmcli device was showing it as:
# nmcli device
DEVICE TYPE STATE CONNECTION
...
eno1 ethernet connected Wired Connection
...
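For anyone who wants to script this check instead of eyeballing the table, here is a minimal sketch in Python. It assumes the output comes from `nmcli -t -f DEVICE,TYPE,STATE,CONNECTION device` (terse mode, colon-separated). The helper names and the `lo` sample line are illustrative only and not taken from this cluster.

```python
import re

# Sample terse output. The eno1 line mirrors the table above;
# the lo line is a made-up example of an unmanaged device.
SAMPLE = "eno1:ethernet:connected:Wired Connection\nlo:loopback:unmanaged:\n"


def parse_terse_devices(output):
    """Parse terse `nmcli device` output into a dict keyed by device name."""
    devices = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        # Terse mode separates fields with ':' and escapes literal colons
        # as '\:', so split only on colons not preceded by a backslash.
        fields = [f.replace("\\:", ":") for f in re.split(r"(?<!\\):", line)]
        if len(fields) != 4:
            continue  # skip lines that do not match the four requested fields
        name, dev_type, state, connection = fields
        devices[name] = {"type": dev_type, "state": state, "connection": connection}
    return devices


def is_managed(devices, name):
    """Return True if the device is known and not in the 'unmanaged' state."""
    dev = devices.get(name)
    return dev is not None and dev["state"] != "unmanaged"


devices = parse_terse_devices(SAMPLE)
print(is_managed(devices, "eno1"), devices["eno1"]["connection"])
```

On a real node the `SAMPLE` string would be replaced by the captured command output, for example via `subprocess.run(..., capture_output=True, text=True).stdout`.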
Then I've checked if eno1 is set to bond type for whatever reason, but it was correctly typed as ethernet:
# nmcli conn
NAME UUID TYPE DEVICE
Wired Connection 7fa7f094-b884-4dd7-bb09-d7c35251e059 ethernet eno1
Then I've tried applying the change from BZ 2017623 as you've suggested...
FTR the change forces eno1 to be managed by NM/NMState:
  spec:
    desiredState:
      interfaces:
+       - name: eno1
+         state: up
+         type: ethernet
        - description: zero cluster provisioning network
          ipv4:
            dhcp: true
            enabled: true
          name: eno1.211
          state: up
          type: vlan
          vlan:
            base-iface: eno1
            id: 211
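The same idea (declare the VLAN's base interface explicitly so it ends up managed) can be sketched as a small check over the `desiredState` structure. This is only an illustration in Python: `add_missing_base_ifaces` is a hypothetical helper, and it assumes every undeclared base interface is a plain ethernet device, which would be wrong if the base were a bond.

```python
import copy

# desiredState as it looked before the change above.
DESIRED_STATE = {
    "interfaces": [
        {
            "description": "zero cluster provisioning network",
            "ipv4": {"dhcp": True, "enabled": True},
            "name": "eno1.211",
            "state": "up",
            "type": "vlan",
            "vlan": {"base-iface": "eno1", "id": 211},
        }
    ]
}


def add_missing_base_ifaces(desired_state):
    """Return a copy where each VLAN base-iface is also declared explicitly.

    Assumption: undeclared base interfaces are ethernet devices.
    """
    state = copy.deepcopy(desired_state)
    interfaces = state.setdefault("interfaces", [])
    declared = {iface.get("name") for iface in interfaces}
    missing = []
    for iface in interfaces:
        if iface.get("type") != "vlan":
            continue
        base = iface.get("vlan", {}).get("base-iface")
        if base and base not in declared and base not in missing:
            missing.append(base)
    # Put the base interfaces first, matching the order used in the diff.
    extra = [{"name": b, "state": "up", "type": "ethernet"} for b in missing]
    state["interfaces"] = extra + interfaces
    return state


patched = add_missing_base_ifaces(DESIRED_STATE)
print([iface["name"] for iface in patched["interfaces"]])
```

The function leaves its input untouched and is idempotent, so running it on an already patched policy changes nothing.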
I didn't expect this to change anything; however, it resulted in creating a new connection, stealing the eno1 device from the Wired Connection connection:
# nmcli conn
NAME UUID TYPE DEVICE
...
+ eno1 75d0e44b-a6af-406b-903f-df3efaecccce ethernet eno1
+ Wired Connection 7fa7f094-b884-4dd7-bb09-d7c35251e059 ethernet --
- Wired Connection 7fa7f094-b884-4dd7-bb09-d7c35251e059 ethernet eno1
This resulted in:
$ oc get nncp vlan-211-nese
NAME STATUS
vlan-211-nese SuccessfullyConfigured
I think the mess was caused by a name change in the connection, maybe during a cluster/node upgrade? Mind this is a baremetal instance rocking OCP for about 2 years now, starting at OCP 4.6, now at OCP 4.10... I think we faced some networking issues during one of the upgrades when nodes were renamed etc., so it may be related. It may have manifested now because we tried to apply a new network policy (just speculating).
Originally reported by @larsks:
Originally posted by @larsks in https://github.com/operate-first/apps/pull/2286#pullrequestreview-1091209654
I've discovered there's probably more to it and the issue goes deeper:
The NESE policy fails on: