open-switch / opx-cps

https://openswitch.net

Setting physical address on LAG interface doesn't always stick #81

Closed dimbleby closed 6 years ago

dimbleby commented 6 years ago

Our log shows us setting the MAC address on bnd1 here, to 00:01:02:03:04:05:

2018-04-19 09:39:14,146 UTC INFO Executing CPS transaction: [{'operation': 'rpc', 'change': {'data': {'dell-if/if/interfaces/interface/phys-address': bytearray(b'00:01:02:03:04:05\x00'), 'if/interfaces/interface/name': bytearray(b'bnd1\x00'), 'if/interfaces/interface/type': bytearray(b'ianaift:ieee8023adLag\x00'), 'dell-if/if/interfaces/interface/mtu': bytearray(b'\xfc\x05\x00\x00'), 'dell-base-if-cmn/set-interface/input/operation': bytearray(b'\x03\x00\x00\x00'), 'if/interfaces/interface/enabled': bytearray(b'\x01\x00\x00\x00')}, 'key': '1.19.1245192.'}}]
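
For readers unfamiliar with the raw attribute encoding in the log line above, here is a minimal sketch of how the values can be decoded. It assumes that numeric attributes are 4-byte little-endian unsigned integers and that strings are NUL-terminated ASCII; this is inferred from the log itself (the MTU bytes decode to 1532, the value reported elsewhere in this thread), not taken from CPS documentation. The helper names are hypothetical.

```python
import struct


def decode_u32(raw):
    """Decode a 4-byte value as a little-endian unsigned integer (assumed encoding)."""
    return struct.unpack("<I", bytes(raw))[0]


def decode_str(raw):
    """Decode a NUL-terminated ASCII string attribute (assumed encoding)."""
    return bytes(raw).rstrip(b"\x00").decode("ascii")


# Attribute values copied from the logged transaction.
change = {
    "dell-if/if/interfaces/interface/phys-address": bytearray(b"00:01:02:03:04:05\x00"),
    "if/interfaces/interface/name": bytearray(b"bnd1\x00"),
    "dell-if/if/interfaces/interface/mtu": bytearray(b"\xfc\x05\x00\x00"),
    "dell-base-if-cmn/set-interface/input/operation": bytearray(b"\x03\x00\x00\x00"),
    "if/interfaces/interface/enabled": bytearray(b"\x01\x00\x00\x00"),
}

name = decode_str(change["if/interfaces/interface/name"])
mac = decode_str(change["dell-if/if/interfaces/interface/phys-address"])
mtu = decode_u32(change["dell-if/if/interfaces/interface/mtu"])
operation = decode_u32(change["dell-base-if-cmn/set-interface/input/operation"])
enabled = decode_u32(change["if/interfaces/interface/enabled"])

print(name, mac, mtu, operation, enabled)
```

Read this way, the transaction asks for bnd1 to be enabled with MTU 1532 and physical address 00:01:02:03:04:05.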

But a few minutes later, the target physical address for this interface has changed:

# cps_get_oid.py -qua target dell-base-if-cmn/if/interfaces/interface if/interfaces/interface/name=bnd1

============base-if-lag/if/interfaces/interface==========

dell-if/if/interfaces/interface/phys-address = 00:50:56:92:d3:c1
base-if-lag/if/interfaces/interface/id = 0
base-if-lag/if/interfaces/interface/lag-opaque-data = 01000000000000002c000000000000000100000000000000040000000000000000000000020000000000000008000000000000000000000000000200
if/interfaces/interface/type = ianaift:ieee8023adLag
dell-base-if-cmn/if/interfaces/interface/if-index = 38
if/interfaces/interface/name = bnd1
base-if-lag/if/interfaces/interface/unblock-port-list =
dell-if/if/interfaces/interface/mtu = 1532
if/interfaces/interface/enabled = 1
base-if-lag/if/interfaces/interface/block-port-list =
------------------------------------------------

(and indeed the actual MAC address on this interface is this address, rather than the one we configured).

00:50:56:92:d3:c1 is the autogenerated MAC address for e101-002-0, which is slave to bnd1:

# ip link show dev e101-002-0
4: e101-002-0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bnd1 state UP mode DEFAULT group default qlen 1000
    link/ether 00:50:56:92:d3:c1 brd ff:ff:ff:ff:ff:ff
    alias NAS## 0 29
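
To check mechanically whether a bond has taken on a member port's address, the `ip link show` output above can be parsed. This is a rough sketch that works on the text shown in this comment; the function name and the regular expressions are illustrative assumptions, not part of any OPX tool.

```python
import re

# Output copied from the `ip link show dev e101-002-0` command above.
IP_LINK_OUTPUT = """\
4: e101-002-0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bnd1 state UP mode DEFAULT group default qlen 1000
    link/ether 00:50:56:92:d3:c1 brd ff:ff:ff:ff:ff:ff
    alias NAS## 0 29
"""


def parse_ip_link(text):
    """Extract the interface name, its master (if any) and its MAC address."""
    name = re.search(r"^\d+:\s+([^:@\s]+)", text, re.M)
    master = re.search(r"\bmaster\s+(\S+)", text)
    mac = re.search(r"link/ether\s+([0-9a-fA-F:]{17})", text)
    return {
        "name": name.group(1) if name else None,
        "master": master.group(1) if master else None,
        "mac": mac.group(1).lower() if mac else None,
    }


member = parse_ip_link(IP_LINK_OUTPUT)

# The address CPS reported for bnd1 earlier in this comment.
bond_mac = "00:50:56:92:d3:c1"
bond_uses_member_mac = member["master"] == "bnd1" and member["mac"] == bond_mac
print(member, bond_uses_member_mac)
```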

I suppose this is some sort of timing window - we don't see it every time. Usually our configuration sticks.

In what circumstances would CPS discard our configured target physical address?

We are running the latest OPX packages from this repository: deb http://deb.openswitch.net/ 2.3.0 main opx opx-non-free

dimbleby commented 6 years ago

Indeed, on this switch I now can't get the physical address to change at all - even though CPS reports success when I configure it:

# cps_set_oid.py dell-base-if-cmn/if/interfaces/interface -oper set -attr if/interfaces/interface/name=bnd1 -attr dell-if/if/interfaces/interface/phys-address=00:01:02:03:04:05
Success
Key: 1.19.44.2883617.2883612.2883613.
dell-base-if-cmn/if/interfaces/interface/if-index = 38
dell-if/if/interfaces/interface/phys-address = 00:01:02:03:04:05
cps/object-group/return-code = 0
if/interfaces/interface/name = bnd1
dell-base-if-cmn/if/interfaces/interface/if-index = 38

# cps_get_oid.py -qua target dell-base-if-cmn/if/interfaces/interface if/interfaces/interface/name=bnd1

============base-if-lag/if/interfaces/interface==========

dell-if/if/interfaces/interface/phys-address = 00:50:56:92:d3:c1
base-if-lag/if/interfaces/interface/id = 0
base-if-lag/if/interfaces/interface/lag-opaque-data = 01000000000000002c000000000000000100000000000000040000000000000000000000020000000000000008000000000000000000000000000200
if/interfaces/interface/type = ianaift:ieee8023adLag
dell-base-if-cmn/if/interfaces/interface/if-index = 38
if/interfaces/interface/name = bnd1
base-if-lag/if/interfaces/interface/unblock-port-list =
dell-if/if/interfaces/interface/mtu = 1532
if/interfaces/interface/enabled = 1
base-if-lag/if/interfaces/interface/block-port-list =
------------------------------------------------

According to our logs, the following happens when we set the MAC address on the interface:

I will look into why it is that we move e101-002-0 out from and back into the bond.

However the question for you remains: why is the physical address on bnd1 being changed from the configured target value?

dimbleby commented 6 years ago

The reason that we take e101-002-0 out of the bond is that this switch should be standby in the LAG. Once it has the correct MAC address, we recognise that the partner is primary and that we are not needed.

So I think we understand the sequence, if not what to do about it!

How can we make the MAC address stick on bnd1?

atanu-mandal commented 6 years ago

We are looking into this issue.

GarrickHe commented 6 years ago

Hello, I will look into this issue and keep everyone posted.

GarrickHe commented 6 years ago

@dimbleby What switch/platform are you using?

dimbleby commented 6 years ago

S6000-VM

GarrickHe commented 6 years ago

@dimbleby, I'm trying very hard to reproduce the issue but have been unable to. I am using opx-2.3.0. Can you give more details on how you ran into the issue?

  1. Did you run the LAG cps commands immediately after system startup?
  2. Are there other configs on the system that has the issue? If so, do you mind sharing them?
  3. What were the exact commands you used to create the LAG interface (bnd1)? Did you use cps_config_lag.py or cps_set_oid.py?

I tried the following to reproduce:

  1. create LAG called 'bnd1' (cps_config_lag.py --create --lname bnd1)
  2. add a logical port into it e101-010-0 (cps_config_lag.py --add --lname bnd1 --port e101-010-0)
  3. check the hw-addr on both bnd1 and the e101-010-0, they match bnd1 hw-addr (as expected)
  4. remove e101-010-0 from bnd1 and recheck the hw-addr (they are different, as expected) (cps_config_lag.py --remove --lname bnd1 --port e101-010-0)
  5. re-add e101-010-0 and recheck the hw-addr (they match bnd1's address, as expected)
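
The steps above can be written down as a command sequence so that they are easy to replay. This is only a sketch: it builds the `cps_config_lag.py` invocations quoted in the list and prints them without running anything. The hw-addr checks in steps 3 to 5 are left out because the thread does not say how they were performed, and the function name is hypothetical.

```python
def repro_commands(lag="bnd1", port="e101-010-0"):
    """Return the cps_config_lag.py invocations from the steps above, in order."""
    tool = "cps_config_lag.py"
    return [
        [tool, "--create", "--lname", lag],                  # step 1
        [tool, "--add", "--lname", lag, "--port", port],     # step 2
        [tool, "--remove", "--lname", lag, "--port", port],  # step 4
        [tool, "--add", "--lname", lag, "--port", port],     # step 5
    ]


commands = repro_commands()
for argv in commands:
    # Printed only; on a switch each list could be handed to subprocess.check_call.
    print(" ".join(argv))
```

Note that this sequence never sets a custom MAC address on bnd1 before adding the member port.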

Thanks, Garrick

dimbleby commented 6 years ago

Hi Garrick,

Here's a session:

# cps_config_lag.py --create --lname bnd1
Success

# NB Set the MAC address
# cps_set_oid.py dell-base-if-cmn/if/interfaces/interface -oper set  -attr if/interfaces/interface/name=bnd1  -attr dell-if/if/interfaces/interface/phys-address=01:02:03:04:05:06
Success
Key: 1.19.44.2883617.2883612.2883613.
dell-base-if-cmn/if/interfaces/interface/if-index = 43
dell-if/if/interfaces/interface/phys-address = 01:02:03:04:05:06
cps/object-group/return-code = 0
if/interfaces/interface/name = bnd1
dell-base-if-cmn/if/interfaces/interface/if-index = 43

# NB Verify that this really did work
# cps_get_oid.py dell-base-if-cmn/if/interfaces/interface -attr if/interfaces/interface/name=bnd1

============base-if-lag/if/interfaces/interface==========

dell-if/if/interfaces/interface/phys-address = 01:02:03:04:05:06
base-if-lag/if/interfaces/interface/id = 1
base-if-lag/if/interfaces/interface/lag-opaque-data = 01000000000000002c000000000000000100000000000000040000000000000000000000020000000000000008000000000000000100000000000200
if/interfaces/interface/type = ianaift:ieee8023adLag
dell-base-if-cmn/if/interfaces/interface/if-index = 43
if/interfaces/interface/name = bnd1
base-if-lag/if/interfaces/interface/unblock-port-list =
dell-if/if/interfaces/interface/mtu = 1532
if/interfaces/interface/enabled = 0
base-if-lag/if/interfaces/interface/block-port-list =
------------------------------------------------

# NB add e101-010-0 to logical port
# cps_config_lag.py --add --lname bnd1 --port e101-010-0
Success

# NB But now the MAC address on bnd1 has changed
# cps_get_oid.py dell-base-if-cmn/if/interfaces/interface -attr if/interfaces/interface/name=bnd1

============base-if-lag/if/interfaces/interface==========

dell-if/if/interfaces/interface/phys-address = ec:f4:bb:fd:53:e5
base-if-lag/if/interfaces/interface/id = 1
base-if-lag/if/interfaces/interface/lag-opaque-data = 01000000000000002c000000000000000100000000000000040000000000000000000000020000000000000008000000000000000100000000000200
if/interfaces/interface/type = ianaift:ieee8023adLag
dell-if/if/interfaces/interface/member-ports/name = e101-010-0
dell-base-if-cmn/if/interfaces/interface/if-index = 43
if/interfaces/interface/name = bnd1
base-if-lag/if/interfaces/interface/unblock-port-list = e101-010-0
dell-if/if/interfaces/interface/mtu = 1532
if/interfaces/interface/enabled = 0
base-if-lag/if/interfaces/interface/block-port-list =
------------------------------------------------

The MAC address on bnd1 was configured as 01:02:03:04:05:06, but when I added the interface e101-010-0 into the logical port it was set to ec:f4:bb:fd:53:e5. I did not ask for this change - so far as my use of the CPS API is concerned, the configured target physical address on this interface is still 01:02:03:04:05:06.
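
A small sketch of how this check could be automated: parse the text printed by `cps_get_oid.py` and compare the reported physical address with the configured one. It operates purely on the output format shown in this session (an abbreviated copy is embedded below); the function names are illustrative assumptions and no CPS library calls are used.

```python
PHYS_ADDR = "dell-if/if/interfaces/interface/phys-address"

# Abbreviated copy of the second cps_get_oid.py output in the session above.
DUMP_AFTER_ADD = """\
============base-if-lag/if/interfaces/interface==========

dell-if/if/interfaces/interface/phys-address = ec:f4:bb:fd:53:e5
dell-if/if/interfaces/interface/member-ports/name = e101-010-0
if/interfaces/interface/name = bnd1
base-if-lag/if/interfaces/interface/unblock-port-list = e101-010-0
base-if-lag/if/interfaces/interface/block-port-list =
------------------------------------------------
"""


def parse_cps_dump(text):
    """Turn 'key = value' lines into a dict; header and separator lines are skipped."""
    attrs = {}
    for line in text.splitlines():
        stripped = line.rstrip()
        if " = " in stripped or stripped.endswith(" ="):
            key, _, value = stripped.partition("=")
            attrs[key.strip()] = value.strip()
    return attrs


def mac_stuck(dump_text, configured_mac):
    """True if the reported physical address still equals the configured one."""
    reported = parse_cps_dump(dump_text).get(PHYS_ADDR, "")
    return reported.lower() == configured_mac.lower()


attrs = parse_cps_dump(DUMP_AFTER_ADD)
print(attrs[PHYS_ADDR], mac_stuck(DUMP_AFTER_ADD, "01:02:03:04:05:06"))
```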

This particular repro was on hardware rather than the VM:

PLATFORM="S6000-ON"

GarrickHe commented 6 years ago

Thanks David, I'll keep you posted. I'm assuming this is a stand-alone device without any other device connected to it and no other configuration is loaded?

-garrick

dimbleby commented 6 years ago

correct

GarrickHe commented 6 years ago

@dimbleby I am able to reproduce the issue. I will update you when I have a fix.

thanks, garrick

GarrickHe commented 6 years ago

@dimbleby, just a quick update. The issue is with the 'cps_config_lag --add --lname ...' command that is used to add the member port. Whenever it modifies the LAG interface it resets the MAC address: because the request carries no MAC address information, the interface goes back to the default MAC address, overwriting the one you set.
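
The behaviour described above can be illustrated with a toy model. This is not the OPX code: the dictionary-based LAG state, both function names and the use of ec:f4:bb:fd:53:e5 as the "default" address are assumptions made purely for illustration, and the second function is only one plausible shape for a fix, not necessarily the change that was made.

```python
DEFAULT_MAC = "ec:f4:bb:fd:53:e5"  # stand-in for the default address seen in this thread


def modify_lag_resetting(lag, update):
    """Models the reported behaviour: a modify without a MAC falls back to the default."""
    new_state = dict(lag)
    new_state.update(update)
    new_state["phys-address"] = update.get("phys-address", DEFAULT_MAC)
    return new_state


def modify_lag_preserving(lag, update):
    """Hypothetical alternative: only touch the MAC when the update carries one."""
    new_state = dict(lag)
    new_state.update(update)
    return new_state


lag = {"name": "bnd1", "phys-address": "01:02:03:04:05:06", "member-ports": []}
add_member = {"member-ports": ["e101-010-0"]}  # no MAC information in the request

after_reset = modify_lag_resetting(lag, add_member)
after_preserve = modify_lag_preserving(lag, add_member)
print(after_reset["phys-address"], after_preserve["phys-address"])
```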

I'll let you know when I have a fix.

hope this helps, garrick

GarrickHe commented 6 years ago

@dimbleby I have a fix. I will push it soon after code-review.

Thanks, Garrick

GarrickHe commented 6 years ago

Issue fixed as part of OPX3.0-dev1.