Closed: ccascone closed this issue 5 years ago
Besides the ONOS bug mentioned above, a race condition in the app was causing groups to be created with only one member. That is fixed in 65a3b859fba3fcaf35b927cf4022590ef28bef64.
The second ECMP bucket is still missing:
tsengyi@root > groups any device:leaf1 19:52:33
deviceId=device:leaf1, groupCount=3
id=0xec3b0000, state=ADDED, type=SELECT, bytes=0, packets=0, appId=org.stratumproject.fabric-demo, referenceCount=0
id=0xec3b0000, bucket=1, bytes=0, packets=0, weight=1, actions=[ingress.l3_fwd.set_nexthop(dmac=0xaa00000102, port=0x80, smac=0xaa00000001, dst_vlan=0x1)]
id=0xec3b0000, bucket=2, bytes=0, packets=0, weight=1, actions=[ingress.l3_fwd.set_nexthop(dmac=0xaa00000101, port=0x90, smac=0xaa00000001, dst_vlan=0x1)]
id=0x7e9dcdb4, state=ADDED, type=SELECT, bytes=0, packets=0, appId=org.stratumproject.fabric-demo, referenceCount=0
id=0x7e9dcdb4, bucket=1, bytes=0, packets=0, weight=1, actions=[ingress.l3_fwd.set_nexthop(dmac=0x3cfdfe9df149, port=0x2c, smac=0xaa00000001, dst_vlan=0x1)]
id=0x7e9dcdb5, state=ADDED, type=SELECT, bytes=0, packets=0, appId=org.stratumproject.fabric-demo, referenceCount=0
id=0x7e9dcdb5, bucket=1, bytes=0, packets=0, weight=1, actions=[ingress.l3_fwd.set_nexthop(dmac=0x3cfdfe9df148, port=0x3c, smac=0xaa00000001, dst_vlan=0x1)]
tsengyi@root > groups any device:leaf2 19:52:35
deviceId=device:leaf2, groupCount=3
id=0xec3b0000, state=ADDED, type=SELECT, bytes=0, packets=0, appId=org.stratumproject.fabric-demo, referenceCount=0
id=0xec3b0000, bucket=1, bytes=0, packets=0, weight=1, actions=[ingress.l3_fwd.set_nexthop(dmac=0xaa00000101, port=0x32, smac=0xaa00000002, dst_vlan=0x1)]
id=0x7e9dccc4, state=ADDED, type=SELECT, bytes=0, packets=0, appId=org.stratumproject.fabric-demo, referenceCount=0
id=0x7e9dccc4, bucket=1, bytes=0, packets=0, weight=1, actions=[ingress.l3_fwd.set_nexthop(dmac=0x3cfdfe9df039, port=0x3e, smac=0xaa00000002, dst_vlan=0x1)]
id=0x7e9dccc5, state=ADDED, type=SELECT, bytes=0, packets=0, appId=org.stratumproject.fabric-demo, referenceCount=0
id=0x7e9dccc5, bucket=1, bytes=0, packets=0, weight=1, actions=[ingress.l3_fwd.set_nexthop(dmac=0x3cfdfe9df038, port=0x3a, smac=0xaa00000002, dst_vlan=0x1)]
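To compare bucket counts across devices without reading the dump by eye, a small script can tally buckets per group. This is only a sketch: it assumes the `groups` output keeps the `deviceId=...` and `id=..., bucket=N, ...` line shapes shown above, and the function name `count_buckets` and the abbreviated `SAMPLE` text are made up for illustration.

```python
import re
from collections import defaultdict

# Line shapes assumed from the CLI output pasted above.
DEVICE_RE = re.compile(r"deviceId=(\S+?),")
BUCKET_RE = re.compile(r"id=(0x[0-9a-fA-F]+), bucket=(\d+),")


def count_buckets(cli_output):
    """Return {(device_id, group_id): number_of_buckets}."""
    counts = defaultdict(int)
    device = None
    for line in cli_output.splitlines():
        match = DEVICE_RE.search(line)
        if match:
            device = match.group(1)
            continue
        match = BUCKET_RE.search(line)
        if match:
            counts[(device, match.group(1))] += 1
    return dict(counts)


# Abbreviated copy of the output above (actions and app ID trimmed).
SAMPLE = """\
deviceId=device:leaf1, groupCount=3
id=0xec3b0000, state=ADDED, type=SELECT, bytes=0, packets=0
id=0xec3b0000, bucket=1, bytes=0, packets=0, weight=1
id=0xec3b0000, bucket=2, bytes=0, packets=0, weight=1
deviceId=device:leaf2, groupCount=3
id=0xec3b0000, state=ADDED, type=SELECT, bytes=0, packets=0
id=0xec3b0000, bucket=1, bytes=0, packets=0, weight=1
"""

if __name__ == "__main__":
    for (device, group), n in sorted(count_buckets(SAMPLE).items()):
        print(f"{device} {group}: {n} bucket(s)")
```

On the trimmed sample, group 0xec3b0000 shows two buckets on leaf1 and one on leaf2, matching what the full output shows.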
I should have tested this more. I didn't test it on TH (leaf2) because of #48.
I have the feeling it's because we are using the Group API/implementation in ONOS incorrectly:
You might want to check with Charles/Pier on the correct way to use the Group API to modify existing groups.
Finally, I noticed that it takes longer for ONOS to receive port-up notifications from TH, and therefore to discover the links attached to it. That's probably why the bucket issue is more frequent on this box?
It looks like this issue was caused by a bug in ONOS that would lead to action profile member IDs being generated with a value of zero, which is not valid according to P4Runtime.
The issue has been fixed in https://gerrit.onosproject.org/22469
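For the extra testing, a quick sanity check on member IDs could look like the sketch below. It is not the ONOS fix, just an illustration: it treats zero as invalid, as described above, and assumes member IDs are unsigned 32-bit values as in the P4Runtime protobuf definition. The helper name `invalid_member_ids` is hypothetical.

```python
UINT32_MAX = 2**32 - 1  # assumed width of a P4Runtime member ID


def invalid_member_ids(member_ids):
    """Return the IDs that are zero or outside the uint32 range."""
    return [m for m in member_ids if not 1 <= m <= UINT32_MAX]


if __name__ == "__main__":
    # Example IDs are made up; 0 is the case reported in this issue.
    print(invalid_member_ids([1, 0, 7]))
```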
I want to do some more testing before closing this issue.