opennetworkinglab / stratum-onos-demo

Stratum+ONOS demo @ ONF Connect 2019 & EuroP4 '19
Apache License 2.0

Only one ECMP bucket gets installed on TH when using ONOS #43

Closed ccascone closed 5 years ago

ccascone commented 5 years ago

It looks like this issue was caused by a bug in ONOS that led to action profile member IDs being generated with a value of zero, which is not valid according to P4Runtime.
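To illustrate the constraint only (this is not the actual ONOS fix): a minimal sketch, assuming member IDs are derived from a hash of the member's action. The `member_id_for` helper and the CRC32 choice are hypothetical; the point is that a zero result gets remapped so that ID 0 is never sent to the device.

```python
import zlib


def member_id_for(action_repr: str) -> int:
    """Hypothetical helper: derive a non-zero 32-bit member ID from an action string."""
    member_id = zlib.crc32(action_repr.encode("utf-8")) & 0xFFFFFFFF
    # Zero is treated as invalid here, so remap it to a non-zero value.
    return member_id if member_id != 0 else 1


example_id = member_id_for("set_nexthop(dmac=0xaa00000101, port=0x90)")
print(hex(example_id))
```

Remapping zero to a fixed value can collide with a member that legitimately hashes to that value, so a real implementation would need its own collision handling.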

The issue has been fixed in https://gerrit.onosproject.org/22469

I want to do some more testing before closing this issue.

ccascone commented 5 years ago

Besides the ONOS bug mentioned above, a race condition in the app was causing groups to be created with only one member. Fixed in 65a3b859fba3fcaf35b927cf4022590ef28bef64

pudelkoM commented 5 years ago

The second ECMP bucket is still missing:

tsengyi@root > groups any device:leaf1
deviceId=device:leaf1, groupCount=3
   id=0xec3b0000, state=ADDED, type=SELECT, bytes=0, packets=0, appId=org.stratumproject.fabric-demo, referenceCount=0
       id=0xec3b0000, bucket=1, bytes=0, packets=0, weight=1, actions=[ingress.l3_fwd.set_nexthop(dmac=0xaa00000102, port=0x80, smac=0xaa00000001, dst_vlan=0x1)]
       id=0xec3b0000, bucket=2, bytes=0, packets=0, weight=1, actions=[ingress.l3_fwd.set_nexthop(dmac=0xaa00000101, port=0x90, smac=0xaa00000001, dst_vlan=0x1)]
   id=0x7e9dcdb4, state=ADDED, type=SELECT, bytes=0, packets=0, appId=org.stratumproject.fabric-demo, referenceCount=0
       id=0x7e9dcdb4, bucket=1, bytes=0, packets=0, weight=1, actions=[ingress.l3_fwd.set_nexthop(dmac=0x3cfdfe9df149, port=0x2c, smac=0xaa00000001, dst_vlan=0x1)]
   id=0x7e9dcdb5, state=ADDED, type=SELECT, bytes=0, packets=0, appId=org.stratumproject.fabric-demo, referenceCount=0
       id=0x7e9dcdb5, bucket=1, bytes=0, packets=0, weight=1, actions=[ingress.l3_fwd.set_nexthop(dmac=0x3cfdfe9df148, port=0x3c, smac=0xaa00000001, dst_vlan=0x1)]
tsengyi@root > groups any device:leaf2
deviceId=device:leaf2, groupCount=3
   id=0xec3b0000, state=ADDED, type=SELECT, bytes=0, packets=0, appId=org.stratumproject.fabric-demo, referenceCount=0
       id=0xec3b0000, bucket=1, bytes=0, packets=0, weight=1, actions=[ingress.l3_fwd.set_nexthop(dmac=0xaa00000101, port=0x32, smac=0xaa00000002, dst_vlan=0x1)]
   id=0x7e9dccc4, state=ADDED, type=SELECT, bytes=0, packets=0, appId=org.stratumproject.fabric-demo, referenceCount=0
       id=0x7e9dccc4, bucket=1, bytes=0, packets=0, weight=1, actions=[ingress.l3_fwd.set_nexthop(dmac=0x3cfdfe9df039, port=0x3e, smac=0xaa00000002, dst_vlan=0x1)]
   id=0x7e9dccc5, state=ADDED, type=SELECT, bytes=0, packets=0, appId=org.stratumproject.fabric-demo, referenceCount=0
       id=0x7e9dccc5, bucket=1, bytes=0, packets=0, weight=1, actions=[ingress.l3_fwd.set_nexthop(dmac=0x3cfdfe9df038, port=0x3a, smac=0xaa00000002, dst_vlan=0x1)]
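
The difference between the two leaves is easier to spot when the output is summarized per group. A rough sketch, assuming the `groups` output keeps the format shown above; `count_buckets` is a hypothetical helper, not an ONOS command, and the sample is copied from the leaf1 output.

```python
import re

# A group header line has "state=" right after the ID; a bucket line has "bucket=".
GROUP_RE = re.compile(r"^\s*id=(0x[0-9a-fA-F]+), state=")
BUCKET_RE = re.compile(r"^\s*id=(0x[0-9a-fA-F]+), bucket=\d+")


def count_buckets(cli_output: str) -> dict:
    """Return {group_id: number_of_buckets} parsed from `groups` CLI output."""
    counts = {}
    for line in cli_output.splitlines():
        group = GROUP_RE.match(line)
        if group:
            counts.setdefault(group.group(1), 0)
            continue
        bucket = BUCKET_RE.match(line)
        if bucket:
            counts[bucket.group(1)] = counts.get(bucket.group(1), 0) + 1
    return counts


SAMPLE = """\
   id=0xec3b0000, state=ADDED, type=SELECT, bytes=0, packets=0, appId=org.stratumproject.fabric-demo, referenceCount=0
       id=0xec3b0000, bucket=1, bytes=0, packets=0, weight=1, actions=[ingress.l3_fwd.set_nexthop(dmac=0xaa00000102, port=0x80, smac=0xaa00000001, dst_vlan=0x1)]
       id=0xec3b0000, bucket=2, bytes=0, packets=0, weight=1, actions=[ingress.l3_fwd.set_nexthop(dmac=0xaa00000101, port=0x90, smac=0xaa00000001, dst_vlan=0x1)]
   id=0x7e9dcdb4, state=ADDED, type=SELECT, bytes=0, packets=0, appId=org.stratumproject.fabric-demo, referenceCount=0
       id=0x7e9dcdb4, bucket=1, bytes=0, packets=0, weight=1, actions=[ingress.l3_fwd.set_nexthop(dmac=0x3cfdfe9df149, port=0x2c, smac=0xaa00000001, dst_vlan=0x1)]
"""

print(count_buckets(SAMPLE))  # {'0xec3b0000': 2, '0x7e9dcdb4': 1}
```

Run against the leaf2 output, the same helper would report a single bucket for group 0xec3b0000.
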

ccascone commented 5 years ago

I should have tested this more. I didn't test it on TH (leaf2) because of #48.

I have the feeling it's because we are using the Group API/implementation in ONOS incorrectly:

  1. The group is first created with one bucket, because only one leaf-spine link has been discovered.
  2. The second link is discovered; the app picks up the event and creates a new group with 2 buckets.
  3. The app installs the new group, which should replace the old one, but maybe replacing an existing group is not supported? The Group API has methods to add/remove buckets.
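
If the Group API expects existing groups to be modified rather than replaced, the steps above reduce to computing which buckets to add and which to remove. A minimal sketch of that idea, assuming buckets can be compared by value; `diff_buckets` and the `(dmac, port)` tuple representation are hypothetical, and the actual ONOS calls to apply the result still need to be confirmed.

```python
def diff_buckets(installed, desired):
    """Return (to_add, to_remove) needed to turn `installed` into `desired`.

    Buckets are assumed to be hashable values, here (dmac, port) tuples.
    """
    installed_set, desired_set = set(installed), set(desired)
    to_add = [b for b in desired if b not in installed_set]
    to_remove = [b for b in installed if b not in desired_set]
    return to_add, to_remove


# Step 1: the group was created when only one leaf-spine link was known.
installed = [(0xAA00000102, 0x80)]
# Step 2: the second link is discovered, so two buckets are now wanted.
desired = [(0xAA00000102, 0x80), (0xAA00000101, 0x90)]

to_add, to_remove = diff_buckets(installed, desired)
print(to_add, to_remove)
```

Here only the bucket for port 0x90 would need to be added, and nothing removed.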

You might want to check with Charles/Pier on the correct use of the group API to modify existing groups.

Finally, I noticed that it takes longer for ONOS to receive port-up notifications from TH, and therefore to discover the links attached to it. That's probably why the bucket issue is more frequent on this box.