sonic-net / sonic-swss

SONiC Switch State Service (SwSS)
https://azure.github.io/SONiC

[fdb]fdborch: fdborch does not guarantee the expected order of events from APP_DB #1134

Open jianjundong opened 4 years ago

jianjundong commented 4 years ago

FdbOrch is the consumer of 'FDB_TABLE' and the NotificationConsumer of 'FLUSHFDBREQUEST', so FDB entries and flush requests reach it over two separate channels.

  1. First, an application such as MCLAG sends a request to the ASIC to clear all FDB entries.
  2. Then MCLAG adds one dynamic MAC to the ASIC. In theory, if MAC learning is disabled, the ASIC should then hold exactly one dynamic MAC. But sometimes the FDB table in the ASIC is empty. The reason is that fdborch handles the MAC_ADD_EVENT before the FLUSH_ALL_FDB; the order of the two events is not guaranteed.
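
The two steps above can be illustrated with a small simulation. This is only a sketch, not the fdborch code: it assumes the flush request and the MAC add sit in two independent queues and that the consumer may drain them in either order. The function name `run`, the queue model, and the MAC address are hypothetical.

```python
from collections import deque


def run(service_order):
    """Drain two independent channels in the given order; return the final FDB."""
    # Producer side: the flush is issued first, the MAC add second.
    # Each request lands in its own channel, so the issue order is not recorded.
    channels = {
        "FLUSHFDBREQUEST": deque([("flush_all", None)]),     # notification channel
        "FDB_TABLE": deque([("add", "00:11:22:33:44:55")]),  # APP_DB table (hypothetical MAC)
    }
    fdb = set()  # stand-in for the FDB table in the ASIC
    for name in service_order:
        queue = channels[name]
        while queue:
            op, mac = queue.popleft()
            if op == "flush_all":
                fdb.clear()
            else:
                fdb.add(mac)
    return fdb


# Order the application intended: flush, then add. One MAC remains.
intended = run(["FLUSHFDBREQUEST", "FDB_TABLE"])

# Order in which the consumer may actually service the channels: add, then flush.
reordered = run(["FDB_TABLE", "FLUSHFDBREQUEST"])

print(sorted(intended), sorted(reordered))
```

In the reordered run the flush wipes the entry that was just added, which matches the empty FDB table described above.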

MikeHcChen commented 4 years ago

I think fdborch cannot know the expected (correct) order of FDB events. An application like MCLAG should take responsibility for this by checking the current state of the FDB table. For example, MCLAG can check whether adding the dynamic MAC to the ASIC succeeded. If it did not, it can wait a short time and issue the add request again.
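
A minimal sketch of that check-and-retry idea, assuming the application has some way to request the add and to query the FDB state. `request_add`, `is_present`, the delay, and the attempt limit are all hypothetical placeholders, not MCLAG or swss APIs.

```python
import time


def add_mac_with_retry(request_add, is_present, delay_s=0.05, max_attempts=5):
    """Request the add, wait, check the table, and re-request if the entry is missing.

    Returns the attempt number on which the entry was seen, or None if it never was.
    """
    for attempt in range(1, max_attempts + 1):
        request_add()          # hypothetical: ask for the MAC to be programmed
        time.sleep(delay_s)    # give the consumer a short time to process it
        if is_present():       # hypothetical: read back the current FDB state
            return attempt
    return None
```

The caller supplies both callables, so the loop itself makes no assumption about how the FDB state is read back.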

jianjundong commented 4 years ago

@MikeHcChen, thanks for the comment. Applications such as MCLAG do not communicate with the Redis DB directly. MCLAG sends the MAC event to mclagsyncd, and mclagsyncd writes the event into APP_DB. fdborch is the consumer of APP_DB and then handles the MAC event. For a MAC add event, when addFdbEntry() fails, fdborch keeps retrying until it succeeds. So MCLAG has no need to check whether adding the dynamic MAC to the ASIC succeeded, since fdborch guarantees this.

Besides, how long should MCLAG wait before checking? Take the example above: MCLAG sends a request to the ASIC to clear all FDB entries, and then adds one dynamic MAC to the ASIC. If fdborch handles the MAC_ADD_EVENT before the FLUSH_ALL_FDB, it may handle the FLUSH_ALL_FDB only after MCLAG has already checked that the MAC add succeeded.
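
To make the last point concrete, here is a small replay of that timeline. It is an illustrative model only: the event names, the `replay` helper, and the MAC address are hypothetical, and the FDB is modelled as a plain set.

```python
def replay(events):
    """Apply events in order; return the result of each check and the final FDB."""
    fdb = set()
    check_results = []
    for event, mac in events:
        if event == "add":
            fdb.add(mac)
        elif event == "flush_all":
            fdb.clear()
        elif event == "check":
            # The application's verification at this point in time.
            check_results.append(mac in fdb)
    return check_results, fdb


MAC = "00:11:22:33:44:55"  # hypothetical address

# The add is handled first, the application's check passes, and only then
# is the earlier flush request handled.
checks, final_fdb = replay([("add", MAC), ("check", MAC), ("flush_all", None)])
print(checks, sorted(final_fdb))
```

The check reports success, yet the final table is empty, so a successful check at any fixed delay does not prove the entry will survive the pending flush.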