ligato / vpp-agent

⚡️ Control plane management agent for FD.io's VPP
https://docs.ligato.io/
Apache License 2.0

Allowing agent to process change requests without requiring initial resync from NB #1019

Open edwarnicke opened 6 years ago

edwarnicke commented 6 years ago

vppagent should resync from VPP at startup on its own, independently of whether it receives a Resync request from the outside world.

Failing to do so guarantees bugs when a simple change request comes in before any resync, and it precludes certain valid usage patterns.

ondrej-fabry commented 6 years ago

On the other hand, if you do an SB resync at startup, when you don't yet have any state from NB, how will you decide which configuration should be:

What about crash/restart? With a resync at startup, the agent would always delete all unrecognized configuration from VPP and would need to notify the NB entity that it had resynced on its own. That entity, which configured VPP before, would then have to reconfigure everything again, creating a time window during which VPP has an empty configuration, because the agent resynced with no data after the reboot.

Current behavior: the agent does not do anything on its own and remains on standby until NB sends a request for the initial resync with the actual configuration. The agent then compares this NB config with the SB config dumped from VPP and applies only the differences. Most importantly, it effectively does not need to re-add anything, which could otherwise cause dropped packets.
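The comparison step described above can be sketched as a plain diff between two key-value maps. This is only an illustration: the `Config` and `Plan` types, the `Diff` function and the key names are hypothetical and are not taken from the vpp-agent code base.

```go
package main

import "sort"

// Config maps a configuration key to its serialized value.
// Hypothetical type for illustration; not the vpp-agent data model.
type Config map[string]string

// Plan lists the operations needed to make SB match NB.
type Plan struct {
	Add    []string // keys present in NB but missing from SB
	Modify []string // keys present in both, with different values
	Delete []string // keys present in SB but unknown to NB
}

// Diff compares the desired NB config with the state dumped from SB.
// Keys whose values already match are left untouched, so nothing
// that is already correct gets removed and re-added.
func Diff(nb, sb Config) Plan {
	var p Plan
	for k, want := range nb {
		have, ok := sb[k]
		switch {
		case !ok:
			p.Add = append(p.Add, k)
		case have != want:
			p.Modify = append(p.Modify, k)
		}
	}
	for k := range sb {
		if _, ok := nb[k]; !ok {
			p.Delete = append(p.Delete, k)
		}
	}
	// Map iteration order is random in Go; sort for stable output.
	sort.Strings(p.Add)
	sort.Strings(p.Modify)
	sort.Strings(p.Delete)
	return p
}
```

Note that calling `Diff` with an empty NB config puts every SB key into `Delete`, which is exactly the concern with resyncing at startup before any NB state has arrived.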

Planned Solution: Keeping Local Backup

This solution originated as a requirement from the Security Team and is currently in our sprint.

We could perhaps keep a local backup (BoltDB) and update it on each gRPC request so that it always has the latest data. This backup could then be loaded on startup for the initial resync. This could deal with the scenario described above, but we should find out whether it would bring any other issues or limitations.

ondrej-fabry commented 6 years ago

Alternative Solution: Dump VPP on boot just to determine state

The initial resync would only be a read-only sync from SB, without modifying or deleting any unrecognized configuration. The resulting dump could then be used for incoming NB change requests: the agent would compare the particular configs from the request with the latest dump from SB, allowing it to reliably handle configuration changes even immediately after a restart.

Later, when the next resync request comes from NB, the agent will compare the states and reconfigure SB if needed. After a successful NB resync, the agent will switch to "normal mode" (synced).

This solution would not depend on any local backup and would avoid the need to reconfigure anything in VPP (possibly causing drops), but it might require something to trigger the resync to ensure full synchronization. This could either be something that watches the process, or the gRPC client itself could trigger it when establishing a connection.
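The alternative flow can be sketched as a small state holder: seed the SB view from a read-only dump at boot, answer change requests against that view, and only delete unknown configuration once a full NB resync arrives. The `Agent` type and its methods are hypothetical and do not correspond to vpp-agent or KVScheduler APIs; the actual calls to VPP are reduced to returned operation names.

```go
package main

import "sort"

// Agent is a hypothetical stand-in for the agent's sync state;
// none of these names come from vpp-agent.
type Agent struct {
	sb     map[string]string // last known SB state, seeded by the boot-time dump
	synced bool              // set once a full NB resync has completed
}

// NewAgent seeds the SB view from a read-only dump. Nothing is modified or
// deleted here, so configuration unknown to the agent survives a restart.
func NewAgent(dump map[string]string) *Agent {
	sb := make(map[string]string, len(dump))
	for k, v := range dump {
		sb[k] = v
	}
	return &Agent{sb: sb}
}

// Change handles one NB change request by comparing it with the SB view.
// It returns the operation that would be sent to VPP: "add", "modify" or "noop".
func (a *Agent) Change(key, value string) string {
	have, ok := a.sb[key]
	switch {
	case !ok:
		a.sb[key] = value
		return "add"
	case have != value:
		a.sb[key] = value
		return "modify"
	default:
		return "noop"
	}
}

// Resync applies a full NB config: differing keys are updated, SB keys that
// NB does not know are deleted, and the agent switches to synced mode.
// It returns the deleted keys in sorted order.
func (a *Agent) Resync(nb map[string]string) []string {
	for k, v := range nb {
		a.Change(k, v)
	}
	var deleted []string
	for k := range a.sb {
		if _, ok := nb[k]; !ok {
			deleted = append(deleted, k)
			delete(a.sb, k)
		}
	}
	sort.Strings(deleted)
	a.synced = true
	return deleted
}

// Synced reports whether a full NB resync has completed.
func (a *Agent) Synced() bool { return a.synced }
```

In this sketch a change request that matches what VPP already has becomes a no-op even right after a restart, while unrecognized entries stay in place until `Resync` is called by whatever triggers the full synchronization.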

Guys, what do you think? @edwarnicke @milanlenco @rastislavszabo

EDIT: Actually, I'm now realizing that KVScheduler (in v2) already supports a pure SB sync; it's called downstream resync there. However, I'm not sure whether KVScheduler also supports a scenario similar to the one described above.