networkservicemesh / sdk-vpp

Restore interfaces on restart in case of external VPP #512

Open glazychev-art opened 2 years ago

glazychev-art commented 2 years ago

Description

When we create VPP on the forwarder pod, we assume that forwarder death means VPP death. But this is not true for an external VPP socket (e.g. the Calico-VPP case). If the forwarder restarts, it can leave network interfaces behind.

In this issue we need to prepare possible solutions.

glazychev-art commented 2 years ago

Possible solution [DRAFT]

Use restoreInterface and expireInterface

  1. Improve the tag chain element to store not only the connID but also the nsm prefix, a client/server marker, and an expiration time. For example, it could look like: nsm-c-1644245337-6434d7fe-a03c-40fe-8a38-321e5b7967c4, where

    • nsm - NSM prefix,
    • c (or s) - means client (or server),
    • 1644245337 - expiration time in seconds,
    • 6434d7fe-a03c-40fe-8a38-321e5b7967c4 - connection ID.

    After each refresh we need to update the expiration time of the corresponding interfaces (see the tag sketch after this list).
  2. On forwarder startup we need to dump the nsm interfaces and store them in a list of:

    type interfaceInfo struct {
        clientIfIndex  uint32
        serverIfIndex  uint32
        connID         string
        expirationTime uint32
    }
  3. Implement a restoreInterface chain element. It receives a list of interfaceInfo on creation and restores the connection metadata in case of a healing Request (or refresh).

  4. Implement an expireInterface chain element. It will monitor the interfaces for their timely removal. It also receives a list of interfaceInfo on creation to start monitoring the old interfaces. When the time runs out, it deletes the client interface, the server interface, and all possible cross-connects.
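
A minimal sketch of how such a tag could be built and parsed (encodeTag/decodeTag and the exact layout are illustrations of the format above, not existing sdk-vpp code):

    package tag

    import (
        "fmt"
        "strconv"
        "strings"
        "time"
    )

    const nsmPrefix = "nsm"

    // encodeTag builds a tag like "nsm-c-1644245337-<connID>" for a client
    // interface (or "nsm-s-..." for a server one).
    func encodeTag(isClient bool, expire time.Time, connID string) string {
        side := "s"
        if isClient {
            side = "c"
        }
        return fmt.Sprintf("%s-%s-%d-%s", nsmPrefix, side, expire.Unix(), connID)
    }

    // decodeTag parses a tag produced by encodeTag back into its parts.
    func decodeTag(tag string) (isClient bool, expire time.Time, connID string, err error) {
        parts := strings.SplitN(tag, "-", 4)
        if len(parts) != 4 || parts[0] != nsmPrefix {
            return false, time.Time{}, "", fmt.Errorf("not an nsm tag: %v", tag)
        }
        sec, err := strconv.ParseInt(parts[2], 10, 64)
        if err != nil {
            return false, time.Time{}, "", err
        }
        return parts[1] == "c", time.Unix(sec, 0), parts[3], nil
    }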

denis-tingaikin commented 2 years ago

@glazychev-art Can we go with something more simple?

Basically, we need to do this:

  1. Forwarder starts -> dump vpp interfaces and remember all tagged interfaces.
  2. Start async delayed jobs for each tagged interface that will simply call Close via the begin chain element. The delay time can be equal to maxTokenLifeTime.
  3. For each refresh request from the client, cancel the jobs from step 2.

I guess the tag can be: ${forwarderName}-${connectionId}. It will allow using multiple forwarder instances on the node.
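
A rough sketch of the delayed-job idea (the expireScheduler type is hypothetical; in the real chain, closeFn would trigger Close through the begin element):

    package expire

    import (
        "sync"
        "time"
    )

    // expireScheduler keeps one delayed "close" job per connection ID.
    type expireScheduler struct {
        mu   sync.Mutex
        jobs map[string]*time.Timer
    }

    func newExpireScheduler() *expireScheduler {
        return &expireScheduler{jobs: map[string]*time.Timer{}}
    }

    // schedule arms a job that closes the connection after maxTokenLifeTime
    // unless a refresh cancels it first.
    func (s *expireScheduler) schedule(connID string, maxTokenLifeTime time.Duration, closeFn func()) {
        s.mu.Lock()
        defer s.mu.Unlock()
        if t, ok := s.jobs[connID]; ok {
            t.Stop()
        }
        s.jobs[connID] = time.AfterFunc(maxTokenLifeTime, closeFn)
    }

    // cancel is called on refresh: the connection is alive, so drop its job.
    func (s *expireScheduler) cancel(connID string) {
        s.mu.Lock()
        defer s.mu.Unlock()
        if t, ok := s.jobs[connID]; ok {
            t.Stop()
            delete(s.jobs, connID)
        }
    }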

glazychev-art commented 2 years ago

I partly agree.

Probably we can use a tag like this: ${forwarderName}-${connectionId}. But what if the forwarder restarts and ${forwarderName} changes?

Concerning step 2: after a forwarder restart, begin has no information about previous connections: https://github.com/networkservicemesh/sdk/blob/main/pkg/networkservice/common/begin/server.go#L87 And we have no metadata to delete anything on Close. For example, let's take a look at the tap mechanism: https://github.com/networkservicemesh/sdk-vpp/blob/main/pkg/networkservice/mechanisms/kernel/kerneltap/common.go#L145-L146 When maxTokenLifeTime expires:

  1. After dumping, we don't know which mechanism was used.
  2. We don't have the swIfIndex in the ifindex metadata. We would need to fill all the metadata somehow before the Close call.

The basic idea of https://github.com/networkservicemesh/sdk-vpp/issues/512#issuecomment-1044531011 is:

  1. Forwarder starts -> dump vpp interfaces and remember all tagged interfaces.
  2. Start async delayed jobs for each tagged interface that just remove the interfaces via the VPP API, regardless of mechanisms or metadata, after the expiration time (e.g. maxTokenLifeTime). I think we can also remove L2xc or L3xc. See the sketch after this list.
  3. For each refresh request from the client, cancel the jobs from step 2 and fill the metadata.

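A hedged sketch of that dump-and-expire flow, assuming the govpp bindings sdk-vpp already uses (import paths may differ by version; deleteFn stands in for the mechanism-specific removal calls, since VPP has no generic interface delete):

    package restore

    import (
        "context"
        "io"
        "strings"
        "time"

        "git.fd.io/govpp.git/api"
        interfaces "github.com/edwarnicke/govpp/binapi/interface"
        "github.com/edwarnicke/govpp/binapi/interface_types"
    )

    // dumpTaggedInterfaces lists all VPP interfaces whose tag starts with the given prefix.
    func dumpTaggedInterfaces(ctx context.Context, vppConn api.Connection, tagPrefix string) (map[string]interface_types.InterfaceIndex, error) {
        dumped := make(map[string]interface_types.InterfaceIndex)
        rpc, err := interfaces.NewServiceClient(vppConn).SwInterfaceDump(ctx, &interfaces.SwInterfaceDump{
            SwIfIndex: ^interface_types.InterfaceIndex(0), // dump all interfaces
        })
        if err != nil {
            return nil, err
        }
        for {
            details, err := rpc.Recv()
            if err == io.EOF {
                break
            }
            if err != nil {
                return nil, err
            }
            if strings.HasPrefix(details.Tag, tagPrefix) {
                dumped[details.Tag] = details.SwIfIndex
            }
        }
        return dumped, nil
    }

    // scheduleExpiration starts one delayed removal job per dumped interface.
    func scheduleExpiration(dumped map[string]interface_types.InterfaceIndex, maxTokenLifeTime time.Duration,
        deleteFn func(interface_types.InterfaceIndex)) map[string]*time.Timer {
        timers := make(map[string]*time.Timer)
        for tag, swIfIndex := range dumped {
            idx := swIfIndex // capture the per-iteration value for the closure
            timers[tag] = time.AfterFunc(maxTokenLifeTime, func() { deleteFn(idx) })
        }
        return timers
    }
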
glazychev-art commented 2 years ago

Additional thoughts.

If we want new forwarders to be responsible for the previous ones, we need to remember that:

  1. The forwarder can be restarted and keep its name.
  2. The forwarder can be restarted and get a new name.
  3. We can have many forwarders on the same node.

So, I think we need a more complex tag, like this: nsm-${client|server_identifier}-${refresh_identifier}-${forwarderName}-${connectionId}

Why do we need 'nsm-'? If the forwarder restarted and its name changed, this prefix is a reliable way to find nsm interfaces.

Why do we need '${client|server_identifier}-'? If we want to restore the previous connection, we need to distinguish client and server interfaces in order to store their swIfIndexes in the ifindex metadata.

Why do we need '${forwarderName}'? We need to figure out whether this interface belongs to our forwarder or not.

Why do we need '${connectionId}'? We need to know which connection this interface belongs to.

Why do we need '${refresh_identifier}'? If we know that this interface doesn't belong to our forwarder, we can keep track of it until it expires. Before deleting, we can check whether '${refresh_identifier}' has changed. If yes, that means this interface is used by another forwarder (the identifier was updated on refresh) and we don't need to delete it. This identifier can be a simple counter.

Another question arises - the tag length is limited to 64 characters - https://github.com/FDio/vpp/blob/master/src/vnet/interface.api#L363
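
For illustration, a quick check against that limit (buildTag and maxVPPTagLen are hypothetical names; with a 36-character UUID as ${connectionId}, the prefix, markers, and separators already take 8+ bytes, leaving roughly 20 bytes for ${refresh_identifier} plus ${forwarderName}):

    package tag

    import "fmt"

    // maxVPPTagLen mirrors the 64-byte limit of the VPP interface tag field.
    const maxVPPTagLen = 64

    // buildTag assembles the proposed tag and rejects it if it would not fit
    // into the VPP tag field.
    func buildTag(isClient bool, refreshID uint32, forwarderName, connID string) (string, error) {
        side := "s"
        if isClient {
            side = "c"
        }
        tag := fmt.Sprintf("nsm-%s-%d-%s-%s", side, refreshID, forwarderName, connID)
        if len(tag) > maxVPPTagLen {
            return "", fmt.Errorf("tag %q is %d bytes, which exceeds the %d byte VPP limit", tag, len(tag), maxVPPTagLen)
        }
        return tag, nil
    }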

glazychev-art commented 2 years ago

We discussed another solution internally.

Probably we need to manage only the interfaces created by this forwarder or by its previous instance (i.e. the name was preserved after the restart). In this case it is enough to use a tag like: ${forwarderName}-${client|server_identifier}-${connectionId}.

In other cases:

denis-tingaikin commented 2 years ago

@glazychev-art

We've internally discussed this problem with @edwarnicke

Discussion results:

We're seeing two basic cases:

  1. Forwarder restart
  2. Forwarder delete (or upgrade)

Case 1 can currently be covered with a tag-based recovery strategy, as we discussed in the PR meeting.

Case 2 is currently blocked. We need to verify that folks are using the k8s deployments upgrade feature. If so, we can consider a solution that clears resources when the application stops.

glazychev-art commented 2 years ago

@denis-tingaikin @edwarnicke

So, I think now it will look like this:

Tag: ${forwarderName}-${client|server_identifier}-${connID}

We have 2 main steps:

Step 1. Forwarder starts -> dump vpp interfaces, parse all tagged interfaces, and store them in a map[connID]*ConnectionInterfaces.

type DeleteFn func() error

type IfaceInfo struct {
    delete    DeleteFn
    value     interface{}
}

type ConnectionInterfaces struct {
    clientMeta map[string]*IfaceInfo
    serverMeta map[string]*IfaceInfo
}

The ConnectionInterfaces.clientMeta/serverMeta maps store values for the metadata. For example, clientMeta["ifindex"].value = 1 becomes swIfIndex = 1 in the client ifindex metadata. We have several such metadata entries (links, peers, ...).

We can determine which kind of interface is being used by its InterfaceDevType ("memif", "VXLAN", and so on), so we can choose the correct delete function (e.g. "memif" -> DeleteFn). After this step, we have the parsed interfaces with their corresponding delete functions.
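
A sketch of picking the delete function from InterfaceDevType (the dev-type strings are illustrative, and deleteMemif, deleteVxlan, deleteTap stand in for the real per-mechanism VPP delete calls):

    package restore

    import "fmt"

    // chooseDeleteFn picks a mechanism-specific delete function (a DeleteFn for the
    // struct above) based on the dumped InterfaceDevType.
    func chooseDeleteFn(devType string, swIfIndex uint32,
        deleteMemif, deleteVxlan, deleteTap func(uint32) error) (func() error, error) {
        switch devType {
        case "memif":
            return func() error { return deleteMemif(swIfIndex) }, nil
        case "VXLAN":
            return func() error { return deleteVxlan(swIfIndex) }, nil
        case "virtio": // illustrative: kernel tap interfaces may report a virtio dev type
            return func() error { return deleteTap(swIfIndex) }, nil
        default:
            return nil, fmt.Errorf("unknown interface device type: %s", devType)
        }
    }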

Step 2. Add a restore chain element.
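
A very rough sketch of what such a restore element could look like on the server side (restoreServer and restoreMetadata are hypothetical; ConnectionInterfaces is the type from Step 1, and the NetworkServiceServer/next pieces are the usual sdk ones):

    package restore

    import (
        "context"

        "github.com/golang/protobuf/ptypes/empty"
        "github.com/networkservicemesh/api/pkg/api/networkservice"
        "github.com/networkservicemesh/sdk/pkg/networkservice/core/next"
    )

    // restoreServer re-attaches interfaces dumped in Step 1 to incoming healing Requests.
    type restoreServer struct {
        dumped map[string]*ConnectionInterfaces // connID -> interfaces found on startup
    }

    func (s *restoreServer) Request(ctx context.Context, request *networkservice.NetworkServiceRequest) (*networkservice.Connection, error) {
        connID := request.GetConnection().GetId()
        if info, ok := s.dumped[connID]; ok {
            // Refill the per-connection metadata (ifindex, links, peers, ...) from the
            // dumped data before the rest of the chain runs.
            restoreMetadata(ctx, info)
            delete(s.dumped, connID)
        }
        return next.Server(ctx).Request(ctx, request)
    }

    func (s *restoreServer) Close(ctx context.Context, conn *networkservice.Connection) (*empty.Empty, error) {
        return next.Server(ctx).Close(ctx, conn)
    }

    // restoreMetadata is a placeholder for writing the dumped values back into the
    // client/server metadata; a real element would use the existing metadata helpers.
    func restoreMetadata(ctx context.Context, info *ConnectionInterfaces) {
        _, _ = ctx, info
    }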

edwarnicke commented 2 years ago

@glazychev-art Didn't we discuss adding an expire time to the tag in VPP so we could simply time things out?

glazychev-art commented 2 years ago

@edwarnicke Yes, but:

  1. The tag will be longer (its length is limited - https://github.com/FDio/vpp/blob/master/src/vnet/interface.api#L363)
  2. I think it's much easier if we always use maxTokenLifeTime as the expiration time. I don't think this will be a problem...

edwarnicke commented 2 years ago

@glazychev-art I think you are correct; using maxTokenLifeTime should work, and it will eventually clean up the connections if they are not renewed.

edwarnicke commented 2 years ago

@glazychev-art Might it be possible to solve this in a more localized fashion?

Right now, understanding of how to handle particular interfaces is 'localized' in a particular mechanism chain element.

What you are suggesting in step 1 would require having a second place that understands all of the mechanisms.

Might it perhaps not be simpler to have each mechanism chain element dump interfaces and keep a map[string]*interfaces.SwInterfaceDetails? When the mechanism element gets a connection request for which it cannot find a swIfIndex in the context, it can simply look in the map for an interface for that connId; if it finds one, it can delete the entry from the map and carry on with that swIfIndex as if it had created it (presuming the interface is of the correct type).

This localizes the issue to a particular mechanism, and so we don't 'smear' localized knowledge into the global sphere.

At maxExpireTime after launch, you can just delete anything remaining in the map.
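
A sketch of that per-mechanism map (names are placeholders; import paths follow the govpp bindings sdk-vpp already uses and may differ by version):

    package mechanism

    import (
        "sync"

        interfaces "github.com/edwarnicke/govpp/binapi/interface"
        "github.com/edwarnicke/govpp/binapi/interface_types"
    )

    // dumpedInterfaces keeps the interfaces of this mechanism's type found in VPP
    // on startup, keyed by the connId parsed from the tag.
    type dumpedInterfaces struct {
        mu     sync.Mutex
        byConn map[string]*interfaces.SwInterfaceDetails
    }

    // reuse returns the previously created swIfIndex for connID, if any, and removes
    // the entry so the later expiration pass no longer deletes it.
    func (d *dumpedInterfaces) reuse(connID string) (interface_types.InterfaceIndex, bool) {
        d.mu.Lock()
        defer d.mu.Unlock()
        if details, ok := d.byConn[connID]; ok {
            delete(d.byConn, connID)
            return details.SwIfIndex, true
        }
        return 0, false
    }

    // expireRemaining deletes whatever is still unclaimed; intended to run once,
    // maxExpireTime after the forwarder launch. deleteFn is the mechanism's own delete call.
    func (d *dumpedInterfaces) expireRemaining(deleteFn func(interface_types.InterfaceIndex) error) {
        d.mu.Lock()
        defer d.mu.Unlock()
        for connID, details := range d.byConn {
            _ = deleteFn(details.SwIfIndex)
            delete(d.byConn, connID)
        }
    }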

Thoughts?

edwarnicke commented 2 years ago

One other question... have you given thought to how this would intersect with VNI allocation?

glazychev-art commented 2 years ago

@edwarnicke Very good solution! But in this case, we are missing one of the most important parts - how do we remove expired interfaces from the VPP after maxExpireTime?

VNI... It's good that you pointed it out. Now it is stored in the local map.

glazychev-art commented 2 years ago

So, VNI. There is a problem with case 2A:

  1. If the client VxLAN side failed: vni.Client doesn't work with the metadata at all - https://github.com/networkservicemesh/sdk/blob/main/pkg/networkservice/common/mechanisms/vxlan/vni/client.go#L56-L71
  2. If the server VxLAN side failed:
     Case A: We receive the Request from the old client that we want to restore - vni.Server takes the VNI from the mechanism. We can't use the mechanism, because we clean it before healing - https://github.com/networkservicemesh/sdk/blob/main/pkg/networkservice/common/begin/event_factory.go#L100
     Case B: We receive the Request from a new client, but we have a VxLAN tunnel that is not restored yet (we didn't receive the Request from the "old" client) and the generated VNI matches the existing one - I think we will fail to create a new VxLAN tunnel, a retry will happen, and we will try again with a new random VNI.

glazychev-art commented 2 years ago

We also have a problem with wireguard - we can't get the wireguard peer index using dump - https://github.com/FDio/vpp/blob/master/src/plugins/wireguard/wireguard.api#L98-L109 We have logic based on the peer index - for example HERE and HERE

glazychev-art commented 2 years ago

So, for cases where the forwarder has its previous name after a restart and we use an interface dump, we have the following problems:

  1. VxLAN VNI - https://github.com/networkservicemesh/sdk-vpp/issues/512#issuecomment-1057669840
  2. Wireguard peer dump - https://github.com/networkservicemesh/sdk-vpp/issues/512#issuecomment-1059143323 (Edited: the Wireguard patch was merged - https://gerrit.fd.io/r/c/vpp/+/35624)

I guess this mostly applies to cases where the container is restarted, not the entire pod.

glazychev-art commented 2 years ago

We know that when users update the forwarder, the pod restarts without saving the name. In this case, the correct solution is to remove the interfaces before terminating.

Moved to a separate issue - https://github.com/networkservicemesh/sdk-vpp/issues/516

glazychev-art commented 2 years ago

@edwarnicke @denis-tingaikin I've prepared a draft PR with an idea of how restoring can be implemented - https://github.com/networkservicemesh/sdk-vpp/pull/542 There, each element is only responsible for dumping its own interfaces. Could you have a look?