networkservicemesh / sdk-vpp

Apache License 2.0

Context deadline exceeded during vppapi call #802

Open glazychev-art opened 7 months ago

glazychev-art commented 7 months ago

Description

Suppose the request context's deadline expires during a vppapi call. VPP may still process the request, but govpp returns a context deadline exceeded error to our application. As a result, VPP resources leak.

For example:

  1. Create a tap interface with an almost-expired context: https://github.com/networkservicemesh/sdk-vpp/blob/6cdac61dedd8d9be37485ed3650624ebe0fde77f/pkg/networkservice/mechanisms/kernel/kerneltap/common.go#L65-L85
  2. The call returns a context deadline exceeded error.
  3. However, VPP still processes &tapv2.TapCreateV2{...} and creates the corresponding interface, with the requested name, in the client namespace.
  4. Because the call on line 84 returned an error, the application never receives the reply (including the new interface's index), so it cannot manage the interface. For example, it cannot delete it.
  5. The result is a resource leak: the interface stays in VPP.

Logs:

```
...
Feb  6 11:27:50.854 [ERRO] [id:34655f24-cc23-4a94-84c3-94a0e317d25d] [type:networkService] (32.3)                                  vppapi TapCreateV2 returned error: context deadline exceeded
...
vpp# show int addr
tap6 (up):
  L2 xconnect vxlan_tunnel2
tap7 (dn):
tap8 (up):
  L2 xconnect vxlan_tunnel3
```

Possible solutions

  1. Make vppapi calls atomic, for example by implementing transactions.
  2. Check the remaining context time before each vppapi call: https://github.com/networkservicemesh/vpphelper/blob/e2b961f768b67dfe0687f5aa90696ffdeffba203/connection.go#L100
  3. Create a chain element that checks the remaining context time for each endpoint (NSMgr, Forwarder, NSE, etc.).