zolug opened 4 months ago
Another example where the NSE side interface load-balan-t6nu got stuck.
Jul 5 14:59:33.372 [TRAC] [id:9a5ea784-40f2-4788-b286-c27f28eb9747] [type:networkService] (1.1) request={"connection":{"id":"9a5ea784-40f2-4788-b286-c27f28eb9747","network_service":"load-balancer-a1.trench-a.default","context":{"ip_context":{"src_ip_addrs":["172.16.2.37/24","fd00:0:0:2::21/64"],"dst_ip_addrs":["172.16.2.36/24","fd00:0:0:2::20/64"],"excluded_prefixes":["10.96.0.0/16","10.244.0.0/16"]},"MTU":1500},"labels":{"nodeName":"kind-worker2"},"path":{"index":3,"path_segments":[{"name":"proxy-load-balancer-a1-77zlr","id":"991fe339-7d91-49d4-ba7f-2d4c1d7a01bd-proxy-load-balancer-a1-77zlr-load-balancer-a1.trench-a.default-stateless-lb-frontend-attr-a1-f5864fc94-59nlh","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9kZWZhdWx0L3NhL2RlZmF1bHQiLCJhdWQiOlsic3BpZmZlOi8vZXhhbXBsZS5vcmcvbnMvbnNtL3NhL25zbWdyLXNhIl0sImV4cCI6MTcyMDE5MjE3Mn0.UUxJKoSOP-ee0Du3DstHxgecmKYzmpEnieVEBOdZiex8eUOpmA5P-KNr45hxsw-8PccGVUnNWcduISetmnQTNA","expires":{"seconds":1720192172,"nanos":880397227},"metrics":{"client_interface":"KERNEL/unknown"}},{"name":"nsmgr-pfss7","id":"a9211a4f-500d-4d7d-8a3b-ed44a173762e","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvbnNtZ3Itc2EiLCJhdWQiOlsic3BpZmZlOi8vZXhhbXBsZS5vcmcvbnMvbnNtL3NhL2RlZmF1bHQiXSwiZXhwIjoxNzIwMTkyMTczfQ.9GqG4MtdzWz_F157_V0AF6CiCsl0rZxwNIEtattssADqMzl-An7EGGZV-WPqzZ5NqijLH0uq_-zmp1YZ9ii6nw","expires":{"seconds":1720192173,"nanos":76868291}},{"name":"forwarder-vpp-772b9","id":"789260f9-be2e-401a-b601-40b3a913c067","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvZGVmYXVsdCIsImF1ZCI6WyJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvbnNtZ3Itc2EiXSwiZXhwIjoxNzIwMTkyMTczfQ.KA1890RkZBfrFcYBiytD6HLZp-DctqnJ27yJY-eOVqVE4bjsX-VPQ44Vn5mRnbos85sXxngjec2sxFtoUwbStQ","expires":{"seconds":1720192173,"nanos":183239052}},{"name":"nsmgr-pbj2f","id":"9a5ea784-40f2-4788-b286-c27f28eb9747","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvbnNtZ3Itc2EiLCJhdWQiOlsic3BpZmZlOi8vZXhhbXBsZS5vcmcvbnMvbnNtL3NhL2RlZmF1bHQiXSwiZXhwIjoxNzIwMTkyMTczfQ.cqMK_9_P-J9tkEkCATe0-lZgej5FO7OJg-918c_MRaj70aB1SFK86sWvb1VQW_5fbbpFxfl9jjUA-rkrF2svOQ","expires":{"seconds":1720192173,"nanos":184038754}}]},"network_service_endpoint_name":"stateless-lb-frontend-attr-a1-f5864fc94-59nlh","payload":"ETHERNET"},"mechanism_preferences":[{"cls":"REMOTE","type":"VXLAN","parameters":{"MTU":"1446","src_ip":"172.18.0.6","src_port":"4789"}},{"cls":"REMOTE","type":"VLAN"}]}
filter string: load-balan-t6nu|667aa022-1a1a-49ec-a938-37b9dddb2da6|9a5ea784-40f2-4788-b286-c27f28eb9747
(same forwarder logs)
I was able to reproduce it on a 4-node KIND cluster using the basic deployment with kernel-nse and kernel-nsc by tweaking the NSM_REQUEST_TIMEOUT parameter.
I double-checked that the requests in the nsc kept failing due to context timeout, left the system running for a couple of minutes, and monitored the forwarder logs for "vppapi TapCreateV3 returned error" printouts.
cat <<EOF | kubectl apply -f -
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nsc-kernel
  labels:
    app: nsc-kernel
spec:
  selector:
    matchLabels:
      app: nsc-kernel
  template:
    metadata:
      labels:
        app: nsc-kernel
        "spiffe.io/spiffe-id": "true"
    spec:
      initContainers:
        - name: sysctl-init
          image: registry.nordix.org/cloud-native/meridio/busybox:1.29
          securityContext:
            privileged: true
          command: ["/bin/sh"]
          args:
            - -c
            - sysctl -w net.ipv4.ping_group_range='0 1' ;
      containers:
        - name: nsc
          image: ghcr.io/networkservicemesh/cmd-nsc:v1.13.2-rc.1
          imagePullPolicy: Always
          env:
            - name: SPIFFE_ENDPOINT_SOCKET
              value: unix:///run/spire/sockets/agent.sock
            - name: NSM_LOG_LEVEL
              value: TRACE
            - name: NSM_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: NSM_NETWORK_SERVICES
              value: kernel://icmp-responder
            - name: NSM_LIVENESSCHECKENABLED
              value: "false"
            - name: NSM_REQUEST_TIMEOUT
              value: "180ms"
          volumeMounts:
            - name: spire-agent-socket
              mountPath: /run/spire/sockets
              readOnly: true
            - name: nsm-socket
              mountPath: /var/lib/networkservicemesh
              readOnly: true
          resources:
            requests:
              cpu: 100m
              memory: 40Mi
            limits:
              memory: 80Mi
              cpu: 200m
      volumes:
        - name: spire-agent-socket
          hostPath:
            path: /run/spire/sockets
            type: Directory
        - name: nsm-socket
          hostPath:
            path: /var/lib/networkservicemesh
            type: DirectoryOrCreate
EOF
cat <<EOF | kubectl apply -f -
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nse-kernel
  labels:
    app: nse-kernel
spec:
  selector:
    matchLabels:
      app: nse-kernel
  template:
    metadata:
      labels:
        app: nse-kernel
        "spiffe.io/spiffe-id": "true"
    spec:
      containers:
        - name: nse
          image: ghcr.io/networkservicemesh/cmd-nse-icmp-responder:v1.13.2-rc.1
          imagePullPolicy: IfNotPresent
          env:
            - name: SPIFFE_ENDPOINT_SOCKET
              value: unix:///run/spire/sockets/agent.sock
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: NSM_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: NSM_LOG_LEVEL
              value: TRACE
            - name: NSM_CONNECT_TO
              value: unix:///var/lib/networkservicemesh/nsm.io.sock
            - name: NSM_CIDR_PREFIX
              value: 172.16.0.0/16
            - name: NSM_SERVICE_NAMES
              value: "icmp-responder"
          volumeMounts:
            - name: spire-agent-socket
              mountPath: /run/spire/sockets
              readOnly: true
            - name: nsm-socket
              mountPath: /var/lib/networkservicemesh
              readOnly: true
          resources:
            requests:
              cpu: 100m
              memory: 40Mi
            limits:
              memory: 80Mi
              cpu: 200m
      volumes:
        - name: spire-agent-socket
          hostPath:
            path: /run/spire/sockets
            type: Directory
        - name: nsm-socket
          hostPath:
            path: /var/lib/networkservicemesh
            type: DirectoryOrCreate
EOF
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nsc-kernel-7cc77f58c8-ftl7f 1/1 Running 0 30m 10.244.1.12 kind-worker2 <none> <none>
nse-kernel-85fbcffb44-ghz5s 1/1 Running 0 30m 10.244.2.9 kind-worker <none> <none>
kubectl get pods -l app=nsc-kernel -o=jsonpath='{.items[*].metadata.name}'| tr ' ' '\n' | xargs -I{} sh -c 'echo === {} === && kubectl exec -t {} -- ip a';kubectl get pods -l app=nse-kernel -o=jsonpath='{.items[*].metadata.name}'| tr ' ' '\n' | xargs -I{} sh -c 'echo === {} === && kubectl exec -t {} -- ip a'
=== nsc-kernel-7cc77f58c8-ftl7f ===
Defaulted container "nsc" out of: nsc, sysctl-init (init)
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0@if15: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
link/ether 1e:5f:01:00:e9:70 brd ff:ff:ff:ff:ff:ff
inet 10.244.1.12/24 brd 10.244.1.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::1c5f:1ff:fe00:e970/64 scope link
valid_lft forever preferred_lft forever
=== nse-kernel-85fbcffb44-ghz5s ===
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0@icmp-respo-Rw5D: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
link/ether 9a:ea:0c:ea:b3:2e brd ff:ff:ff:ff:ff:ff
inet 10.244.2.9/24 brd 10.244.2.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::98ea:cff:feea:b32e/64 scope link
valid_lft forever preferred_lft forever
3: icmp-respo-CTW4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:03:96:b0:a1 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:3ff:fe96:b0a1/64 scope link
valid_lft forever preferred_lft forever
4: icmp-respo-SSwm: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:5b:6a:f2:04 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:5bff:fe6a:f204/64 scope link
valid_lft forever preferred_lft forever
5: icmp-respo-ePzm: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:11:74:cb:fa brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:11ff:fe74:cbfa/64 scope link
valid_lft forever preferred_lft forever
6: icmp-respo-yJZR: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:6b:54:49:71 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:6bff:fe54:4971/64 scope link
valid_lft forever preferred_lft forever
7: icmp-respo-QShs: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:57:ae:f2:13 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:57ff:feae:f213/64 scope link
valid_lft forever preferred_lft forever
8: icmp-respo-rFaQ: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:19:d3:7a:09 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:19ff:fed3:7a09/64 scope link
valid_lft forever preferred_lft forever
9: icmp-respo-rg7S: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:f3:44:29:86 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:f3ff:fe44:2986/64 scope link
valid_lft forever preferred_lft forever
10: icmp-respo-A6cp: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:fb:2f:3e:1a brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:fbff:fe2f:3e1a/64 scope link
valid_lft forever preferred_lft forever
11: icmp-respo-Rw5D: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:01:11:de:76 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:1ff:fe11:de76/64 scope link
valid_lft forever preferred_lft forever
12: icmp-respo-r4Db: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:97:4a:57:2f brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:97ff:fe4a:572f/64 scope link
valid_lft forever preferred_lft forever
13: icmp-respo-KvCJ: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:29:d6:43:50 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:29ff:fed6:4350/64 scope link
valid_lft forever preferred_lft forever
14: icmp-respo-z0kO: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:3d:95:f7:58 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:3dff:fe95:f758/64 scope link
valid_lft forever preferred_lft forever
15: icmp-respo-V3EG: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:bd:60:0b:8e brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:bdff:fe60:b8e/64 scope link
valid_lft forever preferred_lft forever
16: icmp-respo-CRvI: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:21:ad:cc:f0 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:21ff:fead:ccf0/64 scope link
valid_lft forever preferred_lft forever
17: icmp-respo-ipuI: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:f9:e7:d9:db brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:f9ff:fee7:d9db/64 scope link
valid_lft forever preferred_lft forever
18: icmp-respo-3zFH: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:b9:f8:db:d7 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:b9ff:fef8:dbd7/64 scope link
valid_lft forever preferred_lft forever
19: icmp-respo-sDW0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:e7:d7:c3:17 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:e7ff:fed7:c317/64 scope link
valid_lft forever preferred_lft forever
20: icmp-respo-GLAw: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:49:d9:c5:99 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:49ff:fed9:c599/64 scope link
valid_lft forever preferred_lft forever
21: icmp-respo-shCm: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:2f:2f:e2:d3 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:2fff:fe2f:e2d3/64 scope link
valid_lft forever preferred_lft forever
22: icmp-respo-erCg: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:d1:20:9a:a2 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:d1ff:fe20:9aa2/64 scope link
valid_lft forever preferred_lft forever
23: icmp-respo-AztN: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:e7:69:12:57 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:e7ff:fe69:1257/64 scope link
valid_lft forever preferred_lft forever
24: icmp-respo-u8VJ: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:f1:49:85:66 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:f1ff:fe49:8566/64 scope link
valid_lft forever preferred_lft forever
25: icmp-respo-xviy: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:b1:fa:1b:ca brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:b1ff:fefa:1bca/64 scope link
valid_lft forever preferred_lft forever
26: icmp-respo-MCO7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:19:b2:6d:09 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:19ff:feb2:6d09/64 scope link
valid_lft forever preferred_lft forever
27: icmp-respo-wqOy: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:9f:8f:eb:19 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:9fff:fe8f:eb19/64 scope link
valid_lft forever preferred_lft forever
28: icmp-respo-DalE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:3f:d4:d8:b1 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:3fff:fed4:d8b1/64 scope link
valid_lft forever preferred_lft forever
29: icmp-respo-iryd: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:73:72:8d:3b brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:73ff:fe72:8d3b/64 scope link
valid_lft forever preferred_lft forever
30: icmp-respo-GNxN: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:b7:97:05:0d brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:b7ff:fe97:50d/64 scope link
valid_lft forever preferred_lft forever
31: icmp-respo-c957: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:6b:05:d8:ad brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:6bff:fe05:d8ad/64 scope link
valid_lft forever preferred_lft forever
32: icmp-respo-uxJE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:7f:55:07:47 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:7fff:fe55:747/64 scope link
valid_lft forever preferred_lft forever
33: icmp-respo-jXu5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:3f:85:51:95 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:3fff:fe85:5195/64 scope link
valid_lft forever preferred_lft forever
34: icmp-respo-uX9z: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:ed:17:52:30 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:edff:fe17:5230/64 scope link
valid_lft forever preferred_lft forever
35: icmp-respo-ctZo: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:79:82:1b:3f brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:79ff:fe82:1b3f/64 scope link
valid_lft forever preferred_lft forever
36: icmp-respo-x7yM: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:47:c8:fe:7b brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:47ff:fec8:fe7b/64 scope link
valid_lft forever preferred_lft forever
37: icmp-respo-bW5f: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:1b:99:19:32 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:1bff:fe99:1932/64 scope link
valid_lft forever preferred_lft forever
38: icmp-respo-oO3g: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:f7:81:66:7a brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:f7ff:fe81:667a/64 scope link
valid_lft forever preferred_lft forever
39: icmp-respo-NPQR: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:01:07:78:69 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:1ff:fe07:7869/64 scope link
valid_lft forever preferred_lft forever
40: icmp-respo-qrmC: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:17:89:31:f0 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:17ff:fe89:31f0/64 scope link
valid_lft forever preferred_lft forever
41: icmp-respo-CRCU: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:4b:51:7a:13 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:4bff:fe51:7a13/64 scope link
valid_lft forever preferred_lft forever
42: icmp-respo-GdjO: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:43:56:de:1a brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:43ff:fe56:de1a/64 scope link
valid_lft forever preferred_lft forever
43: icmp-respo-uNRG: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:bf:22:a5:e5 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:bfff:fe22:a5e5/64 scope link
valid_lft forever preferred_lft forever
44: icmp-respo-U7wo: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:c3:80:c3:b1 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:c3ff:fe80:c3b1/64 scope link
valid_lft forever preferred_lft forever
45: icmp-respo-k3p8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 02:fe:ad:ee:8c:db brd ff:ff:ff:ff:ff:ff
inet6 fe80::fe:adff:feee:8cdb/64 scope link tentative
valid_lft forever preferred_lft forever
I checked two ideas to handle the interface cleanup: 1/ fetch the interfaces via the SwInterfaceDump govpp API call and compare the interface index lists before and after the TAP interface creation; when creation fails with a context cancel error, remove the added interfaces (the diff of the swIfIndex lists) - this approach can run into other timing problems. 2/ try to remove the interface by name in Close; this should work if we generate and store the names of TAP interfaces in NSM.
Other possibilities to check:
I'm not in favor of option 1, since IMHO parallel actions could alter the interface index list in the meantime.
What if create employed an independent (background) context to execute the VPP API call TapCreateV3? A hard deadline of, say, 10 seconds could also be set just in case. Otherwise, IMHO the API call itself is not costly; it is mostly bad luck and timing (the input context being close to its expiration) when the call fails with a context-expired error. An independent context would significantly lower the probability of interfaces leaking due to context timeout. (Considering that a proper solution seems rather complex to implement, this might be a good-enough fix for the time being.)
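A minimal Go sketch of this idea (it assumes nothing about the actual sdk-vpp wiring; tapConfig and doTapCreateV3 are hypothetical placeholders for the real TapCreateV3 request and the code that sends it to VPP):

package sketch

import (
    "context"
    "time"
)

// tapConfig and doTapCreateV3 stand in for the real TapCreateV3 request and
// the code that issues it to VPP.
type tapConfig struct{}

func doTapCreateV3(ctx context.Context, cfg tapConfig) (swIfIndex uint32, err error) {
    // ... issue the TapCreateV3 request to VPP here ...
    return 0, nil
}

// createTapDetached runs the TAP creation on a context detached from the
// incoming request, so an almost-expired request context cannot abort the
// VPP call mid-flight; a hard 10-second deadline still bounds the call.
func createTapDetached(cfg tapConfig) (uint32, error) {
    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()
    return doTapCreateV3(ctx, cfg)
}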
The problem should be resolved in v1.14.0-rc.3. Could you please check it?
Hi,
Regarding this one: https://github.com/networkservicemesh/cmd-forwarder-vpp/issues/1133#issuecomment-2217495511 Using NSM v1.14.0-rc.3 did not resolve the problem. I still saw numerous excess interfaces in the nsc and nse in no time. Moreover, the forwarder collocated with the nsc or nse encountered frequent OOM kills (memory limit and request both set to 500Mi). (The nse got OOM killed occasionally as well.) It did not matter whether the nsc and nse were using sdk 1.13.2 or 1.14.0-rc.3.
Also, I ran a test with NSM 1.14.0-rc.4, which looked much better than rc.3: there were still excess interfaces in the NSE, but far fewer than with rc.3. I couldn't spot forwarder restarts due to OOM either.
Hi, @zolug. Could you please also test v1.14.0-rc.2?
Hi,
In the case of https://github.com/networkservicemesh/cmd-forwarder-vpp/issues/1133#issuecomment-2217495511: some interfaces also piled up when running with v1.14.0-rc.2 and a small enough request timeout in the NSC. However, in my tests the NSC did manage to establish a connection eventually, and after 10 minutes the excess interfaces were gone from both the NSC and NSE. Btw, if the NSC fails to establish a connection anytime soon (which mostly happens when I only have 1 registry POD instance), there's a huge, steep jump in memory usage for forwarder-vpp and registry-k8s.
When running Meridio with the LB NSE being removed continuously by a script (so it was only available intermittently, for short periods), the excess interfaces were not cleaned up after stopping the script that interfered with the availability of the NSE.
while [ 1 ]; do kubectl delete nses -n nsm stateless-lb-frontend-attr-a1-f5864fc94-9r7mb; sleep 1; done
What I observed in the second case was that, despite the request context's timeout/cancel, begin let the "connect" continue. Thus, from the perspective of nsmgr and forwarder, a connection including interfaces was established, and the NSC was able to recover the connection on a subsequent try/request via Monitor Connection through nsmgr. But as soon as, during one of these "reconnects", the nsmgr got a failure response from the forwarder because the NSE was not around, the connection state was changed to Down. This triggered a reconnect with reselect in the NSC instead of an ordinary request, prompting the nsmgr to close the "old" connection first. Now, this close also failed in my case despite the 1-minute timeout in the beginServer (because the NSE was not found), yet the cleanup could not remove the interface because the context was already expired.
Sep 17 12:01:29.524 [TRAC] [id:3de28dd1-3e5c-4196-8b04-98fbb2ed795a] [type:networkService] (9) ⎆ sdk/pkg/networkservice/common/mechanisms/recvfd/recvFDServer.Close()
...
Sep 17 12:02:29.494 [ERRO] [id:3de28dd1-3e5c-4196-8b04-98fbb2ed795a] [type:networkService] (2.2) context deadline exceeded; Error returned from sdk/pkg/networkservice/common/begin/
beginServer.Close; github.com/networkservicemesh/sdk/pkg/networkservice/core/trace/traceverbose.logError; /root/go/pkg/mod/github.com/networkservicemesh/sdk@v1.14.0-rc.2/
pkg/networkservice/core/trace/traceverbose/common.go:211; github.com/networkservicemesh/sdk/pkg/networkservice/core/trace/traceverbose.(*beginTraceServer).Close; /root/go
/pkg/mod/github.com/networkservicemesh/sdk@v1.14.0-rc.2/pkg/networkservice/core/trace/traceverbose/server.go:71; github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(
*nextServer).Close; /root/go/pkg/mod/github.com/networkservicemesh/sdk@v1.14.0-rc.2/pkg/networkservice/core/next/server.go:65; github.com/networkservicemesh/sdk/pkg/ne
tworkservice/core/trace.(*traceServer).Close; /root/go/pkg/mod/github.com/networkservicemesh/sdk@v1.14.0-rc.2/pkg/networkservice/core/trace/server.go:54; github.com/netwo
rkservicemesh/sdk/pkg/networkservice/core/next.(*nextServer).Close; /root/go/pkg/mod/github.com/networkservicemesh/sdk@v1.14.0-rc.2/pkg/networkservice/core/next/server.go:6
5; github.com/networkservicemesh/sdk/pkg/networkservice/core/trace/traceverbose.(*endTraceServer).Close; /root/go/pkg/mod/github.com/networkservicemesh/sdk@v1.14.0-rc.2/
pkg/networkservice/core/trace/traceverbose/server.go:86; github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextServer).Close; /root/go/pkg/mod/github.
com/networkservicemesh/sdk@v1.14.0-rc.2/pkg/networkservice/core/next/server.go:65; github.com/networkservicemesh/sdk/pkg/networkservice/common/updatepath.(*updatePathServer).Close
; /root/go/pkg/mod/github.com/networkservicemesh/sdk@v1.14.0-rc.2/pkg/networkservice/common/updatepath/server.go:72; github.com/networkservicemesh/sdk/pkg/networkser
vice/core/trace/traceverbose.(*beginTraceServer).Close; /root/go/pkg/mod/github.com/networkservicemesh/sdk@v1.14.0-rc.2/pkg/networkservice/core/trace/traceverbose/server.go:69;
github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextServer).Close; /root/go/pkg/mod/github.com/networkservicemesh/sdk@v1.14.0-rc.2/pkg/networkservi
ce/core/next/server.go:65; github.com/networkservicemesh/sdk/pkg/networkservice/core/trace.(*traceServer).Close; /root/go/pkg/mod/github.com/networkservicemesh/sdk@v1.14
.0-rc.2/pkg/networkservice/core/trace/server.go:54; github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextServer).Close; /root/go/pkg/mod/github.com/networkservicemesh/sdk@v1.14.0-rc.2/pkg/networkservice/core/next/server.go:65; github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextServer).Close; /root/go/pkg/mod/github.com/networkservicemesh/sdk@v1.14.0-rc.2/pkg/networkservice/core/next/server.go:65; github.com/networkservicemesh/api/pkg/api/networkservice._NetworkService_Close_Handler; /root/go/pkg/mod/github.com/networkservicemesh/api@v1.14.0-rc.2/pkg/api/networkservice/networkservice.pb.go:295; google.golang.org/grpc.(*Server).processUnaryRPC; /root/go/pkg/mod/google.golang.org/grpc@v1.60.1/server.go:1372; google.golang.org/grpc.(*Server).handleStream; /root/go/pkg/mod/google.golang.org/grpc@v1.60.1/server.go:1783; google.golang.org/grpc.(*Server).serveStreams.func2.1; /root/go/pkg/mod/google.golang.org/grpc@v1.60.1/server.go:1016; runtime.goexit; /go/src/runtime/asm_amd64.s:1598;
Sep 17 12:02:29.495 [TRAC] [id:d18b3ea8-7849-4697-a0b9-1bba3fe8b09c] [type:networkService] (1.3) close-response={"id":"3de28dd1-3e5c-4196-8b04-98fbb2ed795a","network_service":"load-balancer-a1.trench-a.default","mechanism":{"cls":"LOCAL","type":"KERNEL","parameters":{"inodeURL":"inode://4/4026536324","name":"load-balan-HHUW"}},"context":{"ip_context":{"src_ip_addrs":["172.16.2.3/24","fd00:0:0:2::3/64"],"dst_ip_addrs":["172.16.2.2/24","fd00:0:0:2::2/64"],"excluded_prefixes":["10.96.0.0/16","10.244.0.0/16"]},"MTU":1446},"labels":{"nodeName":"kind-worker3"},"path":{"index":2,"path_segments":[{"name":"proxy-load-balancer-a1-zl9cl","id":"proxy-load-balancer-a1-zl9cl-load-balancer-a1.trench-a.default-stateless-lb-frontend-attr-a1-f5864fc94-9r7mb","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9kZWZhdWx0L3NhL2RlZmF1bHQiLCJhdWQiOlsic3BpZmZlOi8vZXhhbXBsZS5vcmcvbnMvbnNtL3NhL25zbWdyLXNhIl0sImV4cCI6MTcyNjU3NDkzOX0.1mWNaxuEXL7IBG5yB2CfUbxLp1pc8Trg7inRXVZS0tSQ0JZdpdo2w0GUpXwhLbK_Rq7UqZG56PWm0n9bFGgnuQ","expires":{"seconds":1726574939,"nanos":176004176},"metrics":{"client_interface":"KERNEL/unknown"}},{"name":"nsmgr-2qljn","id":"d18b3ea8-7849-4697-a0b9-1bba3fe8b09c","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvbnNtZ3Itc2EiLCJleHAiOjE3MjY1NzUwODl9.DzQ4fxJq4yvbKqYEn4iPd-RhQSnd32H-gvyenlXIErrrkis5CJneRuvSEXhsbk5QWeshF3hzH9oy7RKLeTistQ","expires":{"seconds":1726575089,"nanos":494725910}},{"name":"forwarder-vpp-8wjlw","id":"3de28dd1-3e5c-4196-8b04-98fbb2ed795a","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvZGVmYXVsdCIsImF1ZCI6WyJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvbnNtZ3Itc2EiXSwiZXhwIjoxNzI2NTc0OTM5fQ.aXRYktq-HHZITWk-184u0sde9WW0Sr_KKMmyxc_lCgm4BwK4ocUdRz6o57ftPOkaq423rJPcCcaa5pXLKml1IQ","expires":{"seconds":1726574939,"nanos":311850181},"metrics":{"client_drops":"2","client_interface":"VXLAN/vxlan_tunnel3","client_rx_bytes":"450","client_rx_packets":"3","client_tx_bytes":"866","client_tx_packets":"5","server_drops":"0","server_interface":"VIRTIO/tap0","server_rx_bytes":"686","server_rx_packets":"5","server_tx_bytes":"150","server_tx_packets":"1"}},{"name":"nsmgr-l8t2d","id":"ec2047bf-40b3-434f-9801-10ff48ba402e","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvbnNtZ3Itc2EiLCJhdWQiOlsic3BpZmZlOi8vZXhhbXBsZS5vcmcvbnMvbnNtL3NhL2RlZmF1bHQiXSwiZXhwIjoxNzI2NTc0OTM5fQ.m0_Y7J1STG00924kcdT19U4-W0BW7gXu4oFrOqcZezY0KICUyj9uX1Zt4AEZUP_WRLyVbAEI57yZbcrZFhcsAQ","expires":{"seconds":1726574939,"nanos":343117634}},{"name":"forwarder-vpp-wjhj7","id":"39f6dab6-30d7-414a-88cb-b7311437e505","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvZGVmYXVsdCIsImF1ZCI6WyJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9kZWZhdWx0L3NhL21lcmlkaW8tZmVzIl0sImV4cCI6MTcyNjU3NDkzOX0.TFBMOqE0L35zFKH1YJAa6v7D9boic6TCPGpZz1-NL8ZvWA77sctIAIn1JKDk1oVWsmviFMQIQJrLrpyVy2L_zw","expires":{"seconds":1726574939,"nanos":431882120},"metrics":{"client_drops":"2","client_interface":"VIRTIO/tap2","client_rx_bytes":"796","client_rx_packets":"6","client_tx_bytes":"686","client_tx_packets":"5","server_drops":"0","server_interface":"VXLAN/vxlan_tunnel23","server_rx_bytes":"686","server_rx_packets":"5","server_tx_bytes":"680","server_tx_packets":"4"}},{"name":"stateless-lb-frontend-attr-a1-f5864fc94-9r7mb","id":"116d4bd9-42d1-4345-b6b1-9b898fe248a5","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9k
ZWZhdWx0L3NhL21lcmlkaW8tZmVzIiwiYXVkIjpbInNwaWZmZTovL2V4YW1wbGUub3JnL25zL25zbS9zYS9kZWZhdWx0Il0sImV4cCI6MTcyNjU3NDkzOX0._A5CN3H7803hxNbgzxygVlZDLnqMjtwf67_9U_BrX1-7HlGMgSgLDBYhEU9aDVk7-_G-DkDFt_ucWws_uRWgug","expires":{"seconds":1726574939,"nanos":432567121},"metrics":{"server_interface":"KERNEL/load-balan-ahl6"}}]},"network_service_endpoint_name":"stateless-lb-frontend-attr-a1-f5864fc94-9r7mb","payload":"ETHERNET"}
Sep 17 12:02:29.495 [ERRO] [id:d18b3ea8-7849-4697-a0b9-1bba3fe8b09c] [type:networkService] (1.4) Error returned from sdk/pkg/networkservice/common/begin/beginServer.Close: context deadline exceeded
Sep 17 12:02:29.497 [TRAC] [id:3de28dd1-3e5c-4196-8b04-98fbb2ed795a] [type:networkService] (10) ⎆ sdk/pkg/networkservice/common/mechanisms/sendfd/sendFDServer.Close()
Sep 17 12:02:29.497 [TRAC] [id:3de28dd1-3e5c-4196-8b04-98fbb2ed795a] [type:networkService] (11) ⎆ sdk/pkg/networkservice/common/discover/discoverCandidatesServer.Close()
...
Sep 17 12:02:29.517 [ERRO] [type:registry] (25.1) Error returned from sdk/pkg/registry/common/connect/connectNSEClient.Recv: rpc error: code = DeadlineExceede
d desc = context deadline exceeded
Sep 17 12:02:29.517 [ERRO] [id:3de28dd1-3e5c-4196-8b04-98fbb2ed795a] [discoverCandidatesServer:Close] [type:networkService] (11.1) endpoint is not found: stateless-lb-front
end-attr-a1-f5864fc94-9r7mb: network service endpoint stateless-lb-frontend-attr-a1-f5864fc94-9r7mb not found
Sep 17 12:02:29.517 [TRAC] [id:3de28dd1-3e5c-4196-8b04-98fbb2ed795a] [type:networkService] (12) ⎆ sdk/pkg/networkservice/common/roundrobin/selectEndpointServer.Close()
Sep 17 12:02:29.517 [TRAC] [id:3de28dd1-3e5c-4196-8b04-98fbb2ed795a] [type:networkService] (13) ⎆ sdk/pkg/networkservice/core/next/nextServer.Close()
Sep 17 12:02:29.517 [TRAC] [id:3de28dd1-3e5c-4196-8b04-98fbb2ed795a] [type:networkService] (14) ⎆ sdk-vpp/pkg/networkservice/metrics/stats/statsServer.Close()
Sep 17 12:02:29.518 [TRAC] [id:3de28dd1-3e5c-4196-8b04-98fbb2ed795a] [type:networkService] (15) ⎆ sdk-vpp/pkg/networkservice/metrics/ifacename/ifaceNamesServer.Close()
Sep 17 12:02:29.518 [TRAC] [id:3de28dd1-3e5c-4196-8b04-98fbb2ed795a] [type:networkService] (16) ⎆ sdk-vpp/pkg/networkservice/up/upServer.Close()
Sep 17 12:02:29.518 [TRAC] [id:3de28dd1-3e5c-4196-8b04-98fbb2ed795a] [type:networkService] (17) ⎆ sdk/pkg/networkservice/core/next/nextServer.Close()
Sep 17 12:02:29.518 [TRAC] [id:3de28dd1-3e5c-4196-8b04-98fbb2ed795a] [type:networkService] (18) ⎆ sdk-vpp/pkg/networkservice/xconnect/l2xconnect/l2XconnectServer.Close
()
Sep 17 12:02:29.518 [TRAC] [id:3de28dd1-3e5c-4196-8b04-98fbb2ed795a] [type:networkService] (19) ⎆ sdk-vpp/pkg/networkservice/xconnect/l3xconnect/l3XconnectServer.Close()
time="2024-09-17T12:02:29Z" level=error msg="Channel ID not known, ignoring the message." channel=404 msg_id=851
Sep 17 12:02:29.518 [TRAC] [id:3de28dd1-3e5c-4196-8b04-98fbb2ed795a] [type:networkService] (20) ⎆ sdk-vpp/pkg/networkservice/xconnect/l2bridgedomain/l2BridgeDomainServer.Close()
Sep 17 12:02:29.519 [TRAC] [id:3de28dd1-3e5c-4196-8b04-98fbb2ed795a] [type:networkService] (21) ⎆ sdk/pkg/networkservice/core/next/nextServer.Close()
Sep 17 12:02:29.519 [TRAC] [id:3de28dd1-3e5c-4196-8b04-98fbb2ed795a] [type:networkService] (22) ⎆ sdk-kernel/pkg/kernel/networkservice/connectioncontextkernel/mtu/mtuServer.Close()
Sep 17 12:02:29.519 [TRAC] [id:3de28dd1-3e5c-4196-8b04-98fbb2ed795a] [type:networkService] (23) ⎆ sdk-kernel/pkg/kernel/networkservice/connectioncontextkernel/ipcontext/ipneighbors/ipNeighborsServer.Close()
Sep 17 12:02:29.519 [TRAC] [id:3de28dd1-3e5c-4196-8b04-98fbb2ed795a] [type:networkService] (24) ⎆ sdk-kernel/pkg/kernel/networkservice/connectioncontextkernel/ipcontext/iprule/ipruleServer.Close()
Sep 17 12:02:29.520 [TRAC] [id:3de28dd1-3e5c-4196-8b04-98fbb2ed795a] [type:networkService] (25) ⎆ sdk-kernel/pkg/kernel/networkservice/connectioncontextkernel/ipcontext/routes/routesServer.Close()
Sep 17 12:02:29.520 [TRAC] [id:3de28dd1-3e5c-4196-8b04-98fbb2ed795a] [type:networkService] (26) ⎆ sdk-kernel/pkg/kernel/networkservice/connectioncontextkernel/ipcontext/ipaddress/ipaddressServer.Close()
Sep 17 12:02:29.520 [TRAC] [id:3de28dd1-3e5c-4196-8b04-98fbb2ed795a] [type:networkService] (27) ⎆ sdk-kernel/pkg/kernel/networkservice/connectioncontextkernel/pinggrouprange/pinggrouprangeServer.Close()
Sep 17 12:02:29.520 [TRAC] [id:3de28dd1-3e5c-4196-8b04-98fbb2ed795a] [type:networkService] (28) ⎆ sdk-kernel/pkg/kernel/networkservice/ethernetcontext/vfEthernetContextServer.Close()
Sep 17 12:02:29.520 [TRAC] [id:3de28dd1-3e5c-4196-8b04-98fbb2ed795a] [type:networkService] (29) ⎆ sdk-vpp/pkg/networkservice/tag/tagServer.Close()
Sep 17 12:02:29.521 [TRAC] [id:3de28dd1-3e5c-4196-8b04-98fbb2ed795a] [type:networkService] (30) ⎆ sdk-vpp/pkg/networkservice/connectioncontext/mtu/mtuServer.Close()
Sep 17 12:02:29.521 [TRAC] [id:3de28dd1-3e5c-4196-8b04-98fbb2ed795a] [type:networkService] (31) ⎆ sdk/pkg/networkservice/common/mechanisms/mechanismsServer.Close()
Sep 17 12:02:29.521 [TRAC] [id:3de28dd1-3e5c-4196-8b04-98fbb2ed795a] [type:networkService] (32) ⎆ sdk-vpp/pkg/networkservice/mechanisms/kernel/kerneltap/kernelTapServer.Close()
Sep 17 12:02:29.521 [ERRO] [id:3de28dd1-3e5c-4196-8b04-98fbb2ed795a] [type:networkService] (32.1) unable to delete connection with SwIfIndex 11: context deadline exceeded
...
IMHO, it would be preferable to have a separate context for the tap cleanup (and probably for create as well). The same applies to the vxlan connection:
Sep 17 12:02:29.530 [ERRO] [id:3de28dd1-3e5c-4196-8b04-98fbb2ed795a] [type:networkService] [vxlan:client] (68.1) error while deleting vxlan connection: vppapi AddNodeNext returned error: context deadline exceeded
Hi, @zolug! The problem is related to a too-small context timeout before any create call in sdk-vpp. We've decided to use separate contexts with good timeouts for all create operations.
@Ex4amp1e Could you please share the current technical solution and decomposition to cover the issue? Thanks.
@denis-tingaikin Currently we have tried a draft solution that overrides contexts where VPP operations are used - https://github.com/networkservicemesh/sdk-vpp/pull/867. But we decided to go with another solution: overriding the VPP connection instead of overriding the create/delete function contexts in general. It will help to cover and make all VPP operations safe.
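As a rough illustration of that alternative (hypothetical interface and names, not the actual sdk-vpp change), the VPP connection itself can be wrapped so that every invocation runs on a detached, bounded context regardless of what the caller passes in:

package sketch

import (
    "context"
    "time"
)

// vppInvoker is a hypothetical, minimal view of a VPP API connection.
type vppInvoker interface {
    Invoke(ctx context.Context, req, reply interface{}) error
}

// detachedConn wraps a VPP connection so that every API call gets its own
// context with a sane timeout, independent of the (possibly almost expired)
// request context supplied by the caller.
type detachedConn struct {
    inner   vppInvoker
    timeout time.Duration
}

func (c *detachedConn) Invoke(_ context.Context, req, reply interface{}) error {
    ctx, cancel := context.WithTimeout(context.Background(), c.timeout)
    defer cancel()
    return c.inner.Invoke(ctx, req, reply)
}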
Decomposition and plan:
An interface can be leaked if the initial Request's context gets cancelled before the connection can be established.
The problem can occur in kernelTap if the context gets cancelled during the vppConn tapCreate call. In that case the call returns an error, so the interface does not get saved into the ifindex storage.
Therefore, even though the Request should initiate a Close due to the failed kernelTap create (e.g. in kernelTapServer), del won't be able to look up the interface in the storage.
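A simplified sketch of that failure mode (hypothetical names, not the exact sdk-vpp code): the interface index is only stored after a successful create, so a cancellation during the call can leave VPP holding an interface that the subsequent del cannot find.

package sketch

import "context"

// tapCreate and storeIfIndex are stand-ins for the TapCreateV3 call and the
// per-connection ifindex metadata storage.
func tapCreate(ctx context.Context) (uint32, error) { return 0, ctx.Err() }
func storeIfIndex(ctx context.Context, swIfIndex uint32) {}

// create mirrors the leak: if ctx is cancelled during tapCreate, an error is
// returned even though the interface may already exist in VPP, so its index
// is never stored and the Close triggered by the failed Request has nothing
// to delete.
func create(ctx context.Context) error {
    swIfIndex, err := tapCreate(ctx)
    if err != nil {
        return err // possible leak: interface present in VPP, index unknown
    }
    storeIfIndex(ctx, swIfIndex)
    return nil
}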
NSM version: v1.13.2-rc.1
Logs:
Follow this request:
Use filter string: load-balan-2ezO|9a29a6db-aed2-48d4-b5fa-c75f7337328e|95b833d9-10fc-43d4-9c30-54f2ad7f1c2b|load-balan-tZer
NSC side load-balan-2ezO interface will not be removed. nsm_forwarder-vpp-mwbc2_forwarder-vpp.log