pperiyasamy opened this issue 3 years ago
I feel we need to test this scenario with forwarder vpp and verify whether Close is called.
@pperiyasamy Could you check this with forwarder vpp? Note: forwarder vpp also supports veth pairs.
@denis-tingaikin Whenever I delete only the NSC pod, Close is always called on both the local and remote forwarder, so there is no problem there. But when the NSC and NSE pods are deleted together, Close is not invoked on the endpoint-side forwarder. Is there a chance that an sdk chain element returns prematurely without invoking the interpose endpoint's Close when the actual endpoint service (icmp-responder) is not found?
Ah, I got it. For this kind of scenario the `timeout` chain element will close the resources on token expiration.
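To illustrate the idea, here is a minimal sketch (not the real sdk `timeout` element; the `conn` type, `closeFn` callback, and timings are made up for the example): a per-connection timer is armed from the token expiration, and if no explicit Close arrives first, the timer fires and releases the resources.

```go
// Sketch only: illustrates closing resources on token expiration.
package main

import (
	"fmt"
	"sync"
	"time"
)

// conn is a stand-in for a network service connection (hypothetical type).
type conn struct {
	ID         string
	ExpireTime time.Time // token expiration
}

type timeoutServer struct {
	mu     sync.Mutex
	timers map[string]*time.Timer
}

func newTimeoutServer() *timeoutServer {
	return &timeoutServer{timers: map[string]*time.Timer{}}
}

// Request (re)arms a cleanup timer that fires at token expiration.
func (t *timeoutServer) Request(c conn, closeFn func(conn)) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if old, ok := t.timers[c.ID]; ok {
		old.Stop() // refreshed token: rearm the timer
	}
	t.timers[c.ID] = time.AfterFunc(time.Until(c.ExpireTime), func() {
		fmt.Printf("token for %s expired, closing\n", c.ID)
		closeFn(c)
	})
}

// Close cancels the timer when an explicit Close does arrive in time.
func (t *timeoutServer) Close(c conn) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if tm, ok := t.timers[c.ID]; ok {
		tm.Stop()
		delete(t.timers, c.ID)
	}
}

func main() {
	ts := newTimeoutServer()
	ts.Request(conn{ID: "nsc-1", ExpireTime: time.Now().Add(100 * time.Millisecond)},
		func(c conn) { fmt.Println("released resources for", c.ID) })
	time.Sleep(200 * time.Millisecond) // no explicit Close: the timer fires
}
```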
@denis-tingaikin yes, upon timeout (after 5 mins) Close was invoked on the forwarder, but the `connect` server again failed with the error `no client found for the connection`; it looks like it happened here. This causes stale network entries on that client connection forever. Any idea why `connInfo` wasn't present when Close (I'm sure this was the first Close invocation) was invoked? This issue doesn't seem to be reproducible on every attempt.
Can we make NSMgr invoke Close on the endpoint-side forwarder even when the endpoint service is gone?
I think it is an issue of the `connect` cache. Currently @Bolodya1997 is working on removing the cache from the `connect` server, so I'm mostly sure this problem will be solved in https://github.com/networkservicemesh/sdk/pull/1069
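For context on the failure mode being discussed, here is a minimal sketch of how a cache-dependent Close can fail (an illustration only, not the sdk's actual `connect` server; the `connectServer` and `client` types are invented): if the per-connection client entry (the `connInfo` analogue) is missing, Close returns early with a "no client found" style error and the client chain is never invoked, so downstream resources are left behind.

```go
// Sketch only: Close depends on a per-connection client cache entry.
package main

import (
	"fmt"
	"sync"
)

type client struct{ name string }

func (c *client) Close(connID string) { fmt.Printf("%s: closed %s\n", c.name, connID) }

type connectServer struct {
	mu      sync.Mutex
	clients map[string]*client // keyed by connection ID
}

func (s *connectServer) Request(connID string, c *client) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.clients[connID] = c
}

func (s *connectServer) Close(connID string) error {
	s.mu.Lock()
	c, ok := s.clients[connID]
	delete(s.clients, connID)
	s.mu.Unlock()
	if !ok {
		// Close returns early: the rest of the client chain never runs.
		return fmt.Errorf("no client found for the connection %s", connID)
	}
	c.Close(connID)
	return nil
}

func main() {
	s := &connectServer{clients: map[string]*client{}}
	s.Request("conn-1", &client{name: "forwarder-client"})
	fmt.Println(s.Close("conn-1")) // <nil>: client chain runs and cleans up
	fmt.Println(s.Close("conn-1")) // entry already gone: "no client found ..."
}
```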
In such a scenario I expect the following behavior from NSM:
Client -> L NSMgr -> L Forwarder -> L NSMgr -> R NSMgr -> R Forwarder -> R NSMgr -> Endpoint
- `context canceled` or `client connection is closing` error in R NSMgr logs.
- `no client found` error in R Forwarder logs.
- `no client found` error in R NSMgr logs.
- `no client found` error in L NSMgr logs.
- `client connection is closing` error returned to Client.

In some place [2] meets [4], and it results in stopping both [2, 4] and most probably returning a `no client found` error to Client.
Since Close was not successful for any part of the chain, timeout will happen for all connections and will end with a `no client found` error.
As a result, any part of the chain should be closed at least 2 times: first with heal/Close, second with timeout.
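One implication of that (my reading, not necessarily how the sdk handles it): since every connection may legitimately see Close twice, a chain element that releases real resources should keep its Close idempotent. A rough Go sketch with an invented `vxlanCleaner` type:

```go
// Sketch only: a second Close on the same connection must be a safe no-op.
package main

import (
	"fmt"
	"sync"
)

type vxlanCleaner struct {
	mu     sync.Mutex
	closed map[string]bool // connection IDs already cleaned up
}

func (v *vxlanCleaner) Close(connID string) {
	v.mu.Lock()
	defer v.mu.Unlock()
	if v.closed[connID] {
		fmt.Printf("%s: already cleaned up, ignoring second Close\n", connID)
		return
	}
	v.closed[connID] = true
	fmt.Printf("%s: deleting tunnel and veth resources\n", connID)
}

func main() {
	c := &vxlanCleaner{closed: map[string]bool{}}
	c.Close("conn-1") // first Close (heal/Close)
	c.Close("conn-1") // second Close (timeout) does nothing harmful
}
```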
@pperiyasamy I can see in the logs you have provided that L Forwarder receives a `client connection is closing` error. Who originates this error?
Also, can you please share how the whole case differs from the behavior that I expect?
@Bolodya1997 There were no logs at R Forwarder at the time of the endpoint/client deletion, which made me think the L NSMgr -> R NSMgr -> R Forwarder call didn't happen. I saw logs only on the L Forwarder side, which I've pasted already.
Upon timeout, there are logs at R Forwarder for this connection, but that call returned prematurely at the `connect` server with the error `no client found for the connection`. Because of this, no entries (for the connecting endpoint) for that particular client connection are cleaned up, as none of the client chain elements' Close are invoked.
I will attach more logs when I run into this issue again.
Expected Behavior
The NSC and NSE are running on different worker nodes, which are connected through a veth pair <-> vxlan tunnel <-> veth pair vWire connection with this example using the ovs forwarder. When `kubectl delete -k .` is run, all of the network resources created by the ovs forwarder should be cleaned up on both sides.
Current Behavior
On the endpoint side, no resources are cleaned up and the forwarder chain element's Close handler is not invoked, but the network resources are cleaned up properly on the client side. This seems to be because the endpoint pod is deleted first, followed by the client pod.
Failure Information (for bugs)
Here are the logs from the client-side forwarder.
Steps to Reproduce
Run `kubectl delete -k .`, which deletes both the endpoint and the client together.
Context
Failure Logs
No relevant logs are generated at this time on the endpoint-side forwarder.
Is this an issue on the sdk side? Please feel free to reject it if this is not a bug.