Open ljkiraly opened 1 year ago
Seems that NSM_MAX_TOKEN_LIFETIME default setting is 24h instead of 10 minutes lifetime used in vpp forwarder. I am to repeat the test with the NSM_MAX_TOKEN_LIFETIME=10min setting.
Same result with NSM_MAX_TOKEN_LIFETIME=10min setting.
@ljkiraly Could you also attach logs from the NSC? It could help with investigation.
@denis-tingaikin Just realized that I missed to add the logs to the zip file. I updated the attachment and I adding the nsc logs here as well. BTW seems to me that the forwarder set the interfaces as expected. NSC detects data path failure and reset the connection.
{forwarder log}
Sep 12 16:53:02.609[37m [TRAC] [id:81d3be5c-cad4-4811-a6ef-5c4da4306c30] [type:networkService] [0m(57.1) request-response={"id":"81d3be5c-cad4-4811-a6ef-5c4da4306c30","network_service":"local-forwarder-death","mechanism":{"cls":"LOCAL","type":"KERNEL","parameters":{"inodeURL":"inode://4/4026534093","name":"local-forw-f280","supportsVlan":"true"}},"context":{"ip_context":{"src_ip_addrs":["172.16.1.1/32"],"dst_ip_addrs":["172.16.1.0/32"],"src_routes":[{"prefix":"172.16.1.0/32"}],"dst_routes":[{"prefix":"172.16.1.1/32"}],"excluded_prefixes":["10.96.0.0/16","10.244.0.0/16"]},"dns_context":{}},"labels":{"nodeName":"kind-worker","podName":"alpine"},"path":{"index":2,"path_segments":[{"name":"alpine-b626de3e-acbb-4b4e-9c31-a81a43ca7f2b","id":"alpine-b626de3e-acbb-4b4e-9c31-a81a43ca7f2b-0","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9rOHMubnNtL25zL25zLWxvY2FsLWZvcndhcmRlci1kZWF0aC9wb2QvYWxwaW5lIiwiYXVkIjpbInNwaWZmZTovL2s4cy5uc20vbnMvbnNtLXN5c3RlbS9wb2QvbnNtZ3ItZjR2NGgiXSwiZXhwIjoxNjk0NTM4MTgyfQ.oYJibfRxCXQIcNEyvF_lWY-fMjzuBaktrGXvDcNnznI6X4z44MNRCtVd-NHJOGPtaAoxDUghUDYeE_6WcpFtxQ","expires":{"seconds":1694538182,"nanos":388924276}},{"name":"nsmgr-f4v4h","id":"b4b84141-cc9e-4e7b-be7f-cc3b89816e4a","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9rOHMubnNtL25zL25zbS1zeXN0ZW0vcG9kL25zbWdyLWY0djRoIiwiYXVkIjpbInNwaWZmZTovL2s4cy5uc20vbnMvbnNtLXN5c3RlbS9wb2QvZm9yd2FyZGVyLW92cy1iOTcybiJdLCJleHAiOjE2OTQ1MzgxODJ9.l5ssrZ5w4uhRW30WnjD3XAvti8S7TvszBtkw6ogCyltUwNZgX8TgRXQaqgEuBv9Zrv31Kl-8dhLWcnncrvMGfw","expires":{"seconds":1694538182,"nanos":407122067}},{"name":"forwarder-ovs-b972n","id":"81d3be5c-cad4-4811-a6ef-5c4da4306c30","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9rOHMubnNtL25zL25zbS1zeXN0ZW0vcG9kL2ZvcndhcmRlci1vdnMtYjk3Mm4iLCJhdWQiOlsic3BpZmZlOi8vazhzLm5zbS9ucy9ucy1sb2NhbC1mb3J3YXJkZXItZGVhdGgvcG9kL25zZS1rZXJuZWwtNmQ2NDc3ODc1Zi1ndGdkNyJdLCJleHAiOjE2OTQ1MzgxODJ9.ghEOIYMc0eYbRBxo269dicBL-_40o5vPxIwGQecEAQ5roOqR1FQHvWKQ3SKU3wrGNf_ylkbykrM7nVvR2VRS1g","expires":{"seconds":1694538182,"nanos":482070996}},{"name":"nse-kernel-6d6477875f-gtgd7","id":"f2807884-5a41-4d88-9a42-58347aada4c1","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9rOHMubnNtL25zL25zLWxvY2FsLWZvcndhcmRlci1kZWF0aC9wb2QvbnNlLWtlcm5lbC02ZDY0Nzc4NzVmLWd0Z2Q3IiwiYXVkIjpbInNwaWZmZTovL2s4cy5uc20vbnMvbnNtLXN5c3RlbS9wb2QvZm9yd2FyZGVyLW92cy1iOTcybiJdLCJleHAiOjE2OTQ1MzgxODJ9.V7oLhExCJrFkTwZYIG-LASq5vMT4HpMW-szmg2S5HVMJ8Qyww8cRK0_sTtX7bBQz9_uRnWnJw5Pb8TViuRW4jw","expires":{"seconds":1694538182,"nanos":485355075}}]},"network_service_endpoint_name":"nse-kernel-6d6477875f-gtgd7","payload":"ETHERNET"}
Sep 12 16:53:02.611[37m [TRAC] [id:81d3be5c-cad4-4811-a6ef-5c4da4306c30] [type:networkService] [0m(53.1) request-response-diff={"mechanism":{"parameters":{"inodeURL":"file:///proc/799735/fd/16"}}}
Sep 12 16:53:02.649[36m [INFO] [id:81d3be5c-cad4-4811-a6ef-5c4da4306c30] [type:networkService] [0m(42.1) /proc/sys/net/ipv4/ping_group_range was set to 0 2147483647
Sep 12 16:53:02.653[37m [DEBU] [id:81d3be5c-cad4-4811-a6ef-5c4da4306c30] [Dst:172.16.1.1/32] [Flags:0] [Gw:<nil>] [Scope:link] [duration:531.547µs] [link.Name:local-forw-f280] [netlink:RouteReplace] [type:networkService] [0m(38.1) completed
Sep 12 16:53:02.654[37m [DEBU] [id:81d3be5c-cad4-4811-a6ef-5c4da4306c30] [Dst:172.16.1.1/32] [Flags:0] [Gw:<nil>] [Scope:universe] [duration:145.383µs] [link.Name:local-forw-f280] [netlink:RouteReplace] [type:networkService] [0m(38.2) completed
Sep 12 16:53:02.798[37m [TRAC] [id:81d3be5c-cad4-4811-a6ef-5c4da4306c30] [type:networkService] [0m(33.2) request-response-diff={"mechanism":{"parameters":{"-supportsVlan":"true","inodeURL":"file:///proc/799735/fd/13","name":"nsm-1"}}}
Sep 12 16:53:02.801[37m [TRAC] [id:81d3be5c-cad4-4811-a6ef-5c4da4306c30] [type:networkService] [0m(24.1) request-diff={"connection":{"mechanism":{"cls":"LOCAL","parameters":{"+inodeURL":"file:///proc/799735/fd/13","+name":"nsm-1"},"type":"KERNEL"},"path":{"path_segments":{"2":{"expires":{"nanos":482070996},"token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9rOHMubnNtL25zL25zbS1zeXN0ZW0vcG9kL2ZvcndhcmRlci1vdnMtYjk3Mm4iLCJhdWQiOlsic3BpZmZlOi8vazhzLm5zbS9ucy9ucy1sb2NhbC1mb3J3YXJkZXItZGVhdGgvcG9kL25zZS1rZXJuZWwtNmQ2NDc3ODc1Zi1ndGdkNyJdLCJleHAiOjE2OTQ1MzgxODJ9.ghEOIYMc0eYbRBxo269dicBL-_40o5vPxIwGQecEAQ5roOqR1FQHvWKQ3SKU3wrGNf_ylkbykrM7nVvR2VRS1g"},"3":{"expires":{"nanos":485355075,"seconds":1694538182},"token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9rOHMubnNtL25zL25zLWxvY2FsLWZvcndhcmRlci1kZWF0aC9wb2QvbnNlLWtlcm5lbC02ZDY0Nzc4NzVmLWd0Z2Q3IiwiYXVkIjpbInNwaWZmZTovL2s4cy5uc20vbnMvbnNtLXN5c3RlbS9wb2QvZm9yd2FyZGVyLW92cy1iOTcybiJdLCJleHAiOjE2OTQ1MzgxODJ9.V7oLhExCJrFkTwZYIG-LASq5vMT4HpMW-szmg2S5HVMJ8Qyww8cRK0_sTtX7bBQz9_uRnWnJw5Pb8TViuRW4jw"}}},"state":0},"mechanism_preferences":{"0":{"parameters":{"+inodeURL":"file:///proc/799735/fd/13","+name":"nsm-1","-supportsVlan":"true"}}}}
Sep 12 16:53:02.802[36m [INFO] [id:81d3be5c-cad4-4811-a6ef-5c4da4306c30] [type:networkService] [0m(23.1) /proc/sys/net/ipv4/ping_group_range was set to 0 2147483647
Sep 12 16:53:03.819[37m [DEBU] [id:81d3be5c-cad4-4811-a6ef-5c4da4306c30] [Dst:172.16.1.0/32] [Flags:0] [Gw:<nil>] [Scope:link] [duration:127.581µs] [link.Name:nsm-1] [netlink:RouteReplace] [type:networkService] [0m(21.1) completed
Sep 12 16:53:03.819[37m [DEBU] [id:81d3be5c-cad4-4811-a6ef-5c4da4306c30] [Dst:172.16.1.0/32] [Flags:0] [Gw:<nil>] [Scope:universe] [duration:85.979µs] [link.Name:nsm-1] [netlink:RouteReplace] [type:networkService] [0m(21.2) completed
Sep 12 16:53:03.824[37m [TRAC] [id:81d3be5c-cad4-4811-a6ef-5c4da4306c30] [type:networkService] [0m(11.1) request-response-diff={"mechanism":{"parameters":{"inodeURL":"inode://4/4026532527"}}}
Sep 12 16:51:38.061[31m [ERRO] [id:alpine-b626de3e-acbb-4b4e-9c31-a81a43ca7f2b-0] [dstIP:172.16.1.0] [srcIP:172.16.1.1] [type:networkService] [0m(7.1) No packets received
Sep 12 16:51:38.061[33m [WARN] [id:alpine-b626de3e-acbb-4b4e-9c31-a81a43ca7f2b-0] [heal:eventLoop] [type:networkService] [0m(7.2) Data plane is down
Sep 12 16:51:38.061[37m [DEBU] [id:alpine-b626de3e-acbb-4b4e-9c31-a81a43ca7f2b-0] [heal:eventLoop] [type:networkService] [0m(7.3) Reconnect with reselect
Sep 12 16:51:38.061[37m [DEBU] [id:alpine-b626de3e-acbb-4b4e-9c31-a81a43ca7f2b-0] [heal:eventLoop] [type:networkService] [0m(7.4) received eventIn: <nil>, err: rpc error: code = Canceled desc = context canceled
@ljkiraly Hello, it looks like there is a problem with closing a connection. I tested it the next way:
I modified NSC to do:
resp, err := nsmClient.Request(ctx, request)
if err != nil {
logger.Fatalf("failed connect to NSMgr: %v", err.Error())
}
logger.Infof("successfully connected to %v. Response: %v", u.NetworkService(), resp)
time.Sleep(time.Minute)
_, err = nsmClient.Close(ctx, resp)
if err != nil {
logger.Infof("Error while closing a connection to %v. %s", u.NetworkService(), err.Error())
}
time.Sleep(time.Second * 15)
resp, err = nsmClient.Request(ctx, request)
if err != nil {
logger.Fatalf("failed connect to NSMgr: %v", err.Error())
}
NSC fails to do the second request because there is an error in cmd-forwarder-ovs
:
failed to find link local-forw-de4f: Link not found
If on the second request I use a new request, it passes without any errors on cmd-forwarder-ovs
:
newID := "id-1"
request1 := &networkservice.NetworkServiceRequest{
Connection: &networkservice.Connection{
Id: newID,
NetworkService: u.NetworkService(),
Labels: u.Labels(),
},
MechanismPreferences: []*networkservice.Mechanism{
u.Mechanism(),
},
}
resp1, err := nsmClient.Request(ctx, request1)
if err != nil {
logger.Fatalf("failed connect to NSMgr: %v", err.Error())
}
Could you please check all Closes in cmd-forwarder-ovs
chain? It looks like cmd-forwarder-ovs
doesn't delete something on Close.
Expected Behavior
NSM should restore after forwarder death.
Current Behavior
Basic setup of NSM was used, containing forwarder-ovs . The connection between NSC and NSE is using local 'kernel' mechanism. After forwarder restart (local or remote) the connection does not restored (checked more then 10 minutes after forwarder restart)
Failure Information
No data path healing is expected, but the generic healing should work.
Steps to Reproduce
Failure Logs
local-forwarder-death.zip