networkservicemesh / cmd-forwarder-vpp

Apache License 2.0
2 stars 22 forks source link

VPP forwarder fails with panic #122

Open Bolodya1997 opened 3 years ago

Bolodya1997 commented 3 years ago

Env description

Packet automation cluster.

Steps to reproduce

  1. Edit memory tests to use single NSM setup for running multiple times.
  2. Run tests multiple times.

Actual result

VPP forwarder restarts with panic:

Apr  6 04:08:40.826 [INFO] [name:forwarder-vpp-bpjs6] (32.1)                                  request-diff={"mechanism_preferences":{"2":{"parameters":{"+src_ip":"192.168.0.4"}}}}
Apr  6 04:08:40.826 [INFO] [name:forwarder-vpp-bpjs6] (33)                                 ⎆ sdk/pkg/networkservice/common/mechanisms/recvfd/recvFDClient.Request()
Apr  6 04:08:40.826 [INFO] [name:forwarder-vpp-bpjs6] (34)                                  ⎆ sdk/pkg/networkservice/common/mechanisms/sendfd/sendFDClient.Request()
Apr  6 04:08:40.826 [INFO] [name:forwarder-vpp-bpjs6] (35)                                   ⎆ sdk/pkg/networkservice/common/null/nullClient.Request()
Apr  6 04:08:40.827 [INFO] [name:forwarder-vpp-bpjs6] (36)                                    ⎆ api/pkg/api/networkservice/networkServiceClient.Request()
Apr  6 04:08:40.897 [ERRO] [name:forwarder-vpp-bpjs6] (36.1)                                      rpc error: code = Canceled desc = context canceled
Error returned from api/pkg/api/networkservice/networkServiceClient.Request
github.com/networkservicemesh/sdk/pkg/networkservice/core/trace.(*beginTraceClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/core/trace/client.go:62
github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/core/next/client.go:69
github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/core/next/client.go:71
github.com/networkservicemesh/sdk/pkg/networkservice/core/trace.(*endTraceClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/core/trace/client.go:99
github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/core/next/client.go:71
github.com/networkservicemesh/sdk/pkg/networkservice/common/null.(*nullClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/common/null/client.go:43
github.com/networkservicemesh/sdk/pkg/networkservice/core/trace.(*beginTraceClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/core/trace/client.go:58
github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/core/next/client.go:69
github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/core/next/client.go:69
github.com/networkservicemesh/sdk/pkg/networkservice/core/trace.(*endTraceClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/core/trace/client.go:99
github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/core/next/client.go:71
github.com/networkservicemesh/sdk/pkg/networkservice/common/mechanisms/sendfd.(*sendFDClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/common/mechanisms/sendfd/client.go:54
github.com/networkservicemesh/sdk/pkg/networkservice/core/trace.(*beginTraceClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/core/trace/client.go:58
github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/core/next/client.go:69
github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/core/next/client.go:69
github.com/networkservicemesh/sdk/pkg/networkservice/core/trace.(*endTraceClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/core/trace/client.go:99
github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/core/next/client.go:71
github.com/networkservicemesh/sdk/pkg/networkservice/common/mechanisms/recvfd.(*recvFDClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/common/mechanisms/recvfd/client.go:55
github.com/networkservicemesh/sdk/pkg/networkservice/core/trace.(*beginTraceClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/core/trace/client.go:58
github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/core/next/client.go:69
github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/core/next/client.go:69
github.com/networkservicemesh/sdk/pkg/networkservice/core/trace.(*endTraceClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/core/trace/client.go:99
github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/core/next/client.go:71
github.com/networkservicemesh/sdk/pkg/networkservice/core/trace.(*endTraceClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/core/trace/client.go:99
github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/core/next/client.go:71
github.com/networkservicemesh/sdk-vpp/pkg/networkservice/mechanisms/vxlan/vni.(*vniClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk-vpp@v0.0.0-20210405063250-b9cbf6b70e98/pkg/networkservice/mechanisms/vxlan/vni/client.go:48
github.com/networkservicemesh/sdk/pkg/networkservice/core/trace.(*beginTraceClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/core/trace/client.go:58
github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/core/next/client.go:69
github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/core/next/client.go:71
github.com/networkservicemesh/sdk/pkg/networkservice/core/trace.(*endTraceClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/core/trace/client.go:99
github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/core/next/client.go:71
github.com/networkservicemesh/sdk-vpp/pkg/networkservice/mechanisms/vxlan.(*vxlanClient).Request
    /root/go/pkg/mod/github.com/networkservicemesh/sdk-vpp@v0.0.0-20210405063250-b9cbf6b70e98/pkg/networkservice/mechanisms/vxlan/client.go:65
Apr  6 04:08:40.897 [INFO] [name:forwarder-vpp-bpjs6] (35.1)                                     response=null
Apr  6 04:08:40.901 [ERRO] [name:forwarder-vpp-bpjs6] (35.2)                                     Error returned from api/pkg/api/networkservice/networkServiceClient.Request: rpc error: code = Canceled desc = context canceled
Apr  6 04:08:40.904 [ERRO] [name:forwarder-vpp-bpjs6] (34.1)                                    Error returned from api/pkg/api/networkservice/networkServiceClient.Request: rpc error: code = Canceled desc = context canceled
Apr  6 04:08:40.907 [ERRO] [name:forwarder-vpp-bpjs6] (33.1)                                   Error returned from api/pkg/api/networkservice/networkServiceClient.Request: rpc error: code = Canceled desc = context canceled
Apr  6 04:08:40.910 [ERRO] [name:forwarder-vpp-bpjs6] (32.2)                                  Error returned from api/pkg/api/networkservice/networkServiceClient.Request: rpc error: code = Canceled desc = context canceled
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xad36b0]
goroutine 177 [running]:
os.(*File).Name(...)
    /go/src/os/file.go:54
github.com/networkservicemesh/sdk/pkg/networkservice/common/mechanisms/recvfd.recvFDAndSwapInodeToFile.func1()
    /root/go/pkg/mod/github.com/networkservicemesh/sdk@v0.0.0-20210405062928-0d2dc281b97c/pkg/networkservice/common/mechanisms/recvfd/common.go:76 +0xf0
github.com/edwarnicke/serialize.(*Executor).process(0xc000511a00, 0xc000722980)
    /root/go/pkg/mod/github.com/edwarnicke/serialize@v1.0.7/serialize.go:68 +0x3f
created by github.com/edwarnicke/serialize.(*Executor).AsyncExec
    /root/go/pkg/mod/github.com/edwarnicke/serialize@v1.0.7/serialize.go:56 +0x1f8

Expected result

No restart, no panic.

Bolodya1997 commented 3 years ago

Looks like the cause for this issue is the following: https://github.com/edwarnicke/grpcfd/blob/10fb469a6976e4d488b7a2a66f2a5d09c0d4e43c/connwrap.go#L230-L235

So it results in just closing the file channel, without sending anything: https://github.com/edwarnicke/grpcfd/blob/10fb469a6976e4d488b7a2a66f2a5d09c0d4e43c/connwrap.go#L259-L265

And receiving side doesn't expect channel to be closed: https://github.com/networkservicemesh/sdk/blob/0d2dc281b97c05ed82472a89ceae15cf5122c9e8/pkg/networkservice/common/mechanisms/recvfd/common.go#L60-L72

  1. Probably it would be better to send some Event: { File: *os.File, Error: error } instead of simply sending *os.File and closing channel on error?
  2. I am not quite sure what is the initial error, so we need to further investigate it.
denis-tingaikin commented 3 years ago

@edwarnicke cc

edwarnicke commented 3 years ago

@Bolodya1997 I've fix the panic side of the issue here: https://github.com/networkservicemesh/sdk/pull/910

Looking into why it can happen... we basically only have two potential issues that lead to an error:

https://github.com/edwarnicke/grpcfd/blob/10fb469a6976e4d488b7a2a66f2a5d09c0d4e43c/connwrap.go#L230-L235

  1. EBUSY - due to some very esoteric race conditions inside the kernel
  2. EMFILE - due to running out of available file descriptors

Do you still see this error https://github.com/networkservicemesh/sdk/pull/910 applied (admittedly not as a panic, but as an error)? Do you have a notion of why you are seeing it (EBUSY or EMFILE)?

Bolodya1997 commented 3 years ago

Doesn't seem to be reproducible on kernel tests, probably need to additionally test with other mechanisms.

edwarnicke commented 3 years ago

@Bolodya1997 Let me know how that goes :)