Open glazychev-art opened 1 year ago
It looks like grpc.WithBlock()
is anti-pattern - https://pkg.go.dev/google.golang.org/grpc#WithBlock
We need to consider whether we can refuse it. As far as I remember, it is worth paying attention to DialTimeout.
I think we need get rid of using deperecated options
We found that there are only a few cases where NSM doesn't work without grpc.WithBlock()
.
Without grpc.WithBlock()
NSM usually fails on heal tests where we delete remote nsmgr
. In this case the forwarder cannot connect to the NSE because it has IP address of the deleted remote nsmgr
. Usually, grpc
immediatelly returns an error if we try to dial to the server which doesn't listen on a port (even without grpc.WithBlock()
). But in case of the dead remote nsmgr
we don't have this server and IP on the cluster at all and grpc
hangs forever or until timeout set via grpc.WaitForReady()
option.
This article explains this grpc
behavior in detail.
Expected Behavior
All applications use the same grpc options
Current Behavior
Some cmds use a different set of options
Failure Information (for bugs)
Need to figure out how grpc.WaitForReady() works (and possibly delete it), add tracing interceptors. We have to check
networkservice
andregistry
chains.Example of correct option sets. Client: https://github.com/networkservicemesh/cmd-nsmgr/blob/3767a9554e633ace231e600512354740071f5519/internal/manager/manager.go#L134-L146
Server: https://github.com/networkservicemesh/cmd-nsmgr/blob/3767a9554e633ace231e600512354740071f5519/internal/manager/manager.go#L159-L166
Context
Failure Logs