networkservicemesh / sdk

Apache License 2.0
33 stars 36 forks source link

Get rid of deprecated grpc options #1455

Open glazychev-art opened 1 year ago

glazychev-art commented 1 year ago

Expected Behavior

All applications use the same grpc options

Current Behavior

Some cmds use a different set of options

Failure Information (for bugs)

Need to figure out how grpc.WaitForReady() works (and possibly delete it), add tracing interceptors. We have to check networkservice and registry chains.

Example of correct option sets. Client: https://github.com/networkservicemesh/cmd-nsmgr/blob/3767a9554e633ace231e600512354740071f5519/internal/manager/manager.go#L134-L146

Server: https://github.com/networkservicemesh/cmd-nsmgr/blob/3767a9554e633ace231e600512354740071f5519/internal/manager/manager.go#L159-L166

Context

Failure Logs

glazychev-art commented 1 year ago

It looks like grpc.WithBlock() is anti-pattern - https://pkg.go.dev/google.golang.org/grpc#WithBlock We need to consider whether we can refuse it. As far as I remember, it is worth paying attention to DialTimeout.

denis-tingaikin commented 1 year ago

I think we need get rid of using deperecated options

NikitaSkrynnik commented 9 months ago

Current State

We found that there are only a few cases where NSM doesn't work without grpc.WithBlock().

Without grpc.WithBlock() NSM usually fails on heal tests where we delete remote nsmgr. In this case the forwarder cannot connect to the NSE because it has IP address of the deleted remote nsmgr. Usually, grpc immediatelly returns an error if we try to dial to the server which doesn't listen on a port (even without grpc.WithBlock()). But in case of the dead remote nsmgr we don't have this server and IP on the cluster at all and grpc hangs forever or until timeout set via grpc.WaitForReady() option.

This article explains this grpc behavior in detail.