jaegertracing / jaeger

CNCF Jaeger, a Distributed Tracing Platform
https://www.jaegertracing.io/
Apache License 2.0
20.61k stars 2.45k forks source link

Migrate from grpc.Dial to grpc.NewClient #5330

Closed yurishkuro closed 6 months ago

yurishkuro commented 8 months ago

In 1.63.x grpc.Dial is being replaced with grpc.NewClient that does not attempt to make a connection. Need to make code fixes to use the new API.

This was originally breaking dependabot PR #5327, but gRPC later un-deprecated the Dial functions. Still, they will likely be deprecated in the future releases, so we need to migrate.

(Update) The only remaining place is

./cmd/jaeger/internal/integration/span_reader.go:45:    cc, err := grpc.DialContext(ctx, ports.PortToHostPort(port), opts...)

Validation:

danish9039 commented 8 months ago

working on it

tico88612 commented 8 months ago

It looks like this requires replacing grpc.Dial and grpc.DialContext with grpc.NewClient.

There are two problems encountered during the replacement process:

  1. grpc.NewClient will ignore WithTimeout DialOption. (https://pkg.go.dev/google.golang.org/grpc#NewClient) How can I solve the problem about grpc.DialContext (or any possible solutions?) here is example https://github.com/jaegertracing/jaeger/blob/8a4748c8b5a8a4e38e54c4b0c6922ef459e12430/cmd/anonymizer/app/query/query.go#L41-L44

  2. In cmd/agent/app/configmanager/grpc/manager_test.go Line 51 if I replace grpc.Dial with grpc.NewClient, should I change Line 58?

    === RUN   TestSamplingManager_GetSamplingStrategy_error
    manager_test.go:58: 
                Error Trace:    /Users/jerry/working/jaeger/cmd/agent/app/configmanager/grpc/manager_test.go:58
                Error:          "rpc error: code = Unavailable desc = name resolver error: produced zero addresses" does not contain "Error while diali
    ng: dial tcp: address foo: missing port in address"
                Test:           TestSamplingManager_GetSamplingStrategy_error
    --- ❌ FAIL: TestSamplingManager_GetSamplingStrategy_error (0.32s)

    https://github.com/jaegertracing/jaeger/blob/8a4748c8b5a8a4e38e54c4b0c6922ef459e12430/cmd/agent/app/configmanager/grpc/manager_test.go#L50-L59

yurishkuro commented 8 months ago

Can you find docs, tickets, or changelog that explains why they replaced Dial with NewClient and what migration path is suggested?

tico88612 commented 8 months ago

I have searched for the relevant documents as far as I can.

Unfortunately, I didn't find any information about the migration path, and other grpc-go users weren't notified either.

yurishkuro commented 7 months ago

The existing docs do say that WithTimeout should not be used, so it's safe to refactor to use NewClient and drop the timeouts handling during that phase.

To your second question about the error string, my preference for testing errors like that is to wrap them in the business logic, e.g. return fmt.Errorf("our internal message/explanation: %w", err) and then the test should only match on our part of the error string, not on what comes from the library (which could change across versions, as happens this time)

tico88612 commented 7 months ago

The current grpc-go version v1.63.x removes the deprecated tags from Dial and DialContext.

CI has been passed.

But grpc-go will add back the deprecated tag in v1.64, I will continue to study how to remove Dial and DialContext to NewClient and fix the related test cases.

yurishkuro commented 7 months ago

May be worth splitting your pr into pieces that are known to work ( verify via go test on individual packages)

tico88612 commented 7 months ago

I can change the parts that don't affect the test first, and I'll mark TODO for the others that don't.

yurishkuro commented 6 months ago

Last one #5443