Closed toga4 closed 1 month ago
cc: @quartzmo
Upon further investigation, we found that the latency is due to differences in how the grpc.ClientConn is created between the old and new authentication libraries.
In the previous authentication library, grpc.DialContext
is used to create the connection:
https://github.com/googleapis/google-api-go-client/blob/1cd11321a77d6d42ce745edc35c1b9b81a28381f/transport/grpc/dial.go#L354
On the other hand, in the new authentication library, grpc.NewClient
is used:
https://github.com/googleapis/google-cloud-go/blob/7f61cd579f7e4ed4f1ac161f2c2a28e931406f16/auth/grpctransport/grpctransport.go#L300
The key difference between grpc.DialContext
and grpc.NewClient
is default schemes of resolver.
The grpc.DialContext
sets the default scheme to passthrough, the grpc.NewClient
uses dns as the default.
By analyzing the runtime/trace tool, we discovered that in the new authentication library, takes longer time for resolving DNS. When we tried setting the default scheme to passthrough before running the benchmarks using grpc/resolver.SetDefaultScheme("passthrough")
, the latency improved, even with the new authentication library.
I hope this additional investigation is helpful for addressing the issue.
@toga4 Thank you for reporting this issue, and for your thorough and detailed analysis. We will prioritize a fix.
@toga4 After investigating this issue a bit, I am not certain that the current implementation default (using the dns
scheme instead of passthrough
) is detrimental to all users in all environments. As I understand it, the passthrough scheme, being quite literal, prevents users from benefiting from client-side load balancing across all DNS results for an address. The dns scheme, despite the initial slowdown observed, supports such load balancing.
Will it work for you to use grpc/resolver.SetDefaultScheme("passthrough")
to improve latency in your test (emulator) environment, and take a wait-and-see approach to evaluating performance in production environments such as GKE?
@quartzmo Thank you for considering about this issue.
You’re absolutely right that some environments could benefit from the client-side load balancing benefits using the dns scheme.
We can take the approach you suggested. Additionaly, I’ve confirmed that using passthrough:///localhost:9010
instead of localhost:9010
in the SPANNER_EMULATOR_HOST
environment variable ensures that the passthrough scheme is applied. We’ll proceed with this solution for now.
When using the emulator, I think the benefits of client-side load balancing are limited in most cases. It might be worth considering a change where the passthrough scheme is used by default when the emulator is detected and no scheme is specified in the endpoint configuration.
That said, I understand that there are several places in the SDK source code where the emulator is detected, so implementing this change might require updates across multiple locations...
It might be worth considering a change where the passthrough scheme is used by default when the emulator is detected and no scheme is specified in the endpoint configuration.
Thanks for this suggestion! We will investigate it and leave this issue open to track it.
@harshachinta Per the decision above to address this issue by changing the default scheme of the host to passthrough
ONLY for the Spanner library emulator support, I believe this work will be best handled by your team. I will communicate the suggestion to maintainers of the other vertical client libraries that also support emulators with gRPC.
Reassigning to @rahul2393 since @harshachinta hasn't responded.
Client
Spanner. probably all clients which uses new auth lib.
Environment
Apple M3
Code and Dependencies
go.mod
```text module m go 1.23.1 require cloud.google.com/go/spanner v1.68.0 require ( cel.dev/expr v0.16.0 // indirect cloud.google.com/go v0.115.1 // indirect cloud.google.com/go/auth v0.9.3 // indirect cloud.google.com/go/auth/oauth2adapt v0.2.4 // indirect cloud.google.com/go/compute/metadata v0.5.0 // indirect cloud.google.com/go/monitoring v1.21.0 // indirect github.com/GoogleCloudPlatform/grpc-gcp-go/grpcgcp v1.5.0 // indirect github.com/GoogleCloudPlatform/opentelemetry-operations-go/detectors/gcp v1.24.1 // indirect github.com/census-instrumentation/opencensus-proto v0.4.1 // indirect github.com/cespare/xxhash/v2 v2.3.0 // indirect github.com/cncf/xds/go v0.0.0-20240822171458-6449f94b4d59 // indirect github.com/envoyproxy/go-control-plane v0.13.0 // indirect github.com/envoyproxy/protoc-gen-validate v1.1.0 // indirect github.com/felixge/httpsnoop v1.0.4 // indirect github.com/go-logr/logr v1.4.2 // indirect github.com/go-logr/stdr v1.2.2 // indirect github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da // indirect github.com/google/s2a-go v0.1.8 // indirect github.com/google/uuid v1.6.0 // indirect github.com/googleapis/enterprise-certificate-proxy v0.3.4 // indirect github.com/googleapis/gax-go/v2 v2.13.0 // indirect github.com/planetscale/vtprotobuf v0.6.1-0.20240319094008-0393e58bdf10 // indirect go.opencensus.io v0.24.0 // indirect go.opentelemetry.io/contrib/detectors/gcp v1.29.0 // indirect go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.54.0 // indirect go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.54.0 // indirect go.opentelemetry.io/otel v1.29.0 // indirect go.opentelemetry.io/otel/metric v1.29.0 // indirect go.opentelemetry.io/otel/sdk v1.29.0 // indirect go.opentelemetry.io/otel/sdk/metric v1.29.0 // indirect go.opentelemetry.io/otel/trace v1.29.0 // indirect golang.org/x/crypto v0.27.0 // indirect golang.org/x/net v0.29.0 // indirect golang.org/x/oauth2 v0.23.0 // indirect golang.org/x/sync v0.8.0 // indirect golang.org/x/sys v0.25.0 // indirect golang.org/x/text v0.18.0 // indirect golang.org/x/time v0.6.0 // indirect google.golang.org/api v0.197.0 // indirect google.golang.org/genproto v0.0.0-20240903143218-8af14fe29dc1 // indirect google.golang.org/genproto/googleapis/api v0.0.0-20240903143218-8af14fe29dc1 // indirect google.golang.org/genproto/googleapis/rpc v0.0.0-20240903143218-8af14fe29dc1 // indirect google.golang.org/grpc v1.66.2 // indirect google.golang.org/protobuf v1.34.2 // indirect ) ```Expected behavior
equivalent performance.
Actual behavior
Additional context
After upgrading to a recent version of the Google Cloud Go SDK, we observed a significant slowdown in the execution time of our end-to-end tests using Cloud Spanner Emulator. The slowdown occurs at the point of the first communication after creating the Client, which led us to suspect that the issue might be related to authentication.
Upon investigating the differences between versions, we discovered that a new authentication library is being used in the updated SDK. The same issue occurs with the DatabaseAdminClient and InstanceAdminClient. We think that the problem likely affects all clients that rely on the new authentication library.
By switching back to the previous authentication library via environment variables
GOOGLE_API_GO_EXPERIMENTAL_DISABLE_NEW_AUTH_LIB
, we were able to restore the previous execution times.While we understand that in actual Google Cloud environments, factors like network latency and the frequency of authentication may make this performance difference less impactful, the issue becomes significant in cases where multiple Client instances are generated, such as in test code using the Emulator. In such scenarios, the slowdown can have a substantial impact, and we believe addressing this would be a meaningful improvement.