grpc / grpc-go

The Go language implementation of gRPC. HTTP/2 based RPC
https://grpc.io
Apache License 2.0
20.94k stars 4.34k forks source link

credentials/google/google not working with oauth2 and ADC #6285

Closed costinm closed 1 year ago

costinm commented 1 year ago

I was testing some code with XDS and Traffic Director - and getting 'context canceled'. I traced a bit the code, and the problem seems to be:

func NewDefaultCredentialsWithOptions(opts DefaultCredentialsOptions) credentials.Bundle {
    if opts.PerRPCCreds == nil {
        ctx, cancel := context.WithTimeout(context.Background(), tokenRequestTimeout)
        defer cancel()
        var err error
        opts.PerRPCCreds, err = newADC(ctx)
        if err != nil {
            logger.Warningf("NewDefaultCredentialsWithOptions: failed to create application oauth: %v", err)
        }
    }

Looking into the newADC and bellow - the ctx is saved in the golang.org/x/oauth2/oauth2.go TokenSource. Since the grpc method cancels the context - no token can be retrieved.

I replaced the code with just 'context.Background' - and now it works.

Not sure how it worked in the past or if something changed in the deps - for normal gRPC it may not be a problem since other methods may be used to customize auth, but with XDS I don't see any other way.

What version of gRPC are you using?

master (git version 2.40.1.606.ga4b1b128d6-goog)

What version of Go are you using (go version)?

1.20

What operating system (Linux, Windows, …) and version?

What did you do?

If possible, provide a recipe for reproducing the error.

What did you expect to see?

What did you see instead?

dfawley commented 1 year ago

Interesting. This behavior of storing the context away for future use by the token source is definitely not clear from the docstring here: https://pkg.go.dev/golang.org/x/oauth2/google#FindDefaultCredentialsWithParams

Not sure how it worked in the past

Is it possible this works if it's able to retrieve the credentials before this function exits and cancels the context? Does putting a few seconds of time.Sleep before returning from this function make things work for you, e.g.? (I am not suggesting doing this in the production code -- just as an experiment.) Or maybe it depends on the location it finds the credentials?

We might need an API change or new method to allow the user to pass their own context, and add a comment on our code that indicates it should not have a timeout. Otherwise resources might not be getting cleaned up correctly when the credentials are done being used. (In reality that would typically be the end of the binary, but it would be better to do it the right way, or at least provide some way to do it that is correct.

easwars commented 1 year ago

@costinm : Is this still an issue for you? We haven't heard any other reports of this not working and all of our interop tests use ADC with xDS and they seem to be running fine.

costinm commented 1 year ago

I hit it while doing some tests - confirmed with the code and docs. It is not something that would happen routinely or be very noticeable, in particular in an interop test ( if the test runs fast - the context won't be canceled ).

Do you have a lot of users (or tests) for proxyless gRPC using ADC ( instead of MDS as is typical on GKE ) ? That means proxyles gRPC on VMs or off-GCP.

On Tue, Jul 11, 2023 at 10:44 AM Easwar Swaminathan < @.***> wrote:

@costinm https://github.com/costinm : Is this still an issue for you? We haven't heard any other reports of this not working and all of our interop tests use ADC with xDS and they seem to be running fine.

— Reply to this email directly, view it on GitHub https://github.com/grpc/grpc-go/issues/6285#issuecomment-1631237786, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAUR2VQGPRR7DDQTBIXGRDXPWGIFANCNFSM6AAAAAAYCZXDUQ . You are receiving this because you were mentioned.Message ID: @.***>

easwars commented 1 year ago

@costinm : Do you happen to have a test that we can use, that can repro this issue? Thanks.

costinm commented 1 year ago

Sorry, long gone. If I remember correctly, the trick is to make sure you use ADC ( not MDS ), and have the test and XDS connection run longer than the context expiration for the token get method.

The code - and docs - seem pretty clear on what would happen, the context will expire. It is possible I don't understand all the code - but I'm pretty sure the XDS connection failed and didn't seem to recover.

On Wed, Jul 12, 2023 at 9:57 AM Easwar Swaminathan @.***> wrote:

@costinm https://github.com/costinm : Do you happen to have a test that we can use, that can repro this issue? Thanks.

— Reply to this email directly, view it on GitHub https://github.com/grpc/grpc-go/issues/6285#issuecomment-1632891123, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAUR2WSMGNVGM6Y6GHSBBDXP3JPJANCNFSM6AAAAAAYCZXDUQ . You are receiving this because you were mentioned.Message ID: @.***>

dfawley commented 1 year ago

I've dug through the oauth2 code for some time and I'm still unsure of how the context is being used. At this point, though, I'm confident that if there is a bug, it's the oauth2 library's responsibility to use the context parameter in accordance with standard Go expectations, e.g.:

Do not store Contexts inside a struct type; instead, pass a Context explicitly to each function that needs it. The Context should be the first parameter, typically named ctx:

But given that I'm not sure enough about how their code works, and that no other users have reported this problem, I'm going to close this for now. Maybe the errors you were seeing were different context cancellations, and the change to use Background() fixed it just because of a coincidence? We might not have many users of xDS+ADC, but surely we have a lot of users using ADC without xDS...?

qfel commented 1 year ago

I have ran into it a number of times and I'm quite surprised there's so little demand to fix it.. But yes, it seems to be the oauth library that has broken ctx handling.

The (unexpectedly stored) ctx eventually makes it to https://github.com/golang/oauth2/blob/2d9e4a2adf33fc3ce68d77995fadda7234520e5c/internal/token.go#L256, which uses it for HTTP request.

costinm commented 1 year ago

I would guess there is little usage of proxyless outside of GCP ( in GCP the MDS is used, so no problem ) ?

Thanks for confirming I was not dreaming...

On Thu, Sep 14, 2023 at 4:27 PM Piotr Kufel @.***> wrote:

I have ran into it a number of times and I'm quite surprised there's so little demand to fix it.. But yes, it seems to be the oauth library that has broken ctx handling.

The (unexpectedly stored) ctx eventually makes it to https://github.com/golang/oauth2/blob/2d9e4a2adf33fc3ce68d77995fadda7234520e5c/internal/token.go#L256, which uses it for HTTP request.

— Reply to this email directly, view it on GitHub https://github.com/grpc/grpc-go/issues/6285#issuecomment-1720280731, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAUR2Q5VZP25CEQOK4ET2TX2OHGPANCNFSM6AAAAAAYCZXDUQ . You are receiving this because you were mentioned.Message ID: @.***>