peburrows / goth

Elixir package for Oauth authentication via Google Cloud APIs
http://hexdocs.pm/goth
MIT License
289 stars 111 forks source link

"Too many failed attempts" after dev computer wakes from long sleep #113

Closed petelacey closed 2 years ago

petelacey commented 2 years ago

This is most likely an issue on my end, but the solution is not obvious to me.

I previously used Goth 1.2 without any issues. After upgrading to 1.3.0-rc.3 and configuring it exactly as the readme says, I have the following problem.

When developing, if my laptop suspends for longer than the token expiration window, when resuming, Goth will attempt to get a token from Google using the already expired cached refresh token. Google replies with a 400, which causes Goth to raise and fail. However, when the supervisor restarts it, Goth does not ask for a new token using the service account credentials, but instead uses the stale refresh token again. This repeats every second until I restart the app.

At least that's how it appears to me.

Here's what I see in my logs. Formatted for legibility:

[error] GenServer {Goth.Registry, Readyroom.Goth} terminating
** (RuntimeError) too many failed attempts to refresh, last error:
  %RuntimeError{
    message: "unexpected status 400 from Google\n\n{
        \"error\":\"invalid_grant\",
        \"error_description\":\"Invalid JWT: Token must be a short-lived token (60 minutes) and in a reasonable timeframe. Check your iat and exp values in the JWT claim.\"}\n"
    }
    (goth 1.3.0-rc.3) lib/goth/server.ex:80: Goth.Server.handle_info/2
    (stdlib 3.16.1) gen_server.erl:695: :gen_server.try_dispatch/4
    (stdlib 3.16.1) gen_server.erl:771: :gen_server.handle_msg/6
    (stdlib 3.16.1) proc_lib.erl:226: :proc_lib.init_p_do_apply/3

Last message: :refresh

State: %Goth.Server{
  http_client: {Goth.HTTPClient.Hackney, %Goth.HTTPClient.Hackney{default_opts: []}},
  name: Readyroom.Goth,
  refresh_before: 300,
  retries: 1,
  retry_after: 1000,
  source: {
    :service_account, %{
      "auth_provider_x509_cert_url" => "https://www.googleapis.com/oauth2/v1/certs",
      "auth_uri" => "https://accounts.google.com/o/oauth2/auth",
      "client_email" => "<elided>.iam.gserviceaccount.com",
      "client_id" => "<elided>",
      "client_x509_cert_url" => "https://www.googleapis.com/robot/v1/metadata/x509/<elided>.iam.gserviceaccount.com",
      "private_key" => "-----BEGIN PRIVATE KEY-----\n<elided>\n-----END PRIVATE KEY-----\n",
      "private_key_id" => "<elided>",
      "project_id" => "<elided>",
      "token_uri" => "https://oauth2.googleapis.com/token",
      "type" => "service_account"
    },
    [scopes: ["https://www.googleapis.com/auth/cloud-platform"]]
  }
}

Any assistance you can give would be appreciated.

ckochx commented 2 years ago

I am receiving a similar message. On a Raspberry Pi Zero running nerves (1.7.13) Elixir 1.13.2 Goth 1.3.0-rc.3

As soon as the device boots (after a fresh firmware update) I see this message repeated for some period of time (perhaps 1-2 minutes) then it appears to resolve and cease.

wojtekmach commented 2 years ago

I was able to reproduce the exact same error by turning my clock backwards for couple hours and starting Goth. It makes sense, the token would be way in the past and so Google rejects it with 400. I believe something like this might happen when a computer wakes up from the sleep, it'd use the "old" clock for a while before it catches up with NTP and such.

There's nothing we can do about this particular error per se. What we can do is improve our retry logic to be more robust and give the system more time to self heal.

Closing in favour of #123.