i8beef / HomeAutio.Mqtt.GoogleHome

MIT License
215 stars 29 forks source link

Refresh token validation failed. aborting #69

Closed GAZ082 closed 4 years ago

GAZ082 commented 4 years ago
020-03-16 20:06:43.386 -03:00 [Information] Request starting HTTP/1.1 POST http://192.168.1.2:5000/googlehome/connect/token application/x-www-form-urlencoded 224
2020-03-16 20:06:44.309 -03:00 [Information] Invoking IdentityServer endpoint: "IdentityServer4.Endpoints.TokenEndpoint" for "/connect/token"
2020-03-16 20:06:44.497 -03:00 [Warning] Failed to find token with key "xxxxxxxxxxxxxxxx"
2020-03-16 20:06:44.520 -03:00 [Warning] Invalid refresh token
2020-03-16 20:06:44.571 -03:00 [Warning] Refresh token validation failed. aborting., request details: []
2020-03-16 20:06:45.014 -03:00 [Information] Request finished in 1553.7961ms 400 application/json; charset=UTF-8

I did a gactions test as a few months ago and works for 12 hours or so, but then this happens every time. :cry:

i8beef commented 4 years ago

Are you sure your tokenStoreFile is set correctly, and is being written to (i.e., permissions are correct)? Sounds like the token isn't being persisted after creation, so when it goes to fetch it to get the refresh token, it can't find it.

GAZ082 commented 4 years ago

The tokens.json file is update if that's what you are asking and it's rw by the the same user that runs the docker instance.

-rw-r--r-- 1 gabriel gabriel  1602 Dec 24 15:15 appsettings.Production.json
-rw-r--r-- 1 gabriel gabriel  3008 Feb 16 15:51 googleDevices.json
-rw------- 1 gabriel gabriel  5373 Aug 16  2018 signingKey.pfx
-rw-r--r-- 1 gabriel gabriel     0 Aug 16  2018 tempkey.rsa
-rw-r--r-- 1 gabriel root    75338 Mar 17 14:55 tokens.json
i8beef commented 4 years ago

Did you relink the app in Google Home? Either Google is holding onto an old token that is expired (which relinking should fix) or your app isn't holding the token past initial issuance (token.json empty or just not being written to, etc.).

GAZ082 commented 4 years ago

Of course relinked every time. What I did not is purged tokens.json and it regenerated with the latest re-sync. Let's see how it goes.

GAZ082 commented 4 years ago

I'm officialy locked down so i'll have time to test it while doing home office.

i8beef commented 4 years ago

Well if it regenerated the file then its writing to it. Im not sure if the IdentityServer will write out that token ID if you set "logPII": true in the config file, but you could try it the next time this happens to see if you can see the refresh token, and then compare to the token.json file to see if it contains that refresh token value.

Also, make sure your client configuration has a "refreshTokenLifetime" set in days (defaults to 30 days if not set).

CoderFX commented 4 years ago

Same has been happening to me for the last few weeks - I have to re-link device on google home app every day. Current settings: "refreshTokenLifetime": 365 Is there any way to keep devices alive for longer time?

i8beef commented 4 years ago

These are the settings I use, and they work just fine. Also make sure you properly followed step 5 in the account linking instructions and added the right oauth scopes in Google Actions Console (https://github.com/i8beef/HomeAutio.Mqtt.GoogleHome/wiki/Setup:-Google-Actions-Console#account-linking)

  "oauth": {
    "tokenStoreFile": "config/tokens.json",
    "authority": "https://mypublicurl.com/google/home",
    "publicOrigin": "https://mypublicurl.com",
    "requireSSL": false,
    "signingCerts": [
      {
        "file": "config/signingKey.pfx",
        "passPhrase": "MySigningCertKey"
      }
    ],
    "clients": [
      {
        "clientId": "0902838f-5a3a-4fc6-946a-692f22f0aacc",
        "clientSecret": "c6ab7515-f575-41b6-adc4-c1d9719e99b7",
        "clientName": "Google Actions Client",
        "allowedRedirectUris": [ "https://oauth-redirect.googleusercontent.com/r/myAppId" ],
        "refreshTokenLifetime": 365
      }
    ],
    "resources": [
      {
        "resourceName": "HomeAutio.Mqtt.GoogleHome"
      }
    ],
    "users": [
      {
        "subjectId": "00000000-0000-0000-0000-000000000000",
        "username": "googleActions",
        "password": "SomePass"
      }
    ]
  },

Note of course all GUIDs and other sensitive information has been changed obviously.

i8beef commented 4 years ago

Closing for lack of activity, reopen if you're still having issues.

clmcavaney commented 4 years ago

G'day,

I had a similar issue and comparing my appsettings with yours above, the only difference was "requireSSL": false, I had that as true. I have changed it and will see if that changes anything.

Regards, Christopher

i8beef commented 4 years ago

I'm pretty sure requireSSL should always be false since this app is meant to be proxied at all times, and that proxy should be SSL secured. I.e., that really just turns off a security check of "don't do OAuth over non-SSL connection" that could otherwise cause trouble in a proxied setup where the proxy is fronting the SSL connection, so you are correctly protected, but the app can't TELL that.

If you have that ON and are proxying correctly, I would think you'd get errors in the log about not running in an SSL context. Edit: actually the error might be reflected to GOOGLE not your log, Im unsure what IdentityServer logs in this case. Regardless, just make sure its false.

This thread has a number of things to check, like the account linking scopes in Actions for Google Console, write permissions, and refreshLifetime setting.

clmcavaney commented 4 years ago

Thanks Michael. Whilst I thought that a simple change from true to false of requireSSL fixed the issue. This morning it wasn't responding and the similar error as first listed above occured:

[23:02:43 INF] Request starting HTTP/1.1 POST http://localhost:5000/connect/token application/x-www-form-urlencoded 180
[23:02:43 INF] Invoking IdentityServer endpoint: IdentityServer4.Endpoints.TokenEndpoint for /connect/token
[23:02:43 WRN] Failed to find token with key XXXX/XXXX=
[23:02:43 WRN] Invalid refresh token
[23:02:43 WRN] Refresh token validation failed. aborting, {"ClientId": "845e66ff-6b7c-4bc3-a36d-6d2bd27b0788", "ClientName": "Google Actions Client", "GrantType": "refresh_token", "Scopes": null, "AuthorizationCode": null, "RefreshToken": null, "UserName": null, "AuthenticationContextReferenceClasses": null, "Tenant": null, "IdP": null, "Raw": {"grant_type": "refresh_token", "refresh_token": "***REDACTED***", "client_id": "845e66ff-6b7c-4bc3-a36d-6d2bd27b0788", "client_secret": "***REDACTED***"}, "$type": "TokenRequestValidationLog"}
[23:02:44 INF] Request finished in 1524.93ms 400 application/json; charset=UTF-8

I might plat around with the refreshLifetime setting next.

i8beef commented 4 years ago

The default refresh life time is 30 days. If you're seeing it reset after only a day or so, I have to think you have an issue EITHER in the expiration, or in the storage of said tokens. You can open the token.json file and read the expiration of some of them and make sure they look like they are getting set with a long enough expiration.

clmcavaney commented 4 years ago

Here is a selection from my token.json file, I had 365 days, then changed to 30 days today (24th)

$ jq . tokens.json | grep -B 1 Expiration
    "CreationTime": "2020-08-22T12:54:23Z",
    "Expiration": "2021-08-22T14:23:26Z",
--
    "CreationTime": "2020-08-22T12:41:20Z",
    "Expiration": "2021-08-22T12:41:20Z",
--
    "CreationTime": "2020-08-22T12:54:23Z",
    "Expiration": "2021-08-22T14:23:26Z",
--
    "CreationTime": "2020-08-22T12:42:51Z",
    "Expiration": "2021-08-22T12:42:51Z",
--
    "CreationTime": "2020-08-24T02:23:49Z",
    "Expiration": null,
--
    "CreationTime": "2020-08-24T01:49:34Z",
    "Expiration": "2021-08-24T01:49:34Z",
--
    "CreationTime": "2020-08-22T16:07:27Z",
    "Expiration": "2021-08-23T23:01:44Z",
--
    "CreationTime": "2020-08-22T12:41:17Z",
    "Expiration": null,
--
    "CreationTime": "2020-08-24T02:23:52Z",
    "Expiration": "2020-09-07T07:29:59Z",
--
    "CreationTime": "2020-08-22T14:26:00Z",
    "Expiration": "2021-08-22T15:53:28Z",
--
    "CreationTime": "2020-08-22T12:33:02Z",
    "Expiration": "2021-08-22T12:33:02Z",
--
    "CreationTime": "2020-08-22T12:32:59Z",
    "Expiration": null,

So some have 365 days (1yr) others 30 days, then some others null - don't know why that is the case. Those are the user_consent token types.

I presume it is safe to remove this file, unlink and relink to do a clean "test"?

i8beef commented 4 years ago

I wouldn't change to a 30 day lifetime if I were you. Don't use your Home for 30 days for any commands and you'll expire and have to relink.

Those tokens have types on them. The "rehresh_token" types will carry those expiration dates. The "user_consent" types will be null for expiration. Make sure that the refresh token Data field contains lines like this:

{"Type":"scope","Value":"api","ValueType":"http://www.w3.org/2001/XMLSchema#string"},{"Type":"scope","Value":"offline_access","ValueType":"http://www.w3.org/2001/XMLSchema#string"},

And that the user_consent ones contain something like

"Scopes":["api","offline_access"]

That offline_access scope is what makes the entire refresh token pipeline work.

Yes, you can stop it, empty the file and restart + relink. The cleanup job should cleanout refresh tokens that expire automatically eventually though.

clmcavaney commented 4 years ago

I revered the refresh back to 365 and things appear to be working now.

Thanks for this fantastic library. It is great.

clmcavaney commented 4 years ago

An update. I created a new tokens.json file and relinked. This worked for a day or so, and then this morning it can't find the token for the key again. With the new tokens.json file, there is a user_consent and refresh_token entries. The logs from the docker container suggest that it can't find a refresh_token: Failed to find token with key DDnEOkvSsp7oPmubXXXXXXXXXXXXXXXXXX

The ClientId of the request from the Docker logs matches the token file, but the key doesn't. Very strange.

There is one warning that I am curious on, when the container is first loaded:

[23:48:27 WRN] Storing keys in a directory '/root/.aspnet/DataProtection-Keys' that may not be persisted outside of the container. Protected data will be unavailable when container is destroyed.

Any ideas why this would be happening?

i8beef commented 4 years ago

I wouldn't be concerned about that, the only thing I think that might hurt is the admin interface logins surviving container rebuilds. I don't get that error in mine though, I get three other warnings that essentially mean the same thing, so thats a little weird.

The original error comes right out of the PersistantGrantStore. It's literally just looking for the token by key in an internal dictionary that it reads in on startup from the token.json file. There's not a lot that can fail there if you are in fact persisting that and mounting it correctly. Do you have a line like this in the log after startup indicating that it restored the tokens correctly? And what does your docker command look like?

2020-08-28 23:58:02.297 +00:00 [Information] Restored tokens from "config/tokens.json"
clmcavaney commented 4 years ago

This is from a docker restart I just did:

2020-08-29 09:17:52.284 +00:00 [Information] Service stop initiated
2020-08-29 09:17:52.286 +00:00 [Information] MQTT unsubscribing to the following topics: google/home/#, homie/wemos-d1-mini-02/light/on
2020-08-29 09:17:52.470 +00:00 [Information] Stopped
2020-08-29 09:17:52.473 +00:00 [Information] Service stopped successfully
2020-08-29 09:17:52.524 +00:00 [Information] Disconnected.
2020-08-29 09:17:52.525 +00:00 [Information] MQTT Connection closed
2020-08-29 09:18:02.779 +00:00 [Information] Loaded with configuration from: appsettings.json, /app/config/appsettings.Production.json
2020-08-29 09:18:05.745 +00:00 [Information] Starting IdentityServer4 version "3.1.0.0"
2020-08-29 09:18:06.536 +00:00 [Information] Restored tokens from "config/tokens.json"
2020-08-29 09:18:06.675 +00:00 [Information] Using the default authentication scheme "Cookies" for IdentityServer
2020-08-29 09:18:07.038 +00:00 [Information] Loaded device configuration from config/googleDevices.json
2020-08-29 09:18:07.094 +00:00 [Warning] Storing keys in a directory '"/root/.aspnet/DataProtection-Keys"' that may not be persisted outside of the container. Protected data will be unavailable when container is destroyed.
2020-08-29 09:18:07.094 +00:00 [Information] User profile is available. Using '"/root/.aspnet/DataProtection-Keys"' as key repository; keys will not be encrypted at rest.
2020-08-29 09:18:07.100 +00:00 [Information] Service start initiated
2020-08-29 09:18:07.106 +00:00 [Information] MQTT subscribing to the following topics: google/home/#, homie/wemos-d1-mini-02/light/on
2020-08-29 09:18:07.118 +00:00 [Information] Started
2020-08-29 09:18:07.119 +00:00 [Information] Service started successfully
2020-08-29 09:18:07.468 +00:00 [Information] Connected.
2020-08-29 09:18:07.475 +00:00 [Information] MQTT Connection established
2020-08-29 09:18:07.478 +00:00 [Information] Publishing subscriptions at reconnect

So, it does have the particular line. 2020-08-29 09:18:06.536 +00:00 [Information] Restored tokens from "config/tokens.json"

Then I checked on Google Home Assistant and everything worked fine:

2020-08-29 09:19:18.146 +00:00 [Information] Request starting HTTP/1.1 POST http://localhost:5000/smarthome application/json;charset=UTF-8 142
2020-08-29 09:19:19.142 +00:00 [Information] Request starting HTTP/1.1 GET http://localhost:5000/.well-known/openid-configuration
2020-08-29 09:19:19.172 +00:00 [Information] Invoking IdentityServer endpoint: "IdentityServer4.Endpoints.DiscoveryEndpoint" for "/.well-known/openid-configuration"
2020-08-29 09:19:19.256 +00:00 [Information] Request finished in 114.2868ms 200 application/json; charset=UTF-8
2020-08-29 09:19:19.371 +00:00 [Information] Request starting HTTP/1.1 GET http://localhost:5000/.well-known/openid-configuration/jwks
2020-08-29 09:19:19.373 +00:00 [Information] Invoking IdentityServer endpoint: "IdentityServer4.Endpoints.DiscoveryKeyEndpoint" for "/.well-known/openid-configuration/jwks"
2020-08-29 09:19:19.402 +00:00 [Information] Request finished in 30.5182ms 200 application/jwk-set+json; charset=UTF-8
2020-08-29 09:19:19.510 +00:00 [Information] Successfully validated the token.
2020-08-29 09:19:19.519 +00:00 [Information] Authorization was successful.
2020-08-29 09:19:19.520 +00:00 [Information] Executing endpoint '"HomeAutio.Mqtt.GoogleHome.Controllers.GoogleHomeController.Post (HomeAutio.Mqtt.GoogleHome)"'
2020-08-29 09:19:19.559 +00:00 [Information] Route matched with "{action = \"Post\", controller = \"GoogleHome\"}". Executing controller action with signature "Microsoft.AspNetCore.Mvc.IActionResult Post(HomeAutio.Mqtt.GoogleHome.Models.Request.Request)" on controller "HomeAutio.Mqtt.GoogleHome.Controllers.GoogleHomeController" ("HomeAutio.Mqtt.GoogleHome").
2020-08-29 09:19:19.599 +00:00 [Information] Received QUERY intent for devices: homie/wemos-d1-mini-02/light
...

Specifically 2020-08-29 09:19:19.510 +00:00 [Information] Successfully validated the token.

I will check again tomorrow morning. It seems to be after an overnight period of time when it fails.

Thanks for all your assistance on this - so helpfil.

Christopher

i8beef commented 4 years ago

I mean that's what it should look like yes. It's interesting that the .NET runtime is deciding you have a "profile" to store those keys while mine doesn't, but I think that's because it looks like you're running the container as root, while I am not (e.g., your container has a /root home for the user, where mine DOESN'T have a /home/user home dir for my user, so the .NET runtime gives me a different message about NOT having a profile to save those to, and its gonna use in memory storage). Long story short: 99.9% sure that ain't your problem.

Even without setting a refresh, it should default to 30 days, not 1 day, and given that the tokens you were able to pull had long expirations on them, I don't think its related to that setting.

I'm at a loss here, so lets try a few other questions:

  1. What host OS are you running?
  2. You are sure you are on latest version, right?
  3. What is your exact docker command?
  4. How are you proxying the app? If it's nginx, what's your proxy config look like?
  5. Are you restarting this container at all between working and not? System restarts? Container rebuilds?
  6. Are you supplying your own signing certificates?
  7. Can you post your complete app config with sensitive bits (passwords, client ids, etc.) removed?
  8. Are you POSITIVE you set the Google Action console auth scopes as in step 5 here? This is necessary to get Google Home to get refresh tokens.
  9. Do you see any logs with a message like "Wrote tokens to {file}" for your tokens.json file? There should be one after each of the auth handshakes during linking (new tokens added), it'd be interesting to see if you have OTHER ones between working and not, as removing expired grants would ALSO write to that file.
  10. Do you see any logs that look like this: "Token file {file} already exists but is empty" or "Failed to restore token with key {key}"? This could indicate that its starting up at some point and explicitly thinks the file is empty or failed to restore a key from it.
clmcavaney commented 4 years ago
  1. Ubuntu 18.04

  2. I presume you mean HomeAutio.Mqtt.GoogleHome - the "docker run" pulled down the version on the Docker Hub at the time. So, the image that is running was created: "Created": "2020-07-26T18:53:15.868029912Z" I presume that is v1.2.0.127 of HomeAutio.Mqtt.GoogleHome Docker version is: Version: 19.03.6

  3. From the initial run:

    docker run --name homeautio.mqtt.googlehome -dit -p 5000:5000 -v /home/chrismc/var/lib/google-home/config:/app/config -v /home/chrismc/var/log/google-home:/app/logs -e ASPNETCORE_PATHBASE="/google/home" -e TZ=Australia/Melbourne i8beef/homeautio.mqtt.googlehome:latest
  4. Apache

        ProxyPass / http://localhost:5000/
        ProxyPassReverse / http://localhost:5000/
        RemoteIPHeader X-Forwarded-For
        RemoteIPTrustedProxy 127.0.0.1

    and also the TLS config so that TLS/SSL is terminated at Apache

  5. Not usually, maybe there had been a system reboot on one of the earlier times. But on the last time, there was no system restart or Docker restart. I haven't rebuilt the container at all yet. To get it working again, I have been unlinking and relinking in the Google Home Assistant app on my phone. But only when it "isn't able to connect" (i.e. "Refresh token validation failed" times).

  6. I am using letsencrypt certificates for the Apache proxy configuration. For the signingCerts (appsettings.json file) yes, created with this command:

    openssl req -x509 -newkey rsa:4096 -sha256 -nodes -keyout signingKey.key -out signingKey.crt -subj "/CN=googlehome.***REDACTED***" -days 3650
    openssl pkcs12 -export -out signingKey.pfx -inkey signingKey.key -in signingKey.crt -certfile signingKey.crt
  7. Yep

    {
    "deviceConfigFile": "config/googleDevices.json",
    "logPII": false,
    "mqtt": {
    "brokerIp": "mqtt.l.***REDACTED***",
    "brokerPort": 1883,
    "brokerUsername": "googlehome",
    "brokerPassword": "***REDACTED***",
    "brokerUseTls": false
    },
    "googleHomeGraph": {
    "agentUserId": "***REDACTED***",
    "apiKey": "***REDACTED***",
    "serviceAccountFile": "config/Google Home MQTT integration-743e95410e7d.json"
    },
    "oauth": {
    "tokenStoreFile": "config/tokens.json",
    "authority": "https://googlehome.***REDACTED***/",
    "publicOrigin": "https://googlehome.***REDACTED***",
    "requireSSL": false,
    "signingCerts": [
      {
        "file": "config/signingKey.pfx",
        "passPhrase": "***REDACTED***"
      } 
    ],
    "clients": [
      {
        "clientId": "***REDACTED***",
        "clientSecret": "***REDACTED***",
        "clientName": "Google Actions Client",
        "allowedRedirectUris": [ "https://oauth-redirect.googleusercontent.com/r/home-mqtt-integration" ],
        "refreshTokenLifetime": 365
      }
    ],
    "resources": [
      {
        "resourceName": "HomeAutio.Mqtt.GoogleHome"
      }
    ],
    "users": [
      {
        "subjectId": "1ae37d40-7cd4-46e7-8306-241de5d7ba40",
        "username": "***REDACTED***",
        "password": "***REDACTED***"
      }
    ]
    },
    "Serilog": {
    "Enrich": [ "FromLogContext" ],
    "MinimumLevel": "Information",
    "WriteTo": [
      { "Name": "Console" },
      {
        "Name": "RollingFile",
        "Args": {
          "pathFormat": "logs/HomeAutio.Mqtt.GoogleHome.log",
          "retainedFileCountLimit": 31
        }
      }
    ]
    }
    }
  8. image

I can also see this in the log files: 2020-08-30 07:55:21.840 +00:00 [Information] User consented to scopes: ["api", "offline_access"]

  1. Yes, I see these in the log files: 2020-08-31 04:51:21.802 +00:00 [Information] Wrote tokens to "config/tokens.json"

Things like this:

2020-08-31 04:51:21.716 +00:00 [Information] Request starting HTTP/1.1 POST http://localhost:5000/connect/token application/x-www-form-urlencoded 180
2020-08-31 04:51:21.718 +00:00 [Information] Invoking IdentityServer endpoint: "IdentityServer4.Endpoints.TokenEndpoint" for "/connect/token"
2020-08-31 04:51:21.720 +00:00 [Information] Token request validation success, TokenRequestValidationLog { ClientId: "845e66ff-6b7c-4bc3-a36d-6d2bd27b0788", ClientName: "Google Actions Client", GrantType: "refresh_token", Scopes: null, AuthorizationCode: null, RefreshToken: "Jd4qQ8KhZj89a0gItrMPddU8n5yteYKp4VigoUwxa7U", UserName: null, AuthenticationContextReferenceClasses: null, Tenant: null, IdP: null, Raw: [("grant_type": "refresh_token"), ("refresh_token": "***REDACTED***"), ("client_id": "***REDACTED***"), ("client_secret": "***REDACTED***")] }
2020-08-31 04:51:21.802 +00:00 [Information] Wrote tokens to "config/tokens.json"
2020-08-31 04:51:21.815 +00:00 [Information] Wrote tokens to "config/tokens.json"
2020-08-31 04:51:21.830 +00:00 [Information] Request finished in 113.5654ms 200 application/json; charset=UTF-8
2020-08-31 04:51:23.837 +00:00 [Information] Request starting HTTP/1.1 POST http://localhost:5000/smarthome application/json;charset=UTF-8 296
2020-08-31 04:51:23.839 +00:00 [Information] Successfully validated the token.

I will check the log before and after when I next try after an overnight period (which seems to be when it looses connection - i.e. knowledge of a key in the tokens file)

  1. No, neither appear in any of the logs.

Thanks, Christopher

clmcavaney commented 4 years ago

Interesting, today this is what happened in the logs:

[01:00:11 INF] Request starting HTTP/1.1 POST http://localhost:5000/connect/token application/x-www-form-urlencoded 180
[01:00:17 INF] Request starting HTTP/1.1 POST http://localhost:5000/connect/token application/x-www-form-urlencoded 180
[01:00:19 INF] Removing expired grants
[01:00:19 INF] Invoking IdentityServer endpoint: IdentityServer4.Endpoints.TokenEndpoint for /connect/token
[01:00:19 INF] Invoking IdentityServer endpoint: IdentityServer4.Endpoints.TokenEndpoint for /connect/token
[01:00:23 INF] Token request validation success, {"ClientId": "***REDACTED***", "ClientName": "Google Actions Client", "GrantType": "refresh_token", "Scopes": null, "AuthorizationCode": null, "RefreshToken": "xRDqeShwiGOVe4dvUUJRnGsL6KaIU7UNbWeRhTRcR-U", "UserName": null, "AuthenticationContextReferenceClasses": null, "Tenant": null, "IdP": null, "Raw": {"grant_type": "refresh_token", "refresh_token": "***REDACTED***", "client_id": "***REDACTED***", "client_secret": "***REDACTED***"}, "$type": "TokenRequestValidationLog"}
[01:00:23 INF] Token request validation success, {"ClientId": "***REDACTED***", "ClientName": "Google Actions Client", "GrantType": "refresh_token", "Scopes": null, "AuthorizationCode": null, "RefreshToken": "xRDqeShwiGOVe4dvUUJRnGsL6KaIU7UNbWeRhTRcR-U", "UserName": null, "AuthenticationContextReferenceClasses": null, "Tenant": null, "IdP": null, "Raw": {"grant_type": "refresh_token", "refresh_token": "***REDACTED***", "client_id": "***REDACTED***", "client_secret": "***REDACTED***"}, "$type": "TokenRequestValidationLog"}
[01:00:25 WRN] Failed to remove token with key twkInXO9793B...***REDACTED***
[01:00:25 INF] Wrote tokens to config/tokens.json
[01:00:26 INF] Wrote tokens to config/tokens.json
[01:00:26 INF] Wrote tokens to config/tokens.json
[01:00:26 INF] Request finished in 16078.8725ms 200 application/json; charset=UTF-8
[01:00:26 INF] Request finished in 9171.0837ms 200 application/json; charset=UTF-8
[01:00:28 INF] Request starting HTTP/1.1 POST http://localhost:5000/connect/token application/x-www-form-urlencoded 180
[01:00:28 INF] Invoking IdentityServer endpoint: IdentityServer4.Endpoints.TokenEndpoint for /connect/token
[01:00:28 WRN] Failed to find token with key twkInXO9793B***REDACTED***
[01:00:28 WRN] Invalid refresh token
[01:00:28 WRN] Refresh token validation failed. aborting, {"ClientId": "***REDACTED***", "ClientName": "Google Actions Client", "GrantType": "refresh_token", "Scopes": null, "AuthorizationCode": null, "RefreshToken": null, "UserName": null, "AuthenticationContextReferenceClasses": null, "Tenant": null, "IdP": null, "Raw": {"grant_type": "refresh_token", "refresh_token": "***REDACTED***", "client_id": "***REDACTED***", "client_secret": "***REDACTED***"}, "$type": "TokenRequestValidationLog"}
[01:00:29 INF] Request finished in 125.6372ms 400 application/json; charset=UTF-8

"Failed to remove token with" entry before. Also, there are double ups in the logs. Not sure why that is occurring either.

i8beef commented 4 years ago

Those requests shouldn't be taking that long, and that looks like Google is calling you MULTIPLE times with the same refresh token, when as soon as it is used, its invalid...

By chance do you have a routine you were running at this point that attempted to change multiple devices at once here?

Also any chance you are running on a resource constrained machine that could be pegged out at this time (wondering about the 15 second response time here...)?

clmcavaney commented 4 years ago

Good questions - there could be an upstream bottleneck. I will look at shifting the docker container to a cloud location and then have it contact MQTT. See if that improves load/connection times.

i8beef commented 4 years ago

By chance do you have a routine you were running at this point that attempted to change multiple devices at once here?

I'm really interested in this question. There's an open issue on one of the Google sample githubs where people were having issues similar to you, but no one seemed to be able to track it down. Given that I have better logging of request pipeline stuff than them, I can see a bit more of the story of what's happening, but I have no explanation for it.

I see two possibilities

  1. Google does not manage cached auth GLOBALLY but on a per call basis. If you have two commands that go off close enough to each other (like above), they can kill the the active tokens accidentally. I.e., they have a race condition in their token management, but it wouldn't be a big deal until this unlikely scenario happened... unless there's high latency in the response, in which case likelihood of it happening goes up. I seem to remember there was a change a while back where routines used to run step-by-step, but now ran in parallel, which is why I ask about routines here... the staggering of the requests makes me think Google is even aware of this hole in their setup and decided "we'll just stagger them by 5 seconds to give the first request that does the refresh token dance time to complete", but because you are taking longer than 5 seconds to complete, they step on each other.
  2. Google has some retry logic on their side and a short timeout, so the first call goes over 5 seconds, and they say "Lets try it again", which initiates another RefreshToken call, getting us into the same situation of two competing refreshes.

I.e., right now my hunch is on (a) a feature of Google's token management being done poorly and (b) people with high latency connections being adversely effected because of this.

clmcavaney commented 4 years ago

G'day Michael, I think you are onto something here. The "routine" is very simple. It is a light, that has either on/off or a brightness level. So if a brightness level setting (from an off state) is a routine, then that would make sense. This morning, this was the reported latency: 2020-09-02 23:17:17.549 +00:00 [Information] Request finished in 5129.2287ms 200 application/json; charset=UTF-8

(NOTE: The date/time values are UTC for some reason, even though I specified the correct TZ value on starting the container)

2020-09-02 23:17:12.681 +00:00 [Information] Request starting HTTP/1.1 POST http://localhost:5000/connect/token application/x-www-form-ur
lencoded 180
2020-09-02 23:17:14.422 +00:00 [Information] Invoking IdentityServer endpoint: "IdentityServer4.Endpoints.TokenEndpoint" for "/connect/to
ken"
2020-09-02 23:17:16.242 +00:00 [Information] Token request validation success, TokenRequestValidationLog { ClientId: "845e66ff-6b7c-4bc3-
a36d-6d2bd27b0788", ClientName: "Google Actions Client", GrantType: "refresh_token", Scopes: null, AuthorizationCode: null, RefreshToken:
 "JnVOTlP8WjuGM***REDACTED***", UserName: null, AuthenticationContextReferenceClasses: null, Tenant: null, IdP: null, Raw
: [("grant_type": "refresh_token"), ("refresh_token": "***REDACTED***"), ("client_id": "***REDACTED***"), ("client_secret": "***REDACTED***")] }
2020-09-02 23:17:17.282 +00:00 [Information] Wrote tokens to "config/tokens.json"
2020-09-02 23:17:17.383 +00:00 [Information] Wrote tokens to "config/tokens.json"
2020-09-02 23:17:17.549 +00:00 [Information] Request finished in 5129.2287ms 200 application/json; charset=UTF-8

but it did work (from the logs):

2020-09-02 23:17:18.353 +00:00 [Information] Request starting HTTP/1.1 POST http://localhost:5000/smarthome application/json;charset=UTF-
8 143
2020-09-02 23:17:18.751 +00:00 [Information] Successfully validated the token.
2020-09-02 23:17:18.861 +00:00 [Information] Authorization was successful.

The client on my phone initially said can't connect but the logs (above) didn't reflect that. When I closed the app and re-opened it, it was fine. So in that case I didn't need to re-link the device.

The latency's reported are all over the place though:

2020-09-02 23:17:17.549 +00:00 [Information] Request finished in 5129.2287ms 200 application/json; charset=UTF-8
2020-09-02 23:17:19.565 +00:00 [Information] Request finished in 1212.1857ms 200 application/json; charset=utf-8
2020-09-02 23:17:28.266 +00:00 [Information] Request finished in 3.0107ms 200 application/json; charset=utf-8
2020-09-02 23:17:28.299 +00:00 [Information] Request finished in 41.234ms 200 application/json; charset=utf-8
2020-09-02 23:17:31.807 +00:00 [Information] Request finished in 363.2253ms 200 application/json; charset=utf-8
2020-09-02 23:17:32.570 +00:00 [Information] Request finished in 80.8348ms 200 application/json; charset=utf-8
2020-09-02 23:17:36.539 +00:00 [Information] Request finished in 15.4586ms 200 application/json; charset=utf-8
2020-09-02 23:17:41.481 +00:00 [Information] Request finished in 27.0138ms 200 application/json; charset=utf-8

I am curious as to why. The system memory doesn't suggest any paging to disk:

$ free -h
              total        used        free      shared  buff/cache   available
Mem:           7.8G        2.9G        2.1G         42M        2.8G        4.6G
Swap:          8.0G        1.3G        6.7G

Could be some other issue with the hardware I'm running this Linux OS on. Maybe age is causing the slowness? ;)

i8beef commented 4 years ago

Well when I say "routine" I mean specifically a Google Home routine (only available in US still maybe). Those can run multiple commands at once or in rapid succession. I was trying to come up with a theory of why you would see two commands execute almost at the same time, and that seemed like a good candidate.

Those timings don't look bad except for the 5 second one (and maybe the 1 second one is pushing it). I take it from the OTHER response times that you aren't just on a crappy connection, so that seems really weird. My gut is telling me this has something to do with how long those requests are taking though... I'm just not sure what could be causing that.

Edit: I'll say this though, how spread out that one log you have with the two requests is... the steps being spread out the way they are doesn't show me a single type of operation blocking... Rather it looks like the whole system was chugging or something for a few seconds. If it was purely IO related I think I'd expect there to be a single jump in time right around the tokens.json file write operations, but for everything else to be quick... but it looks like the actual request processing as a whole there just went really slow.

i8beef commented 4 years ago

I put out another version with a few changes that could help you

  1. Upgraded the IdentityServer for any upstream bugfixes. They actually DID change how RefreshTokens work in the new version, where they don't automatically delete them on use now, they just mark them as used and let the token cleanup service handle removal at a later time.
  2. That added the ability to allow refresh token reuse within a specified timeframe, which I've added and allowed a 30 second timeout on. This should allow concurrent requests within that window to still complete with new refresh tokens. That 30 seconds could be configurable, but I'll start there for now and see if it helps you...
  3. In addition, I added a config value that will let you set the refresh token for permanent reuse. This SHOULD let Google eventually figure their shit out with a valid token, at the expense of some idealized security around token reuse, but that is probably acceptable for home use stuff if you want to forgo that extra layer of security.
{
  oauth: {
    clients: [
      {
        refreshTokenReuse: true
      }
    ]
  }
}

The downside to their change here is I'm not sure how many tokens you'll end up with in the tokens.json file now, as if you have a long refresh token expiration and google uses it to get 3 or 4 NEW refresh tokens that it'll never use, those will stick around until they expire... we'll have to see how that plays out.

i8beef commented 4 years ago

I updated a little bit.

There is now a oauth.refreshTokenGracePeriod setting you can set which will default to 0, or essentially off. You can set this to a higher value like 30 seconds or something so that multiple token requests within that time period will all succeed. In theory, Google's LAST request that succeeds will have the new refresh token pair. So if you want to try that, then you have to set this setting. This is option 1.

The client refreshTokenReuse is still an option too, but it would basically make that refresh token valid for its ENTIRE lifetime... it wouldn't SOLVE the issue, but would move it to the expiration time of the actual refresh token instead of USE of that token. This basically turns the refresh token into a long lived API key, which will still expire after a year or however long you make it (you could make it much longer in this case). This is option 2.

The repercussions of this though might still be felt in commands failing randomly due to auth failures rather than the refresh token requests failing. You wouldn't lose your account linkage, but individual commands running in parallel that require a refresh token pre-flight could each get different access tokens, and then fail in making the actual command call instead... have to see what you get out of it here. I'm very much doing stuff I'm not SUPPOSED to allow you to do here, and I actually think this is more Google's problem not managing token lifetimes correctly (it just doesn't come up often for their primary concern, third party vendors, because they have more reliable response times, and we're kind of not their highest priority).