serverless / serverless-kubeless

This plugin enables support for Kubeless within the Serverless Framework.
Apache License 2.0
303 stars 80 forks source link

Frequent and inconsistent token expired error #168

Closed mpalumbo7 closed 5 years ago

mpalumbo7 commented 5 years ago

Summary

In about 9 out of 10 times, a serverless command results in a The access token has expired. Make sure you can access your cluster and try again. When I just run the same serverless deploy -v over and over again, eventually it works.

I'm trying to encourage my team to use serverless + kubeless, but this error is preventing me from being productive. This may be related to #89, but running a kubectl command to refresh the token doesn't seem to have any affect. My only option is to submit the same command again and again...

Error

$ SLS_DEBUG="*" serverless deploy -v
Serverless: Load command config
Serverless: Load command config:credentials
Serverless: Load command create
Serverless: Load command install
Serverless: Load command package
Serverless: Load command deploy
Serverless: Load command deploy:function
Serverless: Load command deploy:list
Serverless: Load command deploy:list:functions
Serverless: Load command invoke
Serverless: Load command invoke:local
Serverless: Load command info
Serverless: Load command logs
Serverless: Load command metrics
Serverless: Load command print
Serverless: Load command remove
Serverless: Load command rollback
Serverless: Load command rollback:function
Serverless: Load command slstats
Serverless: Load command plugin
Serverless: Load command plugin
Serverless: Load command plugin:install
Serverless: Load command plugin
Serverless: Load command plugin:uninstall
Serverless: Load command plugin
Serverless: Load command plugin:list
Serverless: Load command plugin
Serverless: Load command plugin:search
Serverless: Load command config
Serverless: Load command config:credentials
Serverless: Load command info
Serverless: Load command logs
Serverless: Invoke deploy
Serverless: Invoke package
Serverless: Packaging service...
Serverless: Excluding development dependencies...

  Error --------------------------------------------------

  The access token has expired. Make sure you can access your cluster and try again

     For debugging logs, run again after setting the "SLS_DEBUG=*" environment variable.

  Stack Trace --------------------------------------------

Error: The access token has expired. Make sure you can access your cluster and try again
    at getToken (/home/mpalumbo7/code/example-repo/functions/hello-world/node_modules/serverless-kubeless/lib/helpers.js:113:13)
    at Object.getConnectionOptions (/home/mpalumbo7/code/example-repo/functions/hello-world/node_modules/serverless-kubeless/lib/helpers.js:147:17)
    at new CRD (/home/mpalumbo7/code/example-repo/functions/hello-world/node_modules/serverless-kubeless/lib/crd.js:30:17)
    at deployFunction (/home/mpalumbo7/code/example-repo/functions/hello-world/node_modules/serverless-kubeless/lib/deploy.js:416:24)
    at _.each (/home/mpalumbo7/code/example-repo/functions/hello-world/node_modules/serverless-kubeless/lib/deploy.js:555:9)
    at arrayEach (/home/mpalumbo7/code/example-repo/functions/hello-world/node_modules/lodash/lodash.js:516:11)
    at Function.forEach (/home/mpalumbo7/code/example-repo/functions/hello-world/node_modules/lodash/lodash.js:9344:14)
    at BbPromise (/home/mpalumbo7/code/example-repo/functions/hello-world/node_modules/serverless-kubeless/lib/deploy.js:552:7)
    at deploy (/home/mpalumbo7/code/example-repo/functions/hello-world/node_modules/serverless-kubeless/lib/deploy.js:551:10)
    at BbPromise.then (/home/mpalumbo7/code/example-repo/functions/hello-world/node_modules/serverless-kubeless/deploy/kubelessDeploy.js:144:19)
From previous event:
    at PluginManager.invoke (/home/mpalumbo7/.npm-global/lib/node_modules/serverless/lib/classes/PluginManager.js:391:22)
    at PluginManager.run (/home/mpalumbo7/.npm-global/lib/node_modules/serverless/lib/classes/PluginManager.js:422:17)
    at variables.populateService.then.then (/home/mpalumbo7/.npm-global/lib/node_modules/serverless/lib/Serverless.js:111:33)
    at processImmediate (timers.js:632:19)
    at process.topLevelDomainCallback (domain.js:120:23)
From previous event:
    at Serverless.run (/home/mpalumbo7/.npm-global/lib/node_modules/serverless/lib/Serverless.js:98:6)
    at serverless.init.then (/home/mpalumbo7/.npm-global/lib/node_modules/serverless/bin/serverless:43:28)

  Get Support --------------------------------------------
     Docs:          docs.serverless.com
     Bugs:          github.com/serverless/serverless/issues
     Issues:        forum.serverless.com

  Your Environment Information -----------------------------
     OS:                     linux
     Node Version:           11.6.0
     Serverless Version:     1.36.3

Thoughts

Looking at the code in ./lib/helpers.js I wonder if it has something to do with using the azure auth-provider?

Line 111 begins:

 // Access tokens may expire so we better check the expire date
    const expiry = moment(userInfo.user['auth-provider'].config.expiry);
    if (expiry < moment()) {
      throw new Error(
        'The access token has expired. Make sure you can access your cluster and try again'
      );
    }

But my redacted kubeconfig shows that I don't have a expiry field?

users:
- name: my-cluster-admin
  user:
    auth-provider:
      config:
        access-token: <access-token>
        apiserver-id: <apiserver-id>
        client-id: <client-id>
        expires-in: "3600"
        expires-on: "1548412445"
        refresh-token: <token>
        tenant-id: <tenant-id>
      name: azure
mpalumbo7 commented 5 years ago

Also, I confirmed that my local time and server time are synchornized within a second.

andresmgot commented 5 years ago

mmh, if expiry is undefined it should not throw the error:

> const expiry = undefined
undefined
> expiry < moment()
false

Maybe the format of expiry using azure doesn't get properly parsed and cannot be compared with moment(). Can you add a console.log before that line to see what does it contain?

mpalumbo7 commented 5 years ago

It appears to be a race condition. When I added a console.log() it never worked.

The value of userInfo.user['auth-provider'].config.expiry is undefined so then moment just returns the current time. When expiry < moment() it is the same as moment() < moment() where the first argument was just ran a millisecond earlier.

I changed the following and then ran it a bunch to confirm.

    // Access tokens may expire so we better check the expire date
    const expiry = moment(userInfo.user['auth-provider'].config.expiry);

+   const rightnow = moment();
+   console.log(`Is ${expiry.format('HH:mm:ss.SSS')} < ${rightnow.format('HH:mm:ss.SSS')}? ${expiry < rightnow}`);

-   if (expiry < moment()) {    
+   if (expiry < rightnow) {
      throw new Error(
        'The access token has expired. Make sure you can access your cluster and try again'
      );
    }
    return accessToken;

A case where it failed halfway through the deployment:

$ serverless deploy -v
Serverless: Packaging service...
Serverless: Excluding development dependencies...
Is 17:34:27.279 < 17:34:27.279? false
Is 17:34:27.854 < 17:34:27.854? false
Is 17:34:27.858 < 17:34:27.858? false
Is 17:34:28.790 < 17:34:28.790? false
Is 17:34:28.934 < 17:34:28.934? false
Serverless: Function capitalize successfully deployed
Serverless: Function hello successfully deployed
Is 17:34:33.437 < 17:34:33.438? true

  Error --------------------------------------------------

  The access token has expired. Make sure you can access your cluster and try again

     For debugging logs, run again after setting the "SLS_DEBUG=*" environment variable.

  Get Support --------------------------------------------
     Docs:          docs.serverless.com
     Bugs:          github.com/serverless/serverless/issues
     Issues:        forum.serverless.com

  Your Environment Information -----------------------------
     OS:                     linux
     Node Version:           11.6.0
     Serverless Version:     1.36.3

It works until the sixth call to getToken() when a millisecond difference causes the error:

Is 17:34:33.437 < 17:34:33.438? true

mpalumbo7 commented 5 years ago

I could just submit a PR that puts an extra check for expires-on field, but what about for other providers in client-go? Currently, it supports:

    // Initialize all known client auth plugins.
    _ "k8s.io/client-go/plugin/pkg/client/auth/azure"
    _ "k8s.io/client-go/plugin/pkg/client/auth/gcp"
    _ "k8s.io/client-go/plugin/pkg/client/auth/oidc"
    _ "k8s.io/client-go/plugin/pkg/client/auth/openstack"

In user.auth-provider.config:

Does the getToken() funciton have to check for token expiration? Why not just retrieve the token and try using it. If the call to k8s API fails, then shouldn't there be a decent error message telling the user why?

mpalumbo7 commented 5 years ago

Unfortunately, when I comment out the whole expirty check, the downstream code does not handle the failed API call well.

$ serverless deploy -v
Serverless: Packaging service...
Serverless: Excluding development dependencies...
Invalid date
/home/mpalumbo7/code/example-repo/functions/hello-world/node_modules/serverless-kubeless/lib/config.js:56
      return JSON.parse(this.configMag.data[key]);
                                           ^

TypeError: Cannot read property 'runtime-images' of undefined
    at Config.get (/home/mpalumbo7/code/example-repo/functions/hello-world/node_modules/serverless-kubeless/lib/config.js:56:44)
    at Object.getRuntimeDepfile (/home/mpalumbo7/code/example-repo/functions/hello-world/node_modules/serverless-kubeless/lib/helpers.js:187:34)
    at ReadStream.s.on (/home/mpalumbo7/code/example-repo/functions/hello-world/node_modules/serverless-kubeless/deploy/kubelessDeploy.js:104:39)
    at ReadStream.emit (events.js:193:15)
    at ReadStream.EventEmitter.emit (domain.js:441:20)
    at endReadableNT (_stream_readable.js:1129:12)
    at process.internalTickCallback (internal/process/next_tick.js:72:19)

The logic in serverless-kubeless/deploy/kubelessDeploy.js is not handling the failed k8s API call correctly in this case. The connection is a success but it fails Authorization. So, in deployFunction() it hits s.on('end', () => {} instead of hitting s.on('error', (err) => reject(err));

The successful connection returns the following message:

{ 
    kind: 'Status',
     apiVersion: 'v1',
     metadata: {},
     status: 'Failure',
     message: 'Unauthorized',
     reason: 'Unauthorized',
     code: 401 
}

Which causes an invalid object reference error downstream. So, I added some logic to check for failed messages, and it appears to work:

s.on('end', () => {
+   // Check if call was successful, but unauthorized
+   if (kubelessConfig.configMag['status'] == 'Failure') {
+       reject(`API call failed with:\n${JSON.stringify(kubelessConfig.configMag)}.\nCheck that your token is not expired.`)
+       return;
+   }

The result is:

$ serverless deploy -v
Serverless: Packaging service...
Serverless: Excluding development dependencies...

  Error --------------------------------------------------

    API call failed with:
    {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}. 
    Check that your token is not expired.

     For debugging logs, run again after setting the "SLS_DEBUG=*" environment variable.

  Get Support --------------------------------------------
     Docs:          docs.serverless.com
     Bugs:          github.com/serverless/serverless/issues
     Issues:        forum.serverless.com

  Your Environment Information -----------------------------
     OS:                     linux
     Node Version:           11.6.0
     Serverless Version:     1.36.3

With the improved logic and logging, we can remove the expiry logic from the getToken() function:

function getToken(userInfo) {
-const token = _.get(userInfo, 'user.token') ||
-    _.get(userInfo, 'user.auth-provider.config.id-token');
-  const accessToken = _.get(userInfo, 'user.auth-provider.config.access-token');
-  if (token) {
-    return token;
-  } else if (accessToken) {
-    // Access tokens may expire so we better check the expire date
-    const expiry = moment(userInfo.user['auth-provider'].config.expiry);
-    if (expiry < moment()) {
-      throw new Error(
-        'The access token has expired. Make sure you can access your cluster and try again'
-      );
-    }
-    return accessToken;
-  }
-  return null;
+  return _.get(userInfo, 'user.token') ||
+    _.get(userInfo, 'user.auth-provider.config.id-token') ||
+    _.get(userInfo, 'user.auth-provider.config.access-token') || null;
}

I'm working on a pull request to implement this new logic. Any thoughts?

andresmgot commented 5 years ago

So it seems that we are dealing with two problems. Regarding the missing expiry key in the configuration I would just omit the validation if the expiry info is not available:

+ if(userInfo.user['auth-provider'].config.expiry) {
    // Access tokens may expire so we better check the expire date
    const expiry = moment(userInfo.user['auth-provider'].config.expiry);
    if (expiry < moment()) {
      throw new Error(
        'The access token has expired. Make sure you can access your cluster and try again'
      );
    }
+ }

Then, the second problem is that we are not checking if the ConfigMap that we receive is a valid ConfigMap. To check that I would use the code instead of the status. In the file config.js when we are retrieving the resource we should throw an error if the API returns a failure:

            .on('end', () => {
-              const res = Buffer.concat(data).toString();
-              this.configMag = JSON.parse(res);
-              resolve();
+              const res = JSON.parse(Buffer.concat(data).toString());
+              if(res.code && res.code !== 200) {
+                reject(`Unable to parse ConfigMap. Received ${res.message}`)
+              } else {
+                this.configMap = res;
+                resolve();
+              }
            });

What is not clear to me is if you are still finding the issue that the token expires while you are deploying your functions. Is that still happening after removing the check?

mpalumbo7 commented 5 years ago

Opened PR #171 with your suggested changes. It works on my environment both with and without an expired token.

The token was valid during the whole deployment, it was just the const expiry = moment(undefined) followed by expiry < moment() would occassionaly return true when there was more than millisecond delay between executing lines.