googleapis / google-auth-library-nodejs

🔑 Google Auth Library for Node.js
Apache License 2.0

Google Cloud Function - Error: Could not load the default credentials. #798

Closed timhj closed 4 years ago

timhj commented 5 years ago

This has only recently started happening for me. Some cloud functions which are using the Cloud Vision API have started failing due to an auth error. The failures seem random, with requests working sometimes and other times not. As there is no explicit auth happening (it's the Node.js GCF runtime for an existing project), it's not clear what could be the issue.

Errors look like this:

A 2019-10-01T03:17:00.907Z PROD-PDF-XXXX 718747420330136 
    Unhandled rejection PROD-PDF-XXXX 718747420330136
E 2019-10-01T03:17:00.910Z PROD-PDF-XXXX 718747420330136 
    Error: Could not load the default credentials. Browse to https://cloud.google.com/docs/authentication/getting-started for more information.
    at GoogleAuth.getApplicationDefaultAsync (/srv/functions/node_modules/google-auth-library/build/src/auth/googleauth.js:161:19)
    at process._tickCallback (internal/process/next_tick.js:68:7) PROD-PDF-XXXX 718747420330136 

Triggering code is:

if(statusUpdateResult.affectedRows < 1){
        // To avoid race conditions from concurrent competing threads, let's make sure this thread
        // was the one that updated the status first.
        let concurrentThreadError = new Error('Not the first to update status to processing so exiting');
        console.error(concurrentThreadError);
        return concurrentThreadError;
      } else {
        // status updated and ready for processing
        console.log(`Updated pdf id ${pdfRecord.pdf_id} to queuing_vision.`);
      }

      // queue the file for processing by vision API
      const gcsSourceUri = `gs://${bucketName}/${fileName}`;
      const gcsDestinationUri = `gs://${ocrJSONBucket}/${fileName}.json`;

      const inputConfig = {
        // Supported mime_types are: 'application/pdf' and 'image/tiff'
        mimeType: 'application/pdf',
        gcsSource: {
          uri: gcsSourceUri,
        },
      };
      const outputConfig = {
        gcsDestination: {
          uri: gcsDestinationUri,
        },
      };
      const features = [{type: 'DOCUMENT_TEXT_DETECTION'}];
      const request = {
        requests: [
          {
            inputConfig: inputConfig,
            features: features,
            outputConfig: outputConfig,
          },
        ],
      };
      let processingOperation;
      try {
        processingOperation = visionClient.asyncBatchAnnotateFiles(request);
      } catch(processingError) {
        console.error(processingError);
        return processingError;
      }
      console.log(processingOperation);

The unhandled rejection is happening inside the Vision request try/catch block, so there's nowhere further for me to debug. Hope someone can help or is seeing the same issue. This used to work without issue.
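
For reference, asyncBatchAnnotateFiles returns a promise, so a rejection only becomes catchable if the call is awaited. A minimal sketch of that, assuming the surrounding function is async (this is an illustration, not part of the original report):

      let processingOperation;
      try {
        // asyncBatchAnnotateFiles resolves to [operation]; awaiting it turns an auth
        // rejection into a regular exception the catch block can see.
        [processingOperation] = await visionClient.asyncBatchAnnotateFiles(request);
        // Optionally wait for the long-running Vision operation itself to complete.
        await processingOperation.promise();
      } catch (processingError) {
        console.error(processingError);
        return processingError;
      }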

Environment details

Steps to reproduce

timhj commented 5 years ago

This issue is intermittent and because it's not catchable, it's resulting in 'out-of-whack' errors in the database. Without deploying anything differently, the code now 'just works' on the same retries.

timhj commented 5 years ago

The Vision API request may have been failing behind the scenes and causing this issue, from a request to output JSON into a non-existent GCS bucket... As the issue is intermittent, I'm not sure, so I will watch and see what happens.

dusty commented 5 years ago

I'm having the same problem with various libraries inside cloud functions.

This example is listening for a bucket onFinalize event and sending a single http task to a cloud tasks queue with the payload.

It seems that my function works fine when it's first deployed. Then after some time, perhaps after it scales to zero and is re-triggered, it always fails until I redeploy it.

[screenshot]

I'm using the nodejs10 runtime with "@google-cloud/tasks": "^1.4.0".

Code

const { v2beta3 } = require('@google-cloud/tasks')
const client = new v2beta3.CloudTasksClient()

const queue = body => {
  return client.createTask({
    parent: process.env.QUEUE_URL,
    task: {
      httpRequest: {
        httpMethod: 'POST',
        url: process.env.TASK_URL,
        headers: { 'Content-Type': 'application/json' },
        body: Buffer.from(JSON.stringify(body)),
        oidcToken: { serviceAccountEmail: process.env.SERVICE_ACCOUNT_EMAIL }
      }
    }
  })
}

exports.default = async file => {
  await queue(file)
  console.info(`DONE: ${file.name} queued`)
}
merlinnot commented 5 years ago

I've also been experiencing this issue for quite some time now. It's happening in my project all the time; let me know if I can be of any help to debug it.

bcoe commented 5 years ago

@merlinnot what type of authentication are you using in your project, and which APIs specifically?

merlinnot commented 5 years ago

Firestore, BigQuery, Debugger, ...

Given the stack traces, the error observed originates here: https://github.com/googleapis/google-auth-library-nodejs/blob/944e2aa62a61c253ba153f49590d7416585c64eb/src/auth/googleauth.ts#L291-L296

As you can see, it is thrown if and only if the value of the isGCE variable is falsy. That value is the result of a call to the _checkIsGCE function: https://github.com/googleapis/google-auth-library-nodejs/blob/944e2aa62a61c253ba153f49590d7416585c64eb/src/auth/googleauth.ts#L311-L316

This function in turn calls the isAvailable function from the gcp-metadata library: https://github.com/googleapis/gcp-metadata/blob/25bc11657001cb6b3807543377d74bafe126ea62/src/index.ts#L121-L142

As you can see, it depends on the metadataAccessor function: https://github.com/googleapis/gcp-metadata/blob/25bc11657001cb6b3807543377d74bafe126ea62/src/index.ts#L49

This function makes an HTTP request to http://169.254.169.254/computeMetadata/v1/ here: https://github.com/googleapis/gcp-metadata/blob/25bc11657001cb6b3807543377d74bafe126ea62/src/index.ts#L66

I see no way for this error to occur other than requests to this service failing.
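
To confirm that hypothesis from inside a function, one option (a sketch, not code from this thread) is to probe the metadata server directly with gcp-metadata's isAvailable and log the outcome:

const gcpMetadata = require('gcp-metadata');

// HTTP function that reports whether the metadata server answered in time.
exports.probeMetadata = async (req, res) => {
  try {
    const available = await gcpMetadata.isAvailable();
    console.log(`metadata server reachable: ${available}`);
    res.status(200).send(`metadata server reachable: ${available}`);
  } catch (err) {
    console.error('metadata probe failed', err);
    res.status(500).send(String(err));
  }
};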

merlinnot commented 5 years ago

I'm currently redeploying all of the functions with additional logging enabled (DEBUG_AUTH). Will post here as soon as I have a hit.

merlinnot commented 5 years ago

In the last 24 hrs I had 71,092 occurrences of this error, but it was last seen 5 hrs ago... I thought I'd be able to provide more information straight away; this error used to happen all the time.

bcoe commented 5 years ago

@merlinnot as you noticed, I've deployed a version of gcp-metadata with a debug option. I'd double check that your package-lock.json has gcp-metadata@3.2.0, at which point we should get a better picture of what's happening the next time you run into issues.
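
A quick way (just a sketch) to confirm which gcp-metadata version is actually bundled in a deployed function is to log it from inside the function:

console.log(require('gcp-metadata/package.json').version); // expect 3.2.0 or newer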

davedc commented 5 years ago

@bcoe Experiencing the same issue on a few functions on our side.

I can confirm @merlinnot's suspicion that requests to the metadata service are failing.

[screenshot]
merlinnot commented 5 years ago

It's back :)

{ FetchError: network timeout at: http://metadata.google.internal./computeMetadata/v1/instance
    at Timeout.<anonymous> (/srv/functions/node_modules/node-fetch/lib/index.js:1448:13)
    at ontimeout (timers.js:436:11)
    at tryOnTimeout (timers.js:300:5)
    at listOnTimeout (timers.js:263:5)
    at Timer.processTimers (timers.js:223:10)
  message:
   'network timeout at: http://metadata.google.internal./computeMetadata/v1/instance',
  type: 'request-timeout',
  config:
   { url: 'http://metadata.google.internal./computeMetadata/v1/instance',
     headers: { 'Metadata-Flavor': 'Google' },
     retryConfig:
      { noResponseRetries: 0,
        currentRetryAttempt: 0,
        retry: 3,
        retryDelay: 100,
        httpMethodsToRetry: [Array],
        statusCodesToRetry: [Array] },
     responseType: 'text',
     timeout: 3000,
     params: [Object: null prototype] {},
     paramsSerializer: [Function: paramsSerializer],
     validateStatus: [Function: validateStatus],
     method: 'GET' } }

Here's a timeline for the last 30 days:

[screenshot]

And for the last 7 days:

[screenshot]
BluebambooSRL commented 5 years ago

Same problem...

bcoe commented 5 years ago

@merlinnot @BluebambooSRL thank you, this gives us some valuable forensic information for the engineering team :+1:

smashah commented 5 years ago

Any remedy for this? I'm experiencing it in the same circumstances (Vision API in GCF, Node v8).

Edit: The odd thing is that I didn't change any deps or code. I just ran firebase deploy to update some unrelated code and then it started happening.

merlinnot commented 5 years ago

I don't think it's related to dependencies. In my case redeployments also change the behavior of these errors: sometimes I have more, sometimes I have less (see the chart above), while the number of executions per day is rather stable.

A wild guess would be that it just depends on which node in the underlying infrastructure the code lands on. Maybe redeploying again would help in your case?

smashah commented 5 years ago

@merlinnot Yes, I just redeployed it with @google-cloud/vision@^1.5.0 (before it was 1.4.0) and it started working again.

davedc commented 5 years ago

Sadly, at europe-west2 it seems like all underlying nodes have this issue? Redeploying the function a bunch of times has not really alleviated things for us.

antonioallen commented 5 years ago

Unfortunately, I'm running into this issue as well. It's happening pretty consistently for me at the moment. It just started after a recent full deploy of all my cloud functions. I'm receiving Error: Could not load the default credentials. followed by Unhandled error Error: Can't set headers after they are sent. (most likely from "Ignoring exception from a finished function"). A bit out of ideas on this one. I'll keep poking at it.

For me it's happening at: GoogleAuth.getApplicationDefaultAsync (/srv/node_modules/@google-cloud/logging/node_modules/google-auth-library/build/src/auth/googleauth.js:161:19)

[screenshots]
antonioallen commented 5 years ago

Commenting out all logger.debug() calls within the functions seems to fix the issue for me. But... no logs. I wonder why the auth is failing for it now.

edi commented 5 years ago

Same here. I have several apps, and none of the others are failing with this error.

But one of them (after I recently upgraded my functions' dependencies) started failing with the same error as above.

The weird thing is, given there are two functions, only one triggers this error.

I have no outside libraries or network requests and no APIs being used, just simple Firestore document triggers and updates.

So while I'm using the latest version of everything, only one function out of the two is randomly failing.

I've had 11 failures during the past 5 days. My client is losing revenue from those failures though, so it's a bit worrying.

bcoe commented 5 years ago

@ollydixon this thread is specifically related to authentication issues with cloud functions, which I think is potentially caused by something happening within this environment.

Could I bother you to open a new issue, with more specifics about the environment you're running in and the steps you're using to bootstrap your application?

bcoe commented 5 years ago

@edi, @davedc, @smashah :wave: sorry about your frustration; I've raised an internal issue with the Cloud Functions folks (which is why this is labeled external), and am going to follow up again today.

bcoe commented 5 years ago

There's an internal issue with the GCF folks that has been updated throughout the day; we're trying to get to the root cause of the timeouts that occur when attempting to connect to the metadata server (which is what's in turn resulting in the credentials issue).

A potential workaround for folks would be to create a service account, rather than relying on the default credentials:

https://cloud.google.com/docs/authentication/getting-started

:point_up: this requires that the credentials are available in a file on disk, so you would need to either use the API to deploy your project, or use the file upload option (rather than the inline editor).

At which point you would set your credentials to GOOGLE_APPLICATION_CREDENTIALS=./my-service-account.json.

I understand this workaround is suboptimal, and we are continuing to dig into things on our end.
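
As an illustration of that workaround (a sketch; the file name is whatever key file you upload with your function), the key can also be passed to the auth client explicitly so that no metadata-server lookup is needed:

const { GoogleAuth } = require('google-auth-library');

// Explicit key file instead of relying on Application Default Credentials.
const auth = new GoogleAuth({
  keyFilename: './my-service-account.json',
  scopes: ['https://www.googleapis.com/auth/cloud-platform'],
});

async function getAccessToken() {
  const client = await auth.getClient();
  const { token } = await client.getAccessToken();
  return token;
}

Most of the @google-cloud client constructors accept the same keyFilename option directly.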

Dara-To commented 5 years ago

I have the same problem on my new MacBook. I am able to run my functions locally on my old MacBook and the iMac at the office, all using the same Firebase Functions project, so the default credentials work with admin.initializeApp. I don't want to change the initialization code to use the workaround for one device. I will wait for the permanent fix. Please let us know when it will be fixed. Thanks.

timhj commented 5 years ago

Sounds like it might be a different issue to this one. I'd make sure you've got the appropriate environment variable set with the location of your JSON credentials first.

There are no implicit default credentials on a new Mac.

bcoe commented 5 years ago

@Dara-To I believe @timhj is correct; you will want to go through the steps outlined here:

https://cloud.google.com/docs/authentication/getting-started

to set up authentication on your new laptop. If you bump into issues, please feel free to open up an issue here.

neelance commented 4 years ago

I just saw the same error with one of our webhook functions after redeploying it. What was special about this function was that it sent the response early with resp.sendStatus(202); and did the heavy lifting afterwards. This caused the function to log Function execution took X ms, finished with status code: 202 early. My guess is that the function then tried to fetch the default credentials but couldn't get them, because the execution was already marked as finished.

I am now writing the webhook's payload into the database before returning 202 and then using a separate cloud function to process it asynchronously. This made the error go away.
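
A minimal sketch of that pattern (an illustration, not the actual code from that project; it assumes an Express-style webhook and a Firestore-triggered background function, and the collection and export names are made up):

const { Firestore } = require('@google-cloud/firestore');
const firestore = new Firestore();

// HTTP webhook: persist the payload, then acknowledge. Nothing heavy runs after the
// response is sent, so there is no work left to fail once the execution is "finished".
exports.webhook = async (req, res) => {
  await firestore.collection('webhookPayloads').add({
    body: req.body,
    receivedAt: Date.now(),
  });
  res.sendStatus(202);
};

// Background function triggered by the Firestore write: does the actual processing
// while its own execution is still active.
exports.processPayload = async (event) => {
  console.log('processing payload', event.value && event.value.name);
  // ...heavy lifting here
};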

Tebro commented 4 years ago

This started happening to one of my functions now. Is there any news regarding a fix? We are using Cloud Build for function deployment, and getting a JSON file embedded into the build without having it in git is non-trivial.

bcoe commented 4 years ago

@Tebro I see a few internal threads regarding timeout issues on the metadata server; there was one rollout today, but it looks like the issue is still happening periodically.

I will continue to keep this thread updated.

Note, the workaround of specifying an explicit service account via GOOGLE_APPLICATION_CREDENTIALS should do the trick for the time being.

edi commented 4 years ago

@aldobaie Not the case.

It's just a request timeout when it happens. It seems like the SDK can't reach one of the credentials endpoints (e.g. 169.254.169.254).

The fact that it happens randomly (for the same code) means it's not code-related.

The workaround (until they sort it) is to load your service account credentials manually like this:

const credentials = require('./credentials.json')
admin.initializeApp({
    credential: admin.credential.cert(credentials),
    databaseURL: 'https://PROJECT_ID.firebaseio.com'
})
Dara-To commented 4 years ago

I think I got it. I installed the new version of the Firebase CLI (firebase-tools) on my new laptop, whereas my iMac is still on an older version. I will confirm the version on my iMac when I get back to the office. I believe past a certain version, they removed the default credentials, so admin.initializeApp(functions.config().firebase) no longer works.

My next question is: which service account should I generate a key for? I am confused by the many options stated in the documentation. My backend project is only for cloud functions in Node.js, no hosting, and I want to access the Auth service. Any clarification would be appreciated.

1. Firebase service account

In the Firebase console, you can generate a key for the Firebase Admin SDK:
https://console.firebase.google.com/project//settings/serviceaccounts/adminsdk

2. App Engine default service account

According to this doc, you create a key for the App Engine default service account:
https://firebase.google.com/docs/functions/local-emulator
https://console.cloud.google.com/iam-admin/serviceaccounts?project=&authuser=0

3. New service account

According to this doc (provided by an earlier message in this thread), you set up a new service account and give it the project owner role:
https://cloud.google.com/docs/authentication/getting-started#auth-cloud-implicit-nodejs

bcoe commented 4 years ago

@timhj @aldobaie @Tebro there have been some stability fixes deployed internally, are you continuing to see these issues?

bcoe commented 4 years ago

@Dara-To I'm glad you've made some progress :+1: Could I bother you to open a new issue with your questions? This thread is related to specific issues we were seeing with cloud functions not loading default credentials (I don't want to lose your questions in the shuffle).

Dara-To commented 4 years ago

Sure thing, I can email Firebase support. Thanks

bcoe commented 4 years ago

@Dara-To happy to have a tracking issue here too; starting a conversation with Firebase is probably also worthwhile, given they'll have more specific expertise.

timhj commented 4 years ago

@bcoe - Still happens intermittently but not frequently: 4 instances in the past 24 hrs. The tight grouping of errors within microseconds of each other agrees with it being a quick connection dropout issue.

How recent does a redeploy need to be to have the fix included?

NawarA commented 4 years ago

Definitely experiencing this, particularly when I have about 1,400+ instances active; it seems some sputter and die with this error.

bcoe commented 4 years ago

@timhj the deploy I believe was in the past week; mind sharing the project identifier with me (feel free to send it by email: bencoe [at] google.com)?

It sounds promising that this error has become a rare occurrence; it makes me think the upstream issue has at least been partially addressed.

Tebro commented 4 years ago

Have not seen it in a while, but the project does not have high activity at this point.

Edit: Scratch that, just did some tests and got it again.

seriousManual commented 4 years ago

Just to add to this, we're seeing this error as well in the context of logging-winston (https://www.npmjs.com/package/@google-cloud/logging-winston).

This happens almost every time the cloud function is redeployed (e.g. after an update). The function is running but logging does not work. I say almost because from time to time it actually works, but not consistently.

smashah commented 4 years ago

Just for context, I deployed my functions - it did not work. I deployed them again and it's been working ever since. Very odd.

zoharsanchez commented 4 years ago

Redeploying works, but it's trial and error. We get the error, redeploy, and check again. We have a credential file, but it is encrypted in Storage and we decrypt it when the function is run. Really odd behavior indeed.

seriousManual commented 4 years ago

Note, the workaround of specifying an explicit service account via GOOGLE_APPLICATION_CREDENTIALS should do the trick for the time being.

Hi @bcoe, would that be an env variable at deploy time or a file in the root of the function? Thanks!

arfnj commented 4 years ago

Note, the workaround of specifying an explicit service account via GOOGLE_APPLICATION_CREDENTIALS should do the trick for the time being.

Hi @bcoe, would that be an env variable at deploy time or a file in the root of the function? Thanks!

@seriousManual I used this workaround and haven't had any errors since. I uploaded the secret in the root of the function and then declared the variable at the start of my code:

process.env.GOOGLE_APPLICATION_CREDENTIALS = './my-secret.json';
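
For what it's worth, a small sketch of the same idea (the client and file name here are only for illustration): the assignment should run before any Google client makes its first request, and the simplest way to guarantee that is to put it at the very top of the module, before requiring the clients.

process.env.GOOGLE_APPLICATION_CREDENTIALS = './my-secret.json';

// Clients required/constructed after this point resolve credentials from the key file
// instead of calling the metadata server.
const vision = require('@google-cloud/vision');
const visionClient = new vision.ImageAnnotatorClient();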

NawarA commented 4 years ago

Good workaround!

Though the good folks on the team still need to dig into why their software is inconsistent at scale, which means there's a deeper root cause to discover and patch :)

runelk commented 4 years ago

We had the same issue in our Cloud Functions in a Firebase project, but the workaround proposed by @arfnj seems to fix it. Are there any potential security issues to take note of when doing it this way?

arfnj commented 4 years ago

We had the same issue in our Cloud Functions in a Firebase project, but the workaround proposed by @arfnj seems to fix it. Are there any potential security issues to take note of when doing it this way?

I appreciate the kudos, @NawarA and @runelk, but that's not my workaround! The esteemed @bcoe suggested it on October 21st and I was just sharing my implementation of it to help answer @seriousManual's question. 🙂

latifs commented 4 years ago

@bcoe, not sure if you can answer this, but I'm thinking about implementing this option too.

1) How would you dynamically switch the secret.json file if you had a prod and dev environment?

2) Do you need to add process.env.GOOGLE_APPLICATION_CREDENTIALS = './my-secret.json'; at the top of every single function?

3) And what role does the service account key need to have when creating it?

Thanks

Update: 1) I tried to save both secret files in a creds folder and access them in this fashion: process.env.GOOGLE_APPLICATION_CREDENTIALS = `./creds/${process.env.GCP_PROJECT}-secret.json`; which seemed like a good idea, but I am on the Node 10 runtime, and because this runtime removed access to the GCP_PROJECT variable, it is not an option.
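
One possible way around that (just a sketch; MY_DEPLOY_ENV is a hypothetical variable you would set yourself per environment at deploy time, since the runtime no longer provides GCP_PROJECT):

// MY_DEPLOY_ENV is hypothetical: set it yourself for each project at deploy time.
const deployEnv = process.env.MY_DEPLOY_ENV || 'dev';
process.env.GOOGLE_APPLICATION_CREDENTIALS = `./creds/${deployEnv}-secret.json`;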

Dara-To commented 4 years ago

I tried to use the workaround on my new MBP laptop, setting export GOOGLE_APPLICATION_CREDENTIALS="path/to/key.json" in my shell session as suggested by @bcoe, but it didn't work. I even updated all the Firebase dependencies. Why does this have to be so complicated?

Is it because of this message in my terminal?

The default interactive shell is now zsh. To update your account to use zsh, please run chsh -s /bin/zsh. For more details, please visit https://support.apple.com/kb/HT208050.

bcoe commented 4 years ago

@Dara-To another option for you would be installing the gcloud command line tool, running gcloud auth login (to log in to your account), and then also making sure you set a default project ID with gcloud config set project.

You can also run your application like so:

GOOGLE_APPLICATION_CREDENTIALS="path/to/key.json" node my-app.js

If you continue to have issues please feel free to open a separate issue in this library, or to reach out to support for the specific product you're using 👍

I believe your issue is unrelated to this thread, however, which relates to folks running applications in a GCP environment (not on their local machine).