googleapis / nodejs-tasks

This repository is deprecated. All of its content and history has been moved to googleapis/google-cloud-node.
https://cloud.google.com/tasks/docs/

Multiple requests at the same time always throw this error - "4 DEADLINE_EXCEEDED: Deadline exceeded" #397

Closed xesunny closed 3 years ago

xesunny commented 4 years ago

Environment details

Steps to reproduce

  1. I create a Cloud Task when a user performs a certain activity on my app.
  2. If I get 20-30 users doing the activity at the same time, I get a "DEADLINE_EXCEEDED" error.
if (!client) {
  let { v2beta3 } = require('@google-cloud/tasks');
  client = new v2beta3.CloudTasksClient();
  parent = client.queuePath(process.env.GOOGLE_CLOUD_PROJECT, process.env.QUEUE_LOCATION, task_queue_name);
}
client.createTask({ parent, task });

Thanks!

bcoe commented 4 years ago

@xesunny is this in a cloud function? I think you should instead:

await client.createTask({parent, task});

Otherwise the cloud function reclaims resources once the request is served, and you may get odd behavior.
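
For example, in an HTTP-triggered function, something like this keeps the function alive until the task is created (just a sketch; the function name, project/queue values, and target URL are placeholders, not from your code):

const {CloudTasksClient} = require('@google-cloud/tasks');
const client = new CloudTasksClient();

exports.enqueueOnActivity = async (req, res) => {
  const parent = client.queuePath('my-project', 'us-central1', 'my-queue');
  const task = {
    httpRequest: {
      httpMethod: 'POST',
      url: 'https://example.com/task-handler',
      headers: {'Content-Type': 'application/json'},
      body: Buffer.from(JSON.stringify(req.body)),
    },
  };
  // Await the RPC so the runtime doesn't reclaim CPU before it completes.
  const [response] = await client.createTask({parent, task});
  res.status(200).send(`Created ${response.name}`);
};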

xesunny commented 4 years ago

Here is my exact source code: [screenshot]. Here is the error: [screenshot]

xesunny commented 4 years ago

> @xesunny is this in a cloud function? I think you should instead:
>
> await client.createTask({parent, task});
>
> Otherwise the cloud function reclaims resources once the request is served, and you may get odd behavior.

I'm using this code in (A) a Cloud Function and (B) a traditional server-side application.

I face this issue in both. My temporary workaround is to submit tasks sequentially: if I have 10 tasks to submit, I do them one by one.

The average time to submit one task is 200ms to 500ms, so this causes an unnecessary delay of ~5 seconds.

Because of this single issue, I can't use Cloud Tasks much, even though I love the product.

isaiah-coleman commented 4 years ago

I love tasks, but having to create tasks sequentially makes it unusable for me. I need to schedule too many tasks. I've been batching promises and waiting for them to resolve, but I still get the error. Perhaps I can tweak the batch size some, but if it becomes too slow I'll be forced to look for another solution :(

yossi-eynav commented 4 years ago

@bcoe - We're getting this error too. Is there a limit on dispatching multiple tasks concurrently?

bcoe commented 4 years ago

@yossi-eynav @isaiah-coleman I'm not sure what the hard limit is on concurrently creating tasks, but I wouldn't be shocked if one exists. My recommendation would be, if it's not reasonable to enqueue them sequentially, do something like this:

const WORK_SIZE = 16; // or 32, etc.
while (work.length) {
  const chunkOfWork = work.splice(0, WORK_SIZE);
  await Promise.all(chunkOfWork.map((chunk) => {
    return enqueueTask(chunk);
  }));
}

☝️ this allows you to manage the number of tasks enqueued concurrently, without doing so completely sequentially.
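
For completeness, enqueueTask above could be a small wrapper along these lines (a sketch; the project, location, queue, and URL are placeholders):

const {CloudTasksClient} = require('@google-cloud/tasks');
const client = new CloudTasksClient();
const parent = client.queuePath('my-project', 'us-central1', 'my-queue');

// Wraps a single createTask call so the chunks above can be awaited together.
function enqueueTask(payload) {
  return client.createTask({
    parent,
    task: {
      httpRequest: {
        httpMethod: 'POST',
        url: 'https://example.com/task-handler',
        headers: {'Content-Type': 'application/json'},
        body: Buffer.from(JSON.stringify(payload)),
      },
    },
  });
}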

ChrisBeeson commented 4 years ago

I'm getting this error dispatching a single task.

bcoe commented 4 years ago

@ChrisBeeson are you in a similar position to the one @xesunny describes, in which you enqueue a single task, but there may be multiple users causing this action concurrently?

ChrisBeeson commented 4 years ago

@bcoe no there are no other users

bcoe commented 4 years ago

@ChrisBeeson to me this sounds like a different issue than @xesunny and others are describing, which seems to be related to concurrency limits. I think it might be worth opening a new issue, so that I can help debug (I believe something else is happening to you).

yossi-eynav commented 4 years ago

@bcoe thanks! I'm getting this error even when I changed the code to be sequential. Where can I get more information about this error?

bcoe commented 4 years ago

@yossi-eynav could you please share an example of the code you are running, and let me know what environment you're running in, i.e., Cloud Functions?

@xesunny reading your original message, the error is happening on your local OSX machine?

yossi-eynav commented 4 years ago

@bcoe my code is similar to this sample - https://github.com/googleapis/nodejs-tasks/blob/master/samples/createHttpTask.js.

I'm running on GKE

ChrisBeeson commented 4 years ago

I had the same problem. I think it's because httpRequest.body expects bytes, not a string. Try changing:

task.httpRequest.body = Buffer.from(payload).toString('base64');

to:

task.httpRequest.body = Buffer.from(payload);

yossi-eynav commented 4 years ago

@bcoe
it's related to @grpc/grpc-js

{ Error: 4 DEADLINE_EXCEEDED: Deadline exceeded
    at Object.callErrorFromStatus (/Volumes/dev/playground/cloud-task-test/node_modules/@grpc/grpc-js/build/src/call.js:30:26)
    at Object.onReceiveStatus (/Volumes/dev/playground/cloud-task-test/node_modules/@grpc/grpc-js/build/src/client.js:174:52)
    at Object.onReceiveStatus (/Volumes/dev/playground/cloud-task-test/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:340:141)
    at Object.onReceiveStatus (/Volumes/dev/playground/cloud-task-test/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:303:181)
    at Http2CallStream.outputStatus (/Volumes/dev/playground/cloud-task-test/node_modules/@grpc/grpc-js/build/src/call-stream.js:114:27)
    at Http2CallStream.maybeOutputStatus (/Volumes/dev/playground/cloud-task-test/node_modules/@grpc/grpc-js/build/src/call-stream.js:153:22)
    at Http2CallStream.endCall (/Volumes/dev/playground/cloud-task-test/node_modules/@grpc/grpc-js/build/src/call-stream.js:140:18)
    at Http2CallStream.cancelWithStatus (/Volumes/dev/playground/cloud-task-test/node_modules/@grpc/grpc-js/build/src/call-stream.js:441:14)
    at Timeout.DeadlineFilter.timer.setTimeout (/Volumes/dev/playground/cloud-task-test/node_modules/@grpc/grpc-js/build/src/deadline-filter.js:59:28)
    at ontimeout (timers.js:436:11)
isaiah-coleman commented 4 years ago

@yossi-eynav if you look, a lot of @google-cloud libraries with a @grpc/grpc-js dependency have an open issue related to concurrency. The only thing I've read that helps people seems to be upgrading your Node version to v13. However, that hasn't fixed the issue for me.

See https://github.com/grpc/grpc-node/issues/1158; it might help.

yossi-eynav commented 4 years ago

@bcoe @isaiah-coleman I've set the fallback flag to true and now I have 0 errors creating 1,000 tasks concurrently.

It seems the problem is in the gax-nodejs library. Can you explain the differences between the regular flow and the fallback flow? I'm not familiar with this library at all.

isaiah-coleman commented 4 years ago

@yossi-eynav I'm not sure I can help you; I'm not a contributor on the library, merely sharing what seems to have helped others. Can you explain how you used the fallback flag to fix the errors? It might be a solution I can use.

alexander-fenster commented 4 years ago

@yossi-eynav Setting fallback to true enables a different transport (the one that was originally intended for browsers): instead of using gRPC, it serializes your requests and sends them over a regular HTTP/1 connection with node-fetch to a different endpoint. When you enable fallback, you don't make any gRPC requests at all; it uses a totally different stack. That might serve as a good workaround, but we'd like to investigate what's going on.

Would it be possible for you to measure the time: how long does it take to fail with DEADLINE_EXCEEDED?

xesunny commented 4 years ago

> @yossi-eynav could you please share an example of the code you are running, and let me know what environment you're running in, i.e., Cloud Functions?
>
> @xesunny reading your original message, the error is happening on your local OSX machine?

Yes, it's happening on my Mac OS X machine, but I have gotten this error on a Google Cloud instance and on Google Cloud Run as well.

xesunny commented 4 years ago

> @yossi-eynav Setting fallback to true enables a different transport (the one that was originally intended for browsers): instead of using gRPC, it serializes your requests and sends them over a regular HTTP/1 connection with node-fetch to a different endpoint. When you enable fallback, you don't make any gRPC requests at all; it uses a totally different stack. That might serve as a good workaround, but we'd like to investigate what's going on.
>
> Would it be possible for you to measure the time: how long does it take to fail with DEADLINE_EXCEEDED?

Can you give an example of how to use "fallback: true"? I don't have much experience with gRPC.

yossi-eynav commented 4 years ago

@alexander-fenster ~120s (created 1000 tasks in parallel)

yossi-eynav commented 4 years ago

@xesunny

const {CloudTasksClient} = require('@google-cloud/tasks');
const client = new CloudTasksClient({ fallback: true })
isaiah-coleman commented 4 years ago

Are the requests over HTTP/1 regional? Opting for { fallback: true } gives me the following error:

{"code":3,"message":"Location 'us-east1' is not a valid location. Use ListLocations to list valid locations.","details":[]}

I know this is the correct region; when I switch out the region I receive the following:

{"code":3,"message":"Location must equal us-east1 because the App Engine app that is associated with this project is located in us-east1","details":[]}

For due diligence, this is me checking ListLocations:

$ gcloud tasks locations list
NAME      FULL_NAME
us-east1  projects/[project-name]/locations/us-east1

I'm grateful for any help that can be provided.

yossi-eynav commented 4 years ago

@isaiah-coleman I'm on us-central1

xesunny commented 4 years ago

I am also getting the same error: Error: {"code":3,"message":"Location 'asia-south1' is not a valid location. Use ListLocations to list valid locations.","details":[]}

yossi-eynav commented 4 years ago

@alexander-fenster @bcoe any update on this?

alexander-fenster commented 4 years ago

@yossi-eynav Not yet, it's on my list, please expect an update soon.

xesunny commented 4 years ago

Thanks team. Really looking forward to it.

ThisIsKareemG commented 4 years ago

I encountered this problem also, and the workaround mentioned above worked: const client = new CloudTasksClient({ fallback: true })

Though I'm still interested in what is originally causing this error. I only noticed it in the last few days.

app:///../node_modules/@google-cloud/tasks/node_modules/@grpc/grpc-js/build/src/call.js in Object.callErrorFromStatus at line 30:26

shayne-lp commented 4 years ago

Hi there, I too am curious if there are any workarounds for us-east1, as the fallback approach seemingly cannot be used there. Thanks!

shayne-lp commented 4 years ago

Hi there, just wanted to circle back and see if anyone has made any progress on the above. In particular, how to use the fallback option in us-east1. Thanks!

shayne-lp commented 4 years ago

Hey @alexander-fenster are there any updates you can provide for non us-central users? Google support punted us back here :( this is really biting us in prod. Thanks

alexander-fenster commented 4 years ago

@shayne-lp Yes, I'm working with the support folks as well, investigating what's going on here. Please expect an update later today or tomorrow. I'm sorry that this is affecting you folks. There are a lot of moving parts between the client library and the backend; I'm trying to understand what's going on.

gijo-varghese commented 4 years ago

I had almost the same error. In my case, I was not awaiting the call. Adding await or Promise.all() fixed the issue for me. Here is the detailed SO question: DEADLINE_EXCEEDED while adding tasks to Google Cloud Tasks
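
Roughly, the difference looks like this (a sketch; parent and task are whatever you already build):

// Fire-and-forget: the promise may still be pending when the surrounding
// function returns, which can surface as DEADLINE_EXCEEDED.
client.createTask({parent, task});

// Awaited: the caller waits for the RPC to finish.
await client.createTask({parent, task});

// Several at once: wait for all of them before returning.
await Promise.all(tasks.map((task) => client.createTask({parent, task})));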

alexander-fenster commented 4 years ago

Wow thank you @gijo-varghese! Can you folks please check if this applies to your code?

shayne-lp commented 4 years ago

Hi @alexander-fenster, thanks for getting back to us. And thank you for your reply too, @gijo-varghese. We are indeed already using Promise.all(). After much trial and error, the issue we are seeing is that the number of promises you attempt to resolve dictates whether you get the DEADLINE_EXCEEDED error. For example, if we try 10,000 promises, or even 1,000 promises in bulk, we get the error. If we do batches of 100, we do not. The other problem is that we are unable to catch this error and retry the tasks that failed (it just blows us up).

Any guidance on the upper bounds there (whether dictated by this library, or Google Tasks itself) would be appreciated! Thanks all
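
For reference, the batching pattern we're converging on looks roughly like this (a sketch; enqueueTask stands in for our createTask wrapper, and Promise.allSettled needs Node 12.9+):

const BATCH_SIZE = 100;
const MAX_ATTEMPTS = 3;

async function enqueueAllWithRetry(payloads) {
  const pending = payloads.map((payload) => ({payload, attempts: 0}));
  while (pending.length) {
    const batch = pending.splice(0, BATCH_SIZE);
    const results = await Promise.allSettled(
      batch.map((item) => enqueueTask(item.payload))
    );
    // Re-queue anything that rejected, up to MAX_ATTEMPTS per payload,
    // instead of letting one DEADLINE_EXCEEDED blow up the whole run.
    results.forEach((result, i) => {
      const item = batch[i];
      if (result.status === 'rejected' && ++item.attempts < MAX_ATTEMPTS) {
        pending.push(item);
      }
    });
  }
}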

xesunny commented 4 years ago

Still facing the issue.

gbhojak commented 4 years ago

I am seeing this issue as well even when I try to queue a single request with a Google Cloud Run service. This is what I am doing:

const request = {parent, task};

try {
    const [response] = await cloudTasksClient.createTask(request);
    console.error('Queued the task:', response.name);
} catch (e) {
    console.error('Error queuing the task', e);
}

I am using "@google-cloud/tasks": "^2.0.0".

isaiah-coleman commented 4 years ago

Hi, I just wanted to circle back with an update. We tuned the number of promises in a batch down to 100 and used Promise.all() to wait for them to resolve. This quieted the error for a time, but we experienced the issue again today. There have been no changes to code or infrastructure.

We can continue to fine-tune batch sizes, but without any guidance it's hard to know if it will fix the issue long term. Thanks

bcoe commented 4 years ago

@isaiah-coleman @xesunny sorry that it's been a little while since an update; we've been discussing this issue internally a bit.

Here's where I end up being a bit confused:


I've created a task queue as follows: gcloud tasks queues create echo-function. I can then run the following code, and have yet to see a DEADLINE_EXCEEDED issue:

const {CloudTasksClient} = require('@google-cloud/tasks')

let client;
async function performTask () {
  if (!client) {
    client = new CloudTasksClient()
  }
  const queuePath = client.queuePath('long-door-651', 'us-central1', 'echo-function');
  const [resp] = await client.createTask({
    parent: queuePath,
    task: {
      httpRequest: {
        httpMethod: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        url: 'https://.../echo-function',
        body: Buffer.from(JSON.stringify({hello: 'world'})),
      },
    },
  });
  console.info('we finished creating task', resp);
}

for (let i = 0; i < 2048; i++) {
  performTask()
}

☝️ note that this is creating thousands of tasks concurrently.


Here are the exact versions of the dependencies I have installed:

bencoe@bencoe:~/google/tasks-perf$ npm ls
tasks-perf@1.0.0 /usr/local/google/home/bencoe/google/tasks-perf
└─┬ @google-cloud/tasks@2.1.0
  └─┬ google-gax@2.6.3
    ├─┬ @grpc/grpc-js@1.1.2
    │ └── semver@6.3.0 deduped
    ├─┬ @grpc/proto-loader@0.5.5
    │ ├── lodash.camelcase@4.3.0
    │ └── protobufjs@6.10.1 deduped
    ├── @types/long@4.0.1
    ├─┬ abort-controller@3.0.0
    │ └── event-target-shim@5.0.1
    ├─┬ duplexify@3.7.1
    │ ├─┬ end-of-stream@1.4.4
    │ │ └─┬ once@1.4.0
    │ │   └── wrappy@1.0.2
    │ ├── inherits@2.0.4
    │ ├─┬ readable-stream@2.3.7
    │ │ ├── core-util-is@1.0.2
    │ │ ├── inherits@2.0.4 deduped
    │ │ ├── isarray@1.0.0
    │ │ ├── process-nextick-args@2.0.1
    │ │ ├── safe-buffer@5.1.2
    │ │ ├─┬ string_decoder@1.1.1
    │ │ │ └── safe-buffer@5.1.2 deduped
    │ │ └── util-deprecate@1.0.2
    │ └── stream-shift@1.0.1
    ├─┬ google-auth-library@6.0.5
    │ ├── arrify@2.0.1
    │ ├── base64-js@1.3.1
    │ ├─┬ ecdsa-sig-formatter@1.0.11
    │ │ └── safe-buffer@5.1.2 deduped
    │ ├── fast-text-encoding@1.0.3
    │ ├─┬ gaxios@3.0.4
    │ │ ├── abort-controller@3.0.0 deduped
    │ │ ├── extend@3.0.2
    │ │ ├─┬ https-proxy-agent@5.0.0
    │ │ │ ├─┬ agent-base@6.0.1
    │ │ │ │ └── debug@4.1.1 deduped
    │ │ │ └── debug@4.1.1 deduped
    │ │ ├── is-stream@2.0.0
    │ │ └── node-fetch@2.6.0 deduped
    │ ├─┬ gcp-metadata@4.1.4
    │ │ ├── gaxios@3.0.4 deduped
    │ │ └─┬ json-bigint@1.0.0
    │ │   └── bignumber.js@9.0.0
    │ ├─┬ gtoken@5.0.2
    │ │ ├── gaxios@3.0.4 deduped
    │ │ ├─┬ google-p12-pem@3.0.2
    │ │ │ └── node-forge@0.9.1
    │ │ ├── jws@4.0.0 deduped
    │ │ └── mime@2.4.6
    │ ├─┬ jws@4.0.0
    │ │ ├─┬ jwa@2.0.0
    │ │ │ ├── buffer-equal-constant-time@1.0.1
    │ │ │ ├── ecdsa-sig-formatter@1.0.11 deduped
    │ │ │ └── safe-buffer@5.1.2 deduped
    │ │ └── safe-buffer@5.1.2 deduped
    │ └─┬ lru-cache@6.0.0
    │   └── yallist@4.0.0
    ├── is-stream-ended@0.1.4
    ├── lodash.at@4.6.0
    ├── lodash.has@4.5.2
    ├── node-fetch@2.6.0
    ├─┬ protobufjs@6.10.1
    │ ├── @protobufjs/aspromise@1.1.2
    │ ├── @protobufjs/base64@1.1.2
    │ ├── @protobufjs/codegen@2.0.4
    │ ├── @protobufjs/eventemitter@1.1.0
    │ ├─┬ @protobufjs/fetch@1.1.0
    │ │ ├── @protobufjs/aspromise@1.1.2 deduped
    │ │ └── @protobufjs/inquire@1.1.0 deduped
    │ ├── @protobufjs/float@1.0.2
    │ ├── @protobufjs/inquire@1.1.0
    │ ├── @protobufjs/path@1.1.2
    │ ├── @protobufjs/pool@1.1.0
    │ ├── @protobufjs/utf8@1.1.0
    │ ├── @types/long@4.0.1 deduped
    │ ├── @types/node@13.13.14
    │ └── long@4.0.0
    ├─┬ retry-request@4.1.1
    │ ├─┬ debug@4.1.1
    │ │ └── ms@2.1.2
    │ └─┬ through2@3.0.2
    │   ├── inherits@2.0.4 deduped
    │   └── readable-stream@2.3.7 deduped
    ├── semver@6.3.0
    └── walkdir@0.4.1

And my Node.js runtime is v14.5.0.


@isaiah-coleman @gbhojak @xesunny could you provide me a gist, with this same information:

  1. the code that is giving you DEADLINE_EXCEEDED (with secrets redacted).
  2. the runtime that you're running in.
  3. your npm ls output (if you're running locally).
isaiah-coleman commented 4 years ago

Thanks @bcoe, I understand your confusion. The error behavior seems to be non-deterministic; you would have to run the snippet every day for weeks for the error to manifest. We run the gist below in sets a couple of times a day.

  1. the code that is giving you DEADLINE_EXCEEDED : gist
  2. the runtime that you're running in. Node v10.21.1
  3. npm ls: npm ls
xesunny commented 4 years ago

Thanks @bcoe, can you change the location of your queue from us-central1 to asia-northeast1 and then compare the results?

bcoe commented 4 years ago

@isaiah-coleman, @xesunny, thank you for continuing to help me dig into this 👍

@xesunny I will try asia-northeast1; is your Cloud Function and/or server also located near this data center?

@isaiah-coleman what location are you in?

isaiah-coleman commented 4 years ago

@bcoe we're using us-east1

bcoe commented 4 years ago

@isaiah-coleman thank you for the reproduction 👍 (and I'm sorry it took me 8 days to respond).

One more point of clarification: in what environment are you running the script? (It sounds like you're not in a GCP environment, as in @xesunny's original report, which was on Mac OS.)

Once I've confirmed this, I think I have a plan for some folks to bring this issue up with internally.

isaiah-coleman commented 4 years ago

Thanks for the response @bcoe, we're actually using Cloud Run.

smaeda-ks commented 4 years ago

@bcoe I also got this same error today and found this topic interesting.

I'm running my Node.js app on a single GAE (flexible) instance and creating tasks from there. The code is really simple, and I'm not creating tasks concurrently but rather sequentially, within a while () { ... } loop.

    while (some_conditions) {
        [ ... ]
        const [response] = await cloudtask.createTask({parent: parent, task: task});
        const name = response.name;
        console.log(`Created task ${name}`);
    }
[screenshot]

I don't think I can easily reproduce this at all, as this is the first time I've seen this error. I hadn't seen it before in the last few weeks (since I started using Tasks).

GAE location: us-central1
GAE runtime: nodejs12
@google-cloud/tasks version: 2.0.1

smaeda-ks commented 4 years ago

Also, in my case above, it seems there was roughly ~10 seconds of stuck time before it raised the DEADLINE_EXCEEDED error. I could confirm this from the GAE logging timestamps:

2020-07-31 07:30:33.000 JST Queue id: 5690167314612224 created successfully.
2020-07-31 07:30:43.000 JST (node:25) UnhandledPromiseRejectionWarning: Error: 4 DEADLINE_EXCEEDED: Deadline exceeded at...

Note, the log message at 2020-07-31 07:30:33.000 JST indicates we successfully created a new Datastore entity. Since we use Express, we immediately return next() after this console.log() line and move to the function where we create tasks. So it is accurate to say ~10 seconds, I think (no tasks were created at that time, i.e., we likely hit the error in the first iteration).

I'm not sure closing the client (i.e., client.close()) would make any difference, but I thought it was worth trying.
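
Something along these lines is what I have in mind (just a sketch of the idea; not verified to make a difference):

const {CloudTasksClient} = require('@google-cloud/tasks');

async function enqueueAll(parent, tasks) {
  const client = new CloudTasksClient();
  try {
    for (const task of tasks) {
      const [response] = await client.createTask({parent, task});
      console.log(`Created task ${response.name}`);
    }
  } finally {
    // Release the underlying gRPC channel once all tasks are created.
    await client.close();
  }
}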

@bcoe @alexander-fenster

bcoe commented 4 years ago

@smaeda-ks, @isaiah-coleman, @xesunny (and other folks in this thread bumping into this issue):

We've opened an internal tracking thread; it would be helpful if people could send me two pieces of information:

  1. your GCP project id.
  2. a timestamp from when the DEADLINE_EXCEEDED error was observed.

If you don't feel comfortable sharing in this thread, you can email bencoe @ google.com.