googleapis / nodejs-pubsub

Node.js client for Google Cloud Pub/Sub: Ingest event streams from anywhere, at any scale, for simple, reliable, real-time stream analytics.
https://cloud.google.com/pubsub/
Apache License 2.0

Error: 10 ABORTED: The request raced with another user request. Please try again. #1957

Closed julius-welink closed 1 week ago

julius-welink commented 1 month ago
Error: 10 ABORTED: The request raced with another user request. Please try again.
    at callErrorFromStatus (/opt/nms/aaasvc/node_modules/@grpc/grpc-js/src/call.ts:82:17)
    at Object.onReceiveStatus (/opt/nms/aaasvc/node_modules/@grpc/grpc-js/src/client.ts:360:55)
    at Object.onReceiveStatus (/opt/nms/aaasvc/node_modules/@grpc/grpc-js/src/client-interceptors.ts:458:34)
    at Object.onReceiveStatus (/opt/nms/aaasvc/node_modules/@grpc/grpc-js/src/client-interceptors.ts:419:48)
    at /opt/nms/aaasvc/node_modules/@grpc/grpc-js/src/resolving-call.ts:163:24
    at processTicksAndRejections (node:internal/process/task_queues:77:11)
for call at
    at ServiceClientImpl.makeUnaryRequest (/opt/nms/aaasvc/node_modules/@grpc/grpc-js/src/client.ts:325:42)
    at ServiceClientImpl.<anonymous> (/opt/nms/aaasvc/node_modules/@grpc/grpc-js/src/make-client.ts:189:15)
    at /opt/nms/aaasvc/node_modules/@google-cloud/pubsub/src/v1/subscriber_client.ts:338:25
    at /opt/nms/aaasvc/node_modules/google-gax/build/src/normalCalls/timeout.js:44:16
    at repeat (/opt/nms/aaasvc/node_modules/google-gax/build/src/normalCalls/retries.js:80:25)
    at /opt/nms/aaasvc/node_modules/google-gax/build/src/normalCalls/retries.js:119:13
    at OngoingCall.call (/opt/nms/aaasvc/node_modules/google-gax/build/src/call.js:67:27)
    at NormalApiCaller.call (/opt/nms/aaasvc/node_modules/google-gax/build/src/normalCalls/normalApiCaller.js:34:19)
    at /opt/nms/aaasvc/node_modules/google-gax/build/src/createApiCall.js:112:30

We run our services in GKE and use PubSub as a message broker. This comes up occasionally and crashes the application - we opted to process.exit() on unhandled rejections.

1) Is this a client library issue or a product issue? Error originates from "@grpc/grpc-js" dependency. However, it might be triggered by a server issue.

2) Did someone already solve this? Searching "The request raced with another user request." yields no results.

3) Do you have a support contract? No contract.

Environment details

Steps to reproduce

Impossible to reproduce consistently because it's triggered randomly by the server. The error originates from third-party code, with no way for me to handle it. It looks like it's coming from an asynchronous call, because the stack trace does not contain my code.

A library user should be able to handle the error so that it does not bubble up as an unhandled rejection. Or, in this case, is the expectation that the pubsub library retries the gRPC operation?

julius-welink commented 1 month ago

These errors started after upgrading "@google-cloud/pubsub" from "4.3.2" to "4.5.0". We have logged 300 such errors during the last month.

I will revert to "4.3.2", and update the ticket in several days to let you know if it's related to 4.5.0

feywind commented 1 month ago

@julius-welink Thanks for the issue. I'm going to guess this is something on the server side, but please do let us know if it still happens after downgrading to 4.3.2.

eugene-taran commented 1 month ago

We have the same issue when starting our services and calling setMetadata in parallel for several subscriptions. Interestingly, only dev is affected for now; staging and production are fine. The call sometimes fails when we do 6 parallel updates to different subscriptions, but 99% of the time it's fine, even when awaiting 30 updates that run in parallel. Lib version: "@google-cloud/pubsub": "4.3.3"


Error: 10 ABORTED: The request raced with another user request. Please try again.
    at callErrorFromStatus (/usr/src/app/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/call.js:31:19)
    at Object.onReceiveStatus (/usr/src/app/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/client.js:192:76)
    at Object.onReceiveStatus (/usr/src/app/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:360:141)
    at Object.onReceiveStatus (/usr/src/app/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:323:181)
    at /usr/src/app/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/resolving-call.js:94:78
    at processTicksAndRejections (internal/process/task_queues.js:77:11)
for call at
    at ServiceClientImpl.makeUnaryRequest (/usr/src/app/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/client.js:160:34)
    at ServiceClientImpl.<anonymous> (/usr/src/app/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/make-client.js:105:19)
    at /usr/src/app/node_modules/@google-cloud/pubsub/build/src/v1/subscriber_client.js:227:29
    at /usr/src/app/node_modules/google-gax/build/src/normalCalls/timeout.js:44:16
    at repeat (/usr/src/app/node_modules/google-gax/build/src/normalCalls/retries.js:80:25)
    at /usr/src/app/node_modules/google-gax/build/src/normalCalls/retries.js:118:13
    at OngoingCall.call (/usr/src/app/node_modules/google-gax/build/src/call.js:67:27)
    at NormalApiCaller.call (/usr/src/app/node_modules/google-gax/build/src/normalCalls/normalApiCaller.js:34:19)
    at /usr/src/app/node_modules/google-gax/build/src/createApiCall.js:84:30
julius-welink commented 1 month ago

> @julius-welink Thanks for the issue. I'm going to guess this is something on the server side, but please do let us know if it still happens after downgrading to 4.3.2.

I have found that we are still encountering these errors even after downgrading to 4.3.2. The appearance of errors after the upgrade to 4.5.0 was merely a coincidence.

This seems to be a server error. However, I propose not closing this ticket yet, as I am still investigating whether the nodejs-pubsub library provides a way to gracefully handle the error. Or should the library handle this error automatically? Clearly it says "Please try again". :)

eugene-taran commented 3 weeks ago

On our side, it looks like we found the root cause of why this happens on dev. In our case, alongside our "initPubsub" job, the service that uses the subscription is stopped and restarted at the same time. So when the call that starts the subscription runs at the same time as the setMetadata call, it fails.
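
For parallel setMetadata updates like the ones described earlier in this thread, one possible approach is to collect per-call failures rather than letting the first rejection fail the whole startup step. The sketch below is only an illustration: `runAllCollectingFailures` and `TaskFailure` are hypothetical names, not part of @google-cloud/pubsub, and the thread does not say how the original updates were awaited.

```typescript
// Hypothetical shape describing one failed task (not a library type).
interface TaskFailure {
  index: number;   // position of the failed task in the input array
  reason: unknown; // whatever the task rejected with
}

// Hypothetical helper: starts every task in parallel and resolves with the
// list of failures, so one "10 ABORTED" does not reject the whole batch.
async function runAllCollectingFailures(
  tasks: Array<() => Promise<unknown>>,
): Promise<TaskFailure[]> {
  const outcomes = await Promise.all(
    tasks.map((task, index) =>
      task().then(
        (): TaskFailure | null => null,
        (reason: unknown): TaskFailure | null => ({index, reason}),
      ),
    ),
  );
  // Keep only the entries that represent a rejection.
  return outcomes.filter((o): o is TaskFailure => o !== null);
}

// Usage (assumes an array `subscriptions` and a `metadata` object exist):
// const failures = await runAllCollectingFailures(
//   subscriptions.map(sub => () => sub.setMetadata(metadata)),
// );
// failures.forEach(f => console.warn(`update ${f.index} failed`, f.reason));
```

The failed indexes can then be logged or retried individually, which keeps a single aborted update from taking down the service at startup.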

julius-welink commented 1 week ago

I found that I can safely ignore the error by wrapping setMetadata with try catch:

try {
    await subscription.setMetadata(metadata, gaxOpts);
} catch (error: unknown) {
    // Log the error and continue instead of letting it become an unhandled rejection.
    console.warn(error);
}

case closed

feywind commented 1 week ago

There's a question here of whether it should have that behaviour (fail and let you try again), but it's a small corner case that would usually happen if, e.g., a bunch of copies of a client were starting up at once. There's a quota for the number of admin function requests (which setMetadata would probably count as) in a particular time window.
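
Since the error message itself says "Please try again", a caller-side retry is one way to act on it. The following is a minimal sketch, not something the library provides: `retryOnAborted`, `maxAttempts`, and `baseDelayMs` are hypothetical names and defaults chosen for illustration. It relies only on the gRPC status code 10 (ABORTED) shown in the error above, and assumes the rejected error carries that value in a numeric `code` property.

```typescript
// gRPC status code for ABORTED, as in "Error: 10 ABORTED".
const GRPC_ABORTED = 10;

// Hypothetical helper (not part of @google-cloud/pubsub): calls `fn` and
// retries with exponential backoff only when it rejects with code 10.
async function retryOnAborted<T>(
  fn: () => Promise<T>,
  maxAttempts = 5,   // assumed default, tune as needed
  baseDelayMs = 200, // assumed default, tune as needed
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error: unknown) {
      const code = (error as {code?: number} | null)?.code;
      // Rethrow anything that is not ABORTED, or when attempts are used up.
      if (code !== GRPC_ABORTED || attempt >= maxAttempts) {
        throw error;
      }
      // Wait baseDelayMs, then 2x, 4x, ... before the next attempt.
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>(resolve => setTimeout(resolve, delayMs));
    }
  }
}

// Usage (assumes `subscription`, `metadata`, and `gaxOpts` from the snippet above):
// await retryOnAborted(() => subscription.setMetadata(metadata, gaxOpts));
```

Spacing the attempts out also avoids piling more admin requests into the same time window when many clients start at once.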