googleapis / nodejs-pubsub

Node.js client for Google Cloud Pub/Sub: Ingest event streams from anywhere, at any scale, for simple, reliable, real-time stream analytics.
https://cloud.google.com/pubsub/
Apache License 2.0
516 stars 227 forks source link

acknowledgments randomly fail with EXACTLY_ONCE_ACKID_FAILURE #1951

Open matejdro opened 4 days ago

matejdro commented 4 days ago

Environment details

Steps to reproduce

  1. Pull a message via subscriberClient.pull({ 1, returnImmediately: false, subscription: mySubscriptionId, })
  2. Ack this pulled message via subscriberClient.acknowledge({ ackIds: [message.ackId!], subscription: mySubscriptionId }) (at most several seconds after receiving, so message could not have expired yet)

Most of the time above works fine, but rarely, acknowledgement will fail with the following error:

INVALID_ARGUMENT: Some acknowledgement ids in the request were invalid. This could be because the acknowledgement ids have expired or the acknowledgement ids were malformed. Error: 3 INVALID_ARGUMENT: Some acknowledgement ids in the request were invalid. This could be because the acknowledgement ids have expired or the acknowledgement ids were malformed.
     at callErrorFromStatus (/opt/node_modules/@grpc/grpc-js/build/src/call.js:31:19)
     at Object.onReceiveStatus (/opt/node_modules/@grpc/grpc-js/build/src/client.js:193:76)
     at Object.onReceiveStatus (/opt/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:360:141)
     at Object.onReceiveStatus (/opt/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:323:181)
     at /opt/node_modules/@grpc/grpc-js/build/src/resolving-call.js:129:78
     at processTicksAndRejections (node:internal/process/task_queues:78:11)
 for call at
     at ServiceClientImpl.makeUnaryRequest (/opt/node_modules/@grpc/grpc-js/build/src/client.js:161:32)
     at ServiceClientImpl.<anonymous> (/opt/node_modules/@grpc/grpc-js/build/src/make-client.js:105:19)
     at /opt/node_modules/@inovait/pubsub-pull-subscriber/node_modules/@google-cloud/pubsub/build/src/v1/subscriber_client.js:227:29
     at /opt/node_modules/google-gax/build/src/normalCalls/timeout.js:44:16
     at repeat (/opt/node_modules/google-gax/build/src/normalCalls/retries.js:80:25)
     at /opt/node_modules/google-gax/build/src/normalCalls/retries.js:119:13
     at OngoingCallPromise.call (/opt/node_modules/google-gax/build/src/call.js:67:27)
     at NormalApiCaller.call (/opt/node_modules/google-gax/build/src/normalCalls/normalApiCaller.js:34:19)
     at /opt/node_modules/google-gax/build/src/createApiCall.js:112:30 {
   code: 3,
   details: 'Some acknowledgement ids in the request were invalid. This could be because the acknowledgement ids have expired or the acknowledgement ids were malformed.',
   metadata: Metadata {
     internalRepr: Map(2) {
       'google.rpc.errorinfo-bin' => [Array],
       'grpc-status-details-bin' => [Array]
     },
     options: {}
   },

   note: 'Exception occurred in retry method that was not classified as transient',
   statusDetails: [
     ErrorInfo {
       metadata: [Object],
       reason: 'EXACTLY_ONCE_ACKID_FAILURE',
       domain: 'pubsub.googleapis.com'
     }
   ],
   reason: 'EXACTLY_ONCE_ACKID_FAILURE',
   domain: 'pubsub.googleapis.com',
   errorInfoMetadata: {
     'BhYsXUZIUTcZCGhRDk9eIz81IChFEQMIFAV8fXBBUXVdWndoUQ0Zcn1gfGpdR1cDQlB8VlEbB2hObXUf1vWz9kthRUh7EgAEQFt4VlkZCWhdWnYOWAgx1P2MuMGC6z4Yfmv2p_GfO0ii4Pl8Zi09XxJLLD5-Ky5FQV5AEkwgG0RJUytDCypYEU4EISE-MD5FU0RQ': '
PERMANENT_FAILURE_INVALID_ACK_ID'
   }
 }

Relevant error seems to be EXACTLY_ONCE_ACKID_FAILURE, but we cannot find any documentation for this error anywhere.

We are not sure what causes this. After this happens the received message will remain "un-ackable". The only workaround we found so far is to let the message expire and get redelivered. Redelivered message can in most cases be acked normally.

Any ideas what could cause this?

feywind commented 4 days ago

I wonder if this has something to do with ack IDs being ack'd more than once. It's a server error being returned, though, so someone else might know more. @maheshgattani ?

matejdro commented 19 hours ago

We added a lot of logging and as far as we can tell, we never ack the message more than once. Also, this broken message will eventually expire and be re-delivered, which signifies that it was not acked.