googleapis / nodejs-firestore

Node.js client for Google Cloud Firestore: a NoSQL document database built for automatic scaling, high performance, and ease of application development.
https://cloud.google.com/firestore/
Apache License 2.0

Cloud Function updating Firestore document throws CANCELLED: Call cancelled #2167

Open carlbleick opened 2 months ago

carlbleick commented 2 months ago

Environment

Problem

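We are experiencing a problem where one of our Cloud Functions updates a Firestore document (simplified):

import { initializeApp } from 'firebase-admin/app';
import { getFirestore } from 'firebase-admin/firestore';

const firestore = getFirestore(initializeApp());
// Simplified: `id` and `item` are supplied by the caller.
async function updateAccount(id, item) {
  await firestore.collection("accounts").doc(id).update(item);
}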

We use this pattern for many updates, and our app performs ~80,000 writes per day.

On very rare occasions, the update fails with the following error:

Error: 1 CANCELLED: Call cancelled
  at callErrorFromStatus (/layers/google.nodejs.yarn/yarn_modules/node_modules/@grpc/grpc-js/build/src/call.js:31:19)
  at Object.onReceiveStatus (/layers/google.nodejs.yarn/yarn_modules/node_modules/@grpc/grpc-js/build/src/client.js:193:76)
  at Object.onReceiveStatus (/layers/google.nodejs.yarn/yarn_modules/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:360:141)
  at Object.onReceiveStatus (/layers/google.nodejs.yarn/yarn_modules/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:323:181)
  at /layers/google.nodejs.yarn/yarn_modules/node_modules/@grpc/grpc-js/build/src/resolving-call.js:129:78
  at process.processTicksAndRejections (node:internal/process/task_queues:77:11)
for call at
  at ServiceClientImpl.makeUnaryRequest (/layers/google.nodejs.yarn/yarn_modules/node_modules/@grpc/grpc-js/build/src/client.js:161:32)
  at ServiceClientImpl.<anonymous> (/layers/google.nodejs.yarn/yarn_modules/node_modules/@grpc/grpc-js/build/src/make-client.js:105:19)
  at /layers/google.nodejs.yarn/yarn_modules/node_modules/@google-cloud/firestore/build/src/v1/firestore_client.js:242:29
  at /layers/google.nodejs.yarn/yarn_modules/node_modules/google-gax/build/src/normalCalls/timeout.js:44:16
  at repeat (/layers/google.nodejs.yarn/yarn_modules/node_modules/google-gax/build/src/normalCalls/retries.js:80:25)
  at /layers/google.nodejs.yarn/yarn_modules/node_modules/google-gax/build/src/normalCalls/retries.js:119:13
  at OngoingCallPromise.call (/layers/google.nodejs.yarn/yarn_modules/node_modules/google-gax/build/src/call.js:67:27)
  at NormalApiCaller.call (/layers/google.nodejs.yarn/yarn_modules/node_modules/google-gax/build/src/normalCalls/normalApiCaller.js:34:19)
  at /layers/google.nodejs.yarn/yarn_modules/node_modules/google-gax/build/src/createApiCall.js:112:30
  at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Caused by: Error
  at WriteBatch.commit (/layers/google.nodejs.yarn/yarn_modules/node_modules/@google-cloud/firestore/build/src/write-batch.js:433:23)
  at DocumentReference.update (/layers/google.nodejs.yarn/yarn_modules/node_modules/@google-cloud/firestore/build/src/reference.js:445:14)
  at AccountRepository.update (/layers/google.nodejs.yarn/yarn_modules/node_modules/data-repository/lib/repositories/base/BaseRepository.js:42:43)
  at AccountUpdates.onUserSettingsUpdate (/workspace/lib/account_updates/index.js:88:27)
  at /workspace/lib/index.js:68:25
  at process.processTicksAndRejections (node:internal/process/task_queues:95:5) 
{
  code: 1,
  details: 'Call cancelled',
  metadata: [Metadata],
  note: 'Exception occurred in retry method that was not classified as transient'
}
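The printed error object carries a numeric gRPC status code (`code: 1`, which is CANCELLED in the standard gRPC status code table) together with `details` and a `note`. A minimal sketch for turning such an error into a one-line log entry, assuming only the property names visible in the output above; the helper `describeGrpcError` is hypothetical, and the name table covers only a subset of the gRPC codes:

```typescript
// Subset of the standard gRPC status codes (numeric values from the gRPC spec).
const GRPC_STATUS_NAMES: Record<number, string> = {
  0: "OK",
  1: "CANCELLED",
  2: "UNKNOWN",
  4: "DEADLINE_EXCEEDED",
  8: "RESOURCE_EXHAUSTED",
  10: "ABORTED",
  13: "INTERNAL",
  14: "UNAVAILABLE",
};

// Fields printed in the error above (assumed shape, not an official type).
interface GrpcLikeError {
  code?: unknown;
  details?: unknown;
  note?: unknown;
}

// Hypothetical helper: summarizes a caught error for logging.
function describeGrpcError(err: unknown): string {
  if (typeof err !== "object" || err === null) {
    return `non-object error: ${String(err)}`;
  }
  const e = err as GrpcLikeError;
  if (typeof e.code !== "number") {
    return "error without a numeric gRPC code";
  }
  const name = GRPC_STATUS_NAMES[e.code] ?? `code ${e.code}`;
  const details = typeof e.details === "string" ? e.details : "";
  const note = typeof e.note === "string" ? ` (${e.note})` : "";
  return `${name}: ${details}${note}`;
}

const sample = {
  code: 1,
  details: "Call cancelled",
  note: "Exception occurred in retry method that was not classified as transient",
};
console.log(describeGrpcError(sample));
// CANCELLED: Call cancelled (Exception occurred in retry method that was not classified as transient)
```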

The same error has appeared only 3 times since 2024-07-16 and never before that.

I have not been able to figure out why this is happening. My assumption was that @grpc/grpc-js fails because Firestore rejects the write. But according to the logs, the same document received no other updates and the update is minimal in size, so a rate limit or size limit doesn't make sense to me.

ilya-allclear commented 2 months ago

We are seeing the exact same error and are interested in a solution.

ehsannas commented 2 months ago

Thanks for reporting @carlbleick.

grpc retries failed requests when the error is considered retryable (several error codes are). It ultimately gives up after a certain number of retries. The error above hints that a retryable error occurred and that, after retrying a certain number of times, grpc gave up on it.

If I understand correctly, the issue started occurring without any changes to your firebase-admin, Firestore, Functions, or grpc SDK dependency versions. So I suspect that the document update kept failing either for a legitimate reason (limits?) or because the Functions environment or the Firestore backend experienced an issue. I'd be interested to know whether the issue persists.
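The retry behaviour described here (retry on codes classified as retryable, give up after a bounded number of attempts) can be sketched as an application-level wrapper. This is not google-gax's actual implementation: the name `withRetry`, the attempt limit, the backoff delays, and the choice of retryable codes are all assumptions for illustration. Retrying a CANCELLED write is only safe if the write can be re-sent: an `update()` with fixed values can, but one using transforms such as increments could be applied twice if the first attempt actually reached the backend.

```typescript
// Hypothetical retry options; the defaults are illustrative, not gax defaults.
interface RetryOptions {
  maxAttempts: number;
  initialDelayMs: number;
  maxDelayMs: number;
  retryableCodes: ReadonlySet<number>;
}

const defaultOptions: RetryOptions = {
  maxAttempts: 5,
  initialDelayMs: 100,
  maxDelayMs: 5_000,
  // 1 = CANCELLED, 4 = DEADLINE_EXCEEDED, 14 = UNAVAILABLE (assumed retryable here).
  retryableCodes: new Set([1, 4, 14]),
};

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Extracts a numeric gRPC status code from an unknown error, if present.
function grpcCode(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "code" in err) {
    const code = (err as { code: unknown }).code;
    return typeof code === "number" ? code : undefined;
  }
  return undefined;
}

// Runs `op`, retrying with exponential backoff while the error code is retryable.
async function withRetry<T>(op: () => Promise<T>, opts: RetryOptions = defaultOptions): Promise<T> {
  let delay = opts.initialDelayMs;
  for (let attempt = 1; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      const code = grpcCode(err);
      const retryable = code !== undefined && opts.retryableCodes.has(code);
      if (!retryable || attempt >= opts.maxAttempts) {
        throw err; // Non-retryable error, or attempts exhausted: give up.
      }
      await sleep(delay);
      delay = Math.min(delay * 2, opts.maxDelayMs);
    }
  }
}
```

In the Cloud Function, the call from the issue would then be wrapped as `withRetry(() => firestore.collection("accounts").doc(id).update(item))`.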

carlbleick commented 2 months ago

I just received this error again, and according to my logs the document did not receive any other update during that time (the next update to the document occurred 1 minute later and succeeded).

How can I verify whether we are hitting a Firestore limit? Based on document size, update size, and update rate, I'm confident that no limit is being hit.
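One way to check the size limit is to estimate the document's stored size with the rules from Firestore's storage size documentation and compare it with the 1 MiB (1,048,576-byte) maximum document size. A rough sketch, assuming string document IDs and only plain values (strings, numbers, booleans, null, `Date`, arrays, and maps); references, geopoints, and bytes are not handled, and all helper names are hypothetical:

```typescript
type DocValue =
  | string
  | number
  | boolean
  | null
  | Date
  | DocValue[]
  | { [field: string]: DocValue };

const MAX_DOCUMENT_BYTES = 1_048_576; // 1 MiB maximum document size

const encoder = new TextEncoder();

// String size: number of UTF-8 bytes + 1.
function stringSize(s: string): number {
  return encoder.encode(s).length + 1;
}

// Value sizes: null/boolean 1 byte, number/timestamp 8 bytes,
// array = sum of its values, map = same rule as a document's fields.
function valueSize(v: DocValue): number {
  if (v === null || typeof v === "boolean") return 1;
  if (typeof v === "number" || v instanceof Date) return 8;
  if (typeof v === "string") return stringSize(v);
  if (Array.isArray(v)) return v.reduce((sum: number, x: DocValue) => sum + valueSize(x), 0);
  return fieldsSize(v);
}

// Sum of field-name string sizes plus field-value sizes.
function fieldsSize(fields: { [field: string]: DocValue }): number {
  let total = 0;
  for (const name of Object.keys(fields)) {
    total += stringSize(name) + valueSize(fields[name]);
  }
  return total;
}

// Document name size: each collection/document ID as a string, plus 16 bytes.
function documentNameSize(path: string): number {
  return path.split("/").reduce((sum, segment) => sum + stringSize(segment), 0) + 16;
}

// Document size: name size + fields + 32 additional bytes.
function estimateDocumentSize(path: string, data: { [field: string]: DocValue }): number {
  return documentNameSize(path) + fieldsSize(data) + 32;
}

function exceedsSizeLimit(path: string, data: { [field: string]: DocValue }): boolean {
  return estimateDocumentSize(path, data) > MAX_DOCUMENT_BYTES;
}
```

For example, `estimateDocumentSize("users/jeff/tasks/my_task_id", { type: "Personal", done: false, priority: 1, description: "Learn Cloud Firestore" })` returns 147, far below the limit.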

Can I provide more information that would help you investigate?

ehsannas commented 2 months ago

Thanks for the update. Based on your comment, it doesn't sound like an issue with limits. The only things that could help would be a way to reproduce this, or any relevant logs beyond the stack trace above.

As I mentioned earlier, since no SDK version change led to the issue, it's likely a backend issue, and I can't provide much further assistance here. You can reach out to Cloud Support and provide your project ID for further investigation.

ilya-allclear commented 2 months ago

@carlbleick, if you get a resolution from support, can you please post it here?

carlbleick commented 2 months ago

@ilya-allclear I will keep you updated. Are you using the same SDK versions?

firebase-admin: 12.1.1
@google-cloud/firestore: 7.9.0
firebase-functions: 5.0.1