googleapis / nodejs-firestore

Node.js client for Google Cloud Firestore: a NoSQL document database built for automatic scaling, high performance, and ease of application development.
https://cloud.google.com/firestore/
Apache License 2.0

BulkWriter.close() hangs indefinitely when writing documents #1827

Closed: ghost closed this issue 2 weeks ago

ghost commented 1 year ago

Environment details

Steps to reproduce

  1. Have a very large array of objects to save (Mine is ~90K long)
  2. This is an example of my code:

    for (let currentSlice = 0; currentSlice < 2000; currentSlice++) {
      const bulkWriter = this.dbContent.firestore.bulkWriter();
      const startIndex = currentSlice * 2000;
      const docsSlice = largeArray.slice(startIndex, startIndex + 2000);
      let savedSliceDocsCount = 0;
    
      for (const doc of docsSlice) {
        bulkWriter
          .set(this.dbContent.doc(doc.id), doc as unknown as VodDataDto)
          .then(() => savedSliceDocsCount++)
          .catch((e: BulkWriterError) => {
            let docDbErrors = dbErrors[doc.id];
            if (!docDbErrors) docDbErrors = [];
            docDbErrors.push(e);
            dbErrors[doc.id] = docDbErrors;
          });
      }
    
      await bulkWriter.close();
    
      savedDocsCount += savedSliceDocsCount;
    }

The code above can fail at any point, and the process hangs until I kill it, with no errors reported. I have also tried using a single bulkWriter for all operations, slicing the data into chunks of 2000 (as above) and 5000, and throttling to a maximum of 400 ops/s, all with no success.
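
For context, the throttled variant looked roughly like this (a sketch rather than my exact code; it assumes the throttling settings of BulkWriterOptions and reuses largeArray and VodDataDto from the example above):

  const throttledWriter = this.dbContent.firestore.bulkWriter({
    // cap BulkWriter throughput at 400 operations per second
    throttling: { maxOpsPerSecond: 400 },
  });

  for (const doc of largeArray) {
    throttledWriter
      .set(this.dbContent.doc(doc.id), doc as unknown as VodDataDto)
      .catch((e: BulkWriterError) => console.error(doc.id, e.message));
  }

  // the process still hangs here
  await throttledWriter.close();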

Any help would be appreciated!

tom-andersen commented 1 year ago

Thanks for reporting @cfierro-glb. I'll take a look!

ghost commented 1 year ago

Hi @tom-andersen, were you able to find a cause for this? We're still experiencing this issue and it's impacting our processes.

maganap commented 1 year ago

@cfierro-glb I usually write very large sets (~70K) of small documents (< ~1KB). What I do differently from you:

I do something like this, where _bulkOps is a huge array:

  const bulkWriter = db.bulkWriter();
  const bulkSize = 500;
  let i = 0;
  for (const op of _bulkOps) {
    bulkWriter.update(op.ref, op.data);
    if (++i % bulkSize == 0) {
      await bulkWriter.flush();
      console.log(`Bulk flush: ${i}`);
    }
  }
  await bulkWriter.close(); // flush and close
  console.log(`Bulk total writes: ${i}`);

You can also try a much smaller bulkSize if your individual documents are too large.

On the other hand, about updating savedDocsCount with += savedSliceDocsCount after await bulkWriter.close(): you can't be sure that all the bulkWriter.set() promise callbacks have already run by the time await bulkWriter.close() resolves. close() only ensures there are no more pending writes; it says nothing about the individual WriteResult promises.

I would suggest you push each promise into an array and wait for them later (or on every flush, to save some resources). Something like:

  const bulkWriter = db.bulkWriter();
  const bulkSize = 500;
  const proms = [];  // <-- here
  let i = 0;
  for (const op of _bulkOps) {
    proms.push( bulkWriter.update(op.ref, op.data) );  // <-- here
    if (++i % bulkSize == 0) {
      await bulkWriter.flush();
      console.log(`Bulk flush: ${i}`);
    }
  }
  await bulkWriter.close(); // flush and close
  console.log(`Bulk total writes: ${i}`);
  const results = await Promise.allSettled(proms); // <-- here: yes, safe to pass a 70K length array
  results.forEach(result => console.log(result.status)); // <-- here: whatever you need when status: rejected

Regarding the 500 "safe limit": I use the Firebase Admin SDK. I've been reading the documentation and it doesn't mention any limit on the number of bulkWriter operations, but batches and transactions do have a limit of 500 operations. I somehow ended up using that number with bulkWriter as well; although it's not documented, it seems to help (at least it releases some resources and lets the rest flow smoothly).

Hope it helps.

ghost commented 1 year ago

@maganap thanks for your help. I also tried using a single instance with several flush() calls and experienced the same issue. At first I reverted to a previous version to work around it, but in the end we migrated our DBs to MongoDB.

tom-andersen commented 1 year ago

I haven't been able to reproduce this problem. If someone has a repro they can share, I will happily investigate further.
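
For example, a minimal standalone script along these lines would be enough to investigate (the project id and collection name are placeholders):

  const { Firestore } = require('@google-cloud/firestore');

  async function main() {
    const db = new Firestore({ projectId: 'your-project-id' });
    const bulkWriter = db.bulkWriter();

    for (let i = 0; i < 90000; i++) {
      bulkWriter
        .set(db.collection('bulkwriter-repro').doc(`doc-${i}`), { index: i })
        .catch((e) => console.error(`doc-${i} failed:`, e.message));
    }

    console.log('closing...');
    await bulkWriter.close();
    console.log('closed'); // if this never prints, the hang is reproduced
  }

  main().catch(console.error);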

charliejlevine commented 1 year ago

I believe I'm getting a similar issue in a Google Cloud Function triggered by Pub/Sub. What could be going on in bulk-writer.js based on this error output?

  Error: 13 INTERNAL: Received RST_STREAM with code 1
  Error: 1 CANCELLED: Call cancelled

(I received the above two errors on different reruns of the function. The error trace below was the same.)

      at BulkWriterOperation.onError (/workspace/node_modules/@google-cloud/firestore/build/src/bulk-writer.js:96:37)
      at BulkCommitBatch.bulkCommit (/workspace/node_modules/@google-cloud/firestore/build/src/bulk-writer.js:193:36)
      at runMicrotasks (<anonymous>)
      at runNextTicks (internal/process/task_queues.js:60:5)
      at processImmediate (internal/timers.js:437:9)
      at process.topLevelDomainCallback (domain.js:152:15)
      at process.callbackTrampoline (internal/async_hooks.js:128:24)
      at async BulkWriter._sendBatch (/workspace/node_modules/@google-cloud/firestore/build/src/bulk-writer.js:754:13)

charliejlevine commented 1 year ago

Update: adding a try/catch block around BulkWriter.close() still results in a crash later in the function; it seems BulkWriter.close() is not failing gracefully. I am rewriting the code to push the update promises into an array and have the BulkWriter do its work at the end of the function, flushing every 500 updates as suggested above. Any additional information on this would be greatly appreciated. @maganap @tom-andersen Thank you.
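
For reference, the rewrite I'm attempting looks roughly like this (a sketch only; db and ops stand in for my actual Firestore instance and list of pending updates):

  const bulkWriter = db.bulkWriter();
  const proms = [];
  let i = 0;

  for (const op of ops) {
    // keep the WriteResult promise so failures can be inspected later
    proms.push(bulkWriter.update(op.ref, op.data));
    if (++i % 500 === 0) await bulkWriter.flush();
  }

  try {
    await bulkWriter.close();
  } catch (e) {
    // don't let close() take the whole function down
    console.error('BulkWriter.close() failed:', e);
  }

  const results = await Promise.allSettled(proms);
  const failed = results.filter((r) => r.status === 'rejected').length;
  console.log(`writes: ${i}, failed: ${failed}`);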

maganap commented 12 months ago

@charliejlevine

Error: 13 INTERNAL: Received RST_STREAM with code 1 Error: 1 CANCELLED: Call cancelled

Both of these look like a problem on the server end, not yours. Note that this is not the same problem originally reported in this thread (hanging indefinitely on flush/close), so you may want to open a new issue with this info.

In any case, is this problem you're having easily reproducible? Or does it just happen every now and then?

Just to give some ideas for tests you could run to narrow down the problem:

You may be overloading Firestore. If that is the case:

If that does not improve anything, then overloading may not be your issue.
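
If you do want to experiment with lowering the load and retrying the transient errors, here is a rough sketch of what that could look like (untested by me; the numeric gRPC codes 13 and 1 come from your error output, and throttling / onWriteError are BulkWriter options and hooks):

  const bulkWriter = db.bulkWriter({
    // start slow and cap throughput in case Firestore is being overloaded
    throttling: { initialOpsPerSecond: 100, maxOpsPerSecond: 300 },
  });

  // register before enqueueing any writes
  bulkWriter.onWriteError((err) => {
    // retry transient errors (13 INTERNAL, 1 CANCELLED) a few times, then give up
    const transient = err.code === 13 || err.code === 1;
    if (transient && err.failedAttempts < 5) return true;
    console.error('Giving up on', err.documentRef.path, err.message);
    return false;
  });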

github-actions[bot] commented 2 weeks ago

This has been closed since a request for information has not been answered for 15 days. It can be reopened when the requested information is provided.