w3c / IndexedDB

Indexed Database API
https://w3c.github.io/IndexedDB/

Streaming data to/from IndexedDB #419

Open dumbmatter opened 3 months ago

dumbmatter commented 3 months ago

Now that the Streams API is widely supported, would it make sense to have some built-in IndexedDB API for streaming data to/from IndexedDB?

The problem now is that it is somewhat difficult and inefficient to write such functionality on your own. For example, if you want to create a ReadableStream that outputs all of the data in a giant object store, you can't just naively iterate over a cursor in ReadableStream.pull, because the transaction auto-commits as soon as it has no pending requests, which is exactly what happens when the consumer stops pulling for a while. So you wind up fighting on two fronts: the stream wants you to read only part of the data into memory at once, while IndexedDB closes any transaction that goes inactive. Something like this:

const makeReadableStream = (db, store) => {
  let prevKey;

  return new ReadableStream({
    async pull(controller) {
      const range = prevKey !== undefined
        ? IDBKeyRange.lowerBound(prevKey, true)
        : undefined;

      const MIN_BATCH_SIZE = 100;
      let batchCount = 0;

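      // Assumes `db` is a promise-wrapped database (e.g. from the `idb` library),
      // so `.store` and awaiting the cursor work. Each pull opens a new transaction.
      let cursor = await db.transaction(store).store.openCursor(range);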
      while (cursor) {
        controller.enqueue(`${JSON.stringify(cursor.value)}\n`);
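        prevKey = cursor.key;
        batchCount += 1;

        // Keep going while the stream wants more data, but always read at
        // least MIN_BATCH_SIZE records per transaction.
        if (controller.desiredSize > 0 || batchCount < MIN_BATCH_SIZE) {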
          cursor = await cursor.continue();
        } else {
          break;
        }
      }

      console.log(`Done batch of ${batchCount} objects`);

      if (!cursor) {
        // Actually done with this store, not just paused
        console.log("Completely done");
        controller.close();
      }
    },
  }, {
    highWaterMark: 100,
  });
};

In addition to that code being a little complicated to write, it's also probably slower than it needs to be due to creating many transactions over the course of a large stream.
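For comparison, here is a rough sketch of a variant that fetches each batch with getAll/getAllKeys instead of walking a cursor. It still opens one transaction per batch, so it does not fix the underlying problem, but the stream logic gets simpler. The names `readBatch` and `makeBatchedStream` are made up for this example, `db` is assumed to be a plain IDBDatabase, and the batch size of 100 is arbitrary.

```javascript
// Hypothetical helper: read up to `count` records whose keys are greater than
// `afterKey`, using one short-lived read-only transaction.
const readBatch = (db, storeName, afterKey, count) =>
  new Promise((resolve, reject) => {
    const range = afterKey !== undefined
      ? IDBKeyRange.lowerBound(afterKey, true)
      : undefined;
    const store = db.transaction(storeName, "readonly").objectStore(storeName);
    const keysReq = store.getAllKeys(range, count);
    const valuesReq = store.getAll(range, count);
    // Requests in one transaction finish in the order they were placed, so
    // keysReq.result is already available when valuesReq succeeds.
    valuesReq.onsuccess = () =>
      resolve({ keys: keysReq.result, values: valuesReq.result });
    keysReq.onerror = valuesReq.onerror = (event) => reject(event.target.error);
  });

// Hypothetical stream factory. `readNextBatch(afterKey, count)` must resolve to
// `{ keys, values }` in ascending key order, like `readBatch` above.
const makeBatchedStream = (readNextBatch, batchSize = 100) => {
  let lastKey;

  return new ReadableStream({
    async pull(controller) {
      const { keys, values } = await readNextBatch(lastKey, batchSize);
      for (const value of values) {
        controller.enqueue(value);
      }
      if (keys.length > 0) {
        lastKey = keys[keys.length - 1];
      }
      if (keys.length < batchSize) {
        // A short (or empty) batch means the store is exhausted.
        controller.close();
      }
    },
  });
};

// In a browser, with "players" as a placeholder store name:
// const stream = makeBatchedStream(
//   (afterKey, count) => readBatch(db, "players", afterKey, count),
// );
```

Keeping the IndexedDB part behind `readNextBatch` is only a convenience so the batching logic can be exercised with an in-memory stand-in.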

I wrote a blog post about this a few years ago, and when I search I still can't find anyone else talking about doing stuff like this. But a couple of people find that article through Google every day, and every now and again someone emails me about it, so I'm not literally the only person interested in this. I admit it's probably a niche use case, although I do have hundreds of users every day exporting large amounts of data from IndexedDB in my video games, using code similar to what I wrote in that blog post.

What would be better is maybe an API equivalent to getAll - a method on IDBObjectStore and IDBIndex that takes an IDBKeyRange and returns a stream of all matching records. And then maybe also an equivalent API for writing data to an object store.
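To make the write side concrete, this is roughly what a userland version looks like today: a WritableStream that buffers records and flushes each batch in its own readwrite transaction. It is a sketch, not a proposal for the actual API shape. `makeBatchedWritable` and `putBatch` are made-up names, `db` is assumed to be a plain IDBDatabase, and `putBatch` assumes the store uses in-line keys or a key generator so that `put(record)` needs no explicit key.

```javascript
// Hypothetical sink factory. `writeBatch(records)` persists one array of
// records and resolves when they are safely written.
const makeBatchedWritable = (writeBatch, batchSize = 100) => {
  let pending = [];

  const flush = async () => {
    if (pending.length === 0) return;
    const batch = pending;
    pending = [];
    await writeBatch(batch);
  };

  return new WritableStream({
    async write(record) {
      pending.push(record);
      if (pending.length >= batchSize) {
        // Awaiting here applies backpressure until the batch is committed.
        await flush();
      }
    },
    // Write whatever is left when the stream is closed.
    close: flush,
  });
};

// Hypothetical `writeBatch` for plain IndexedDB: one readwrite transaction
// per batch, resolved only once the transaction has completed.
const putBatch = (db, storeName, records) =>
  new Promise((resolve, reject) => {
    const tx = db.transaction(storeName, "readwrite");
    const store = tx.objectStore(storeName);
    for (const record of records) {
      store.put(record);
    }
    tx.oncomplete = () => resolve();
    tx.onabort = () => reject(tx.error);
  });

// In a browser, with "players" as a placeholder store name:
// const sink = makeBatchedWritable((records) => putBatch(db, "players", records));
// await someReadableStream.pipeTo(sink);
```

Like the read side, this pays for a new transaction per batch, which is the overhead a built-in API could presumably avoid.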

asutherland commented 3 months ago

xref https://github.com/w3c/IndexedDB/issues/34 on explicit transaction lifetime control.