Proposal: Sync Writer API

Gozala commented 2 years ago

For dotStorage upload@2.0 we use CARs more or less as network packets. Specifically we want to allocate n bytes for a packet and keep writing blocks until desired size is reached. Then send that CAR out and start writing new blocks into a new CAR.

Unfortunately current writer API is not ideal for such a use case and imposes unecessary asynchrony. Here is the code I found myself writing.

const encode = async ({ blocks, roots = [CID.parse("bafkqaaa")] }) => {
    const { writer, out } = CAR.CarWriter.create(roots);
    for (const block of blocks) {
      writer.put(block);
    }
    writer.close();

    const chunks = [];
    let byteLength = 0;
    for await (const chunk of out) {
      chunks.push(chunk);
      byteLength += chunk.byteLength;
    }

    const buffer = new Uint8Array(byteLength);
    let byteOffset = 0;
    for (const chunk of chunks) {
      buffer.set(chunk, byteOffset);
      byteOffset += chunk.byteLength;
    }
    return buffer;
  };

It would be nice if we had another writer API to support such a use case, one that would not impose async reads from the output. Maybe something like:

declare function createWriter(buffer:ArrayBuffer, options?:WriterOptions): SyncBlockWriter

interface WriterOptions {
   roots?: CID[] // defaults to []
   byteOffset?: number // defaults to 0
   byteLength?: number // defaults to buffer.byteLength
   rootsCapacity?: number // defaults to total byteLength used by passed roots
}

interface SyncBlockWriter {
  /**
   * Throws an error if total root count is greater than root capacity specified at creation.
   */
  addRoots(roots:CID[]): void
  /**
   * Throws an error if buffer does not have enough capacity.
   */
  write(block: Block):void
  /**
   * Returns `Uint8Array` view into provided `ArrayBuffer` containing CAR bytes. Returned `Uint8Array` is a view
   * into a provided `ArrayBuffer` that was written.
   */
  close(): Uint8Array
}

Gozala commented 2 years ago

I closed #68 as this provides more adequate API with a lot more predictable memory profile.

Gozala commented 2 years ago

Quoting relevant comment from #68

It wouldn't really even need a byteLength argument because it's going to be holding on to a list of Uint8Arrays anyway and it just needs to create the header when it gets the roots and spit that out first. Nothing about the bytes that it buffers is impacted by the length of the header. You just get a one-shot addRoots() call to open the flood gates.

This is interesting point! In our use cases we generally know how many roots we'll have ahead of time even though we do not know what they're going to be. So it's fairly easy to tell how much memory needs to be allocated to fit the roots. And doing so would allow writing into buffer without having to hold onto blocks freeing memory sooner.

However if number of roots is unknown ahead of writes not having to allocate space is definitely going to be a better option. Given known use cases, I would still suggest going with proposed API one with unknown roots could easily be build as layer on top when/if desired.

ipld / js-car

Proposal: Sync Writer API #69