jimmywarting / StreamSaver.js

StreamSaver writes streams to the filesystem directly and asynchronously
https://jimmywarting.github.io/StreamSaver.js/example.html
MIT License

Incomplete download when writing a file in chunks #326

Closed myomv100 closed 11 months ago

myomv100 commented 11 months ago

Hello Everyone,

I have been playing with StreamSaver for a few days and am stuck on a weird problem. What I want to accomplish is to first fetch the file and then decrypt it in chunks of 256 KB. The code below works fine for files smaller than about 5-6 MB. When the file being downloaded is larger, the write stream usually closes by itself while chunks are still unwritten; most of the time I only get between 70% and 90% of the file. I could not figure out the issue since it is not deterministic.

What could be the reason I don't get the whole file when it is large?

The code:

handleDownload: async function(url, callback) {
    const data = await fetch(url).then((res) => res.blob())
    var totalSize = data.size;
    var chunkSize = 262144;
    const fileStream = streamSaver.createWriteStream(filename, {
        size: totalSize 
    });
    const writer = fileStream.getWriter()
    for (let i = 0; i < totalSize; i+=chunkSize) {
        var from = i;
        var end = i+chunkSize;
        if (end > totalSize) {
            end = totalSize;
        }
        const blob = await data.slice(from, end)
        const result = await FileDecrypt(blob);
        const readableStream =  new Blob([result], { type: result.type}).stream()
        await readableStream.getReader().read().then(async ({ value, done }) => await writer.write(value))
    }
    await writer.close()
}
myomv100 commented 11 months ago

Problem solved by reducing chunkSize from 262144 to 65536. Sometimes the readable stream can't read more than 65536 bytes at a time. I moved to another solution that handles the fetched data and decrypts it on the fly. In case someone needs it:

try {
  async function* streamAsyncIterable(stream) {
    const reader = stream.getReader()
    try {
      while (true) {
        const {
          done,
          value
        } = await reader.read()
        if (done) return
        yield value
      }
    } finally {
      reader.releaseLock()
    }
  }

  const response = await fetch(url)
  var responseSize = 0;
  var chunkMaxSize = 65536;
  var totalWorkArray = new Uint8Array([]);
  const fileStream = streamSaver.createWriteStream(filename, {
    size: fileSize
  });
  const writer = fileStream.getWriter()
  for await (const chunk of streamAsyncIterable(response.body)) {
    let chunkSize = chunk.length
    responseSize += chunkSize
    var mergedArray = new Uint8Array(totalWorkArray.length + chunk.length);
    mergedArray.set(totalWorkArray);
    mergedArray.set(chunk, totalWorkArray.length);
    totalWorkArray = mergedArray
    while (totalWorkArray.length > chunkMaxSize) {
      const c = totalWorkArray.slice(0, chunkMaxSize);
      let work = new Blob([c], {
        type: 'application/octet-stream'
      })
      var temp = totalWorkArray.slice(chunkMaxSize)
      totalWorkArray = temp
      const plain = await FileDecrypt(work, 1, work.type);
      const readableStream = new Blob([plain], {
        type: plain.type
      }).stream()
      await readableStream.getReader().read().then(async ({
        value,
        done
      }) => {
        await writer.write(value)
      });
    }
  }
  const work = new Blob([totalWorkArray], {
    type: 'application/octet-stream'
  })
  const plain = await FileDecrypt(work, 1, work.type);
  const readableStream = new Blob([plain], {
    type: plain.type
  }).stream()
  await readableStream.getReader().read().then(async ({
    value,
    done
  }) => {
    await writer.write(value)
    await writer.close()
  });

} catch (error) {
  ...
}
jimmywarting commented 11 months ago

Ouch. This solution makes me sad to see. I will try to reply to this in an hour or so, when I have access to a computer, with a solution that I think works better. In the meanwhile, can you share the code for the decoder and an encrypted file?

jimmywarting commented 11 months ago

Okay... so here is what I came up with:

/**
 * Read a stream into same underlying ArrayBuffer of a fixed size.
 * And yield a new Uint8Array view of the same underlying buffer.
 * @param {ReadableStreamBYOBReader} reader
 * @param {number} chunkSize
 */
async function* blockReader(reader, chunkSize) {
  let offset = 0;
  let buffer = new ArrayBuffer(chunkSize)
  let done, view

  while (!done) {
    ({value: view, done} = await reader.read(new Uint8Array(buffer, offset, chunkSize - offset)))
    buffer = view.buffer
    if (done) break
    offset += view.byteLength;
    if (offset === chunkSize) {
      // the buffer is now full: yield a view of the whole buffer,
      // not just the portion filled by the last read()
      yield new Uint8Array(buffer, 0, chunkSize)
      offset = 0
      // the same buffer is recycled on the next read(); uncomment the
      // following line if you would rather allocate a fresh buffer per chunk:
      // buffer = new ArrayBuffer(chunkSize)
    }
  }

  if (offset > 0) {
    yield view.buffer.slice(0, offset)
  }
}

const url = 'https://raw.githubusercontent.com/lukeed/clsx/main/src/index.js'
const filename = 'clsx.js'
const fileSize = 1000000
const chunkMaxSize = 65536

const response = await fetch(url)
const fileStream = streamSaver.createWriteStream(filename, {
  size: fileSize
})
const writer = fileStream.getWriter()
const reader = response.body.getReader({ mode: 'byob' })
const type = 'application/octet-stream'
const iterator = blockReader(reader, chunkMaxSize)

// `sameUnderlyingBuffer` is a Uint8Array over the same underlying ArrayBuffer.
// That means the Uint8Array is detached on every iteration and not reusable
// in the next one (so don't try to concat them all).
// However, the ArrayBuffer itself is reused / recycled and refilled each time.
// This is the most efficient way to read a stream.
for await (const sameUnderlyingBuffer of iterator) {
  const plain = await FileDecrypt(sameUnderlyingBuffer, 1, type)
  await writer.write(plain)
}

await writer.close()
jimmywarting commented 11 months ago

A problem you have in both examples is that you create a stream with new Blob(...).stream() and then only read the first chunk:

await readableStream.getReader().read().then(async ({ value, done }) => await writer.write(value))

One .read() may not read everything from a blob. In that case you want to do new Blob(...).arrayBuffer(), or better yet, try to avoid creating a blob at all in the first place, since there is not much need for it.
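
For example, instead of the single read(), something like this drains the whole decrypted blob (just a sketch, assuming FileDecrypt resolves with a Blob-compatible value as the snippets above suggest):

const plain = await FileDecrypt(work, 1, work.type)
// .arrayBuffer() resolves with the blob's entire contents,
// unlike a single read() on its stream
const bytes = new Uint8Array(await new Blob([plain]).arrayBuffer())
await writer.write(bytes)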

jimmywarting commented 11 months ago

Also, if I may suggest, I would add this polyfill:

ReadableStream.prototype.values ??= function({ preventCancel = false } = {}) {
    const reader = this.getReader();
    return {
        async next() {
            try {
                const result = await reader.read();
                if (result.done) {
                    reader.releaseLock();
                }
                return result;
            } catch (e) {
                reader.releaseLock();
                throw e;
            }
        },
        async return(value) {
            if (!preventCancel) {
                const cancelPromise = reader.cancel(value);
                reader.releaseLock();
                await cancelPromise;
            } else {
                reader.releaseLock();
            }
            return { done: true, value };
        },
        [Symbol.asyncIterator]() {
            return this;
        }
    };
};

ReadableStream.prototype[Symbol.asyncIterator] ??= ReadableStream.prototype.values;

That way you could do:

for await (const chunk of response.body) { ... }

Only server-side runtimes (Node.js, Deno) and Firefox have the async iterator natively; other browsers need the polyfill above.
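
With that in place, a plain passthrough download becomes just a few lines (a sketch, with url, filename and fileSize as in the earlier snippets; no decryption shown, since encrypted data would still have to be re-blocked to a fixed size first as above):

const response = await fetch(url)
const fileStream = streamSaver.createWriteStream(filename, { size: fileSize })
const writer = fileStream.getWriter()

for await (const chunk of response.body) {
  await writer.write(chunk) // chunk is a Uint8Array of whatever size arrived
}

await writer.close()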