jvilk / BrowserFS

BrowserFS is an in-browser filesystem that emulates the Node JS filesystem API and supports storing and retrieving files from various backends.

Advice for tight memory constraints and buffers? #320

Closed · thelamer closed this 2 years ago

thelamer commented 2 years ago

I am trying to work within some memory constraints on client systems outside of my control, and I have come up with some functional code:

    var init = { method: 'GET', headers: { 'Access-Control-Allow-Origin': '*' }, mode: 'cors' };
    var response = await fetch(url, init);
    var length = Number(response.headers.get('Content-Length'));
    var at = 0;
    var dlProgress = 0;
    var reader = response.body.getReader();
    while (true) {
      let { done, value } = await reader.read();
      if (done) {
        break;
      }
      // Append each chunk straight to the in-memory filesystem so it can be GC'd
      fs.appendFileSync('PATH/TO/FILE', Buffer.from(value));
      at += value.length;
      dlProgress = Math.round((at / length) * 100);
      $('#progress').text(dlProgress + '%');
    }

This works: the browser garbage collects aggressively, and the read stream is written to the in-memory fs without the double copy you would get by accumulating everything in a Uint8Array and writing it to the in-memory fs all at once. But the performance is terrible; creating a new Buffer and appending it for every chunk takes ages on large files compared to filling a single array buffer and dumping it in one write, for example with a 203 MB file.
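For reference, the single-buffer version I am comparing against looks roughly like this (just a sketch of the idea, not my exact code; it holds the data twice, once in the Uint8Array and once in the in-memory fs, until the array is released):

    // Sketch of the single-buffer baseline: fast, but the file exists twice in
    // memory (once in `array`, once in the fs) until `array` is dropped.
    var response = await fetch(url, { method: 'GET', mode: 'cors' });
    var length = Number(response.headers.get('Content-Length'));
    var array = new Uint8Array(length);
    var at = 0;
    var reader = response.body.getReader();
    while (true) {
      let { done, value } = await reader.read();
      if (done) {
        break;
      }
      array.set(value, at);
      at += value.length;
    }
    fs.writeFileSync('PATH/TO/FILE', Buffer.from(array));
    array = null;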

Just thought I would reach out and see if anyone has come up with a higher-performance way to write streams without occupying memory twice over.

thelamer commented 2 years ago

    var chunkSize = 10240000;
    var dlProgress = 0;
    // HEAD request to learn the total file size before downloading
    var headerInit = { method: 'HEAD', headers: { 'Access-Control-Allow-Origin': '*' }, mode: 'cors' };
    var response = await fetch(url, headerInit);
    var length = Number(response.headers.get('Content-Length'));
    if (length > chunkSize) {
      let rangeStart = 0;
      let rangeEnd = chunkSize - 1;
      let chunkCount = Math.ceil(length / chunkSize);
      let lengthEnd = length - 1;
      for (let i = 0; i < chunkCount; i++) {
        console.log('Downloading part ' + (i + 1) + ' of ' + chunkCount);
        console.log('range=' + rangeStart + '-' + rangeEnd);
        // Request only this chunk's byte range
        let chunkInit = { method: 'GET', headers: { 'Access-Control-Allow-Origin': '*', 'Range': 'bytes=' + rangeStart + '-' + rangeEnd }, mode: 'cors' };
        let response = await fetch(url, chunkInit);
        let chunkLength = Number(response.headers.get('Content-Length'));
        let array = new Uint8Array(chunkLength);
        let at = 0;
        let reader = response.body.getReader();
        for (;;) {
          let { done, value } = await reader.read();
          if (done) {
            break;
          }
          array.set(value, at);
          at += value.length;
          dlProgress = Math.round((at / chunkLength) * 100);
          console.log((i + 1) + '/' + chunkCount + ' ' + dlProgress + '%');
        }
        // Append the finished chunk, then drop the references so it can be garbage collected
        let fileChunk = Buffer.from(array);
        array = null;
        fs.appendFileSync('/path/to/file', fileChunk);
        fileChunk = null;
        // Set the byte range for the next download
        rangeStart = rangeEnd + 1;
        if ((rangeEnd + chunkSize) > lengthEnd) {
          rangeEnd = lengthEnd;
        } else {
          rangeEnd = rangeEnd + chunkSize;
        }
      }
    }

I ended up using byte-range requests to grab the file in pre-determined chunks. This allows each chunk to be garbage collected as it is written and does not occupy memory twice.
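If it helps anyone, the same idea can be wrapped into a helper. This is only a rough sketch (downloadInChunks is just a name I am using here), and it assumes the server honours Range requests and that fs is the BrowserFS shim:

    // Rough sketch: the chunked-download logic above wrapped in a reusable function.
    async function downloadInChunks(url, destPath, chunkSize) {
      // HEAD request to get the total size
      const head = await fetch(url, { method: 'HEAD', mode: 'cors' });
      const total = Number(head.headers.get('Content-Length'));
      for (let start = 0; start < total; start += chunkSize) {
        const end = Math.min(start + chunkSize, total) - 1;
        const res = await fetch(url, {
          method: 'GET',
          mode: 'cors',
          headers: { 'Range': 'bytes=' + start + '-' + end }
        });
        // Collect this chunk, append it, then let it go out of scope so it can be GC'd
        const buf = Buffer.from(await res.arrayBuffer());
        fs.appendFileSync(destPath, buf);
      }
    }

Calling it with the same chunk size as above would just be `await downloadInChunks(url, '/path/to/file', 10240000);` inside an async context.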