
Download of compressed files on ARM platforms #110

Closed simonhbw closed 3 years ago

simonhbw commented 3 years ago

Hi, I have a problem using tools like @azure/storage-blob or node-fetch on ARMv7 devices with any Node version that uses llhttp instead of the legacy parser. When downloading a compressed file, the response is always decompressed, which causes a data corruption failure. I can reproduce the issue with either @azure/storage-blob or node-fetch directly.

The interesting thing is that the issue only occurs on ARM(v7) platforms. Running the same code on an x64/x86 platform doesn't cause any issues. Changing the Node version to 11.x or 12.x with --http-parser=legacy also doesn't cause any issues.

Code sample:

const { BlockBlobClient } = require('@azure/storage-blob');
const path = require('path');
const fs = require('fs');
const cliProgress = require('cli-progress');

const repositoryPath = '../repository';
const uri = 'placeUrlWithSasHere';

async function startDownload(sasUri) {
  try {
    console.log('starting download from url: %s', sasUri);
    const blobClient = new BlockBlobClient(sasUri);
    const props = await blobClient.getProperties();
    // console.log({ props });

    const fileName = blobClient._name; // private field; blobClient.name is the public getter
    const filePath = path.join(repositoryPath, fileName);
    const bar = new cliProgress.SingleBar({}, cliProgress.Presets.shades_classic);

    // Download the whole blob, updating the progress bar as bytes arrive.
    const downloadBlockBlobResponse = await blobClient.download(null, null, {
      onProgress: (ev) => {
        bar.update(ev.loadedBytes);
      },
    });

    const readStream = downloadBlockBlobResponse.readableStreamBody;
    const writeStream = fs.createWriteStream(filePath);

    bar.start(props.contentLength, 0);
    readStream
      // .pipe(throttler)
      .pipe(writeStream);

    readStream.on('error', (e) => {
      bar.stop();
      console.error(e.message);
    });
    writeStream.on('error', (e) => {
      bar.stop();
      console.error(e.message);
    });

    readStream.on('end', () => {
      bar.stop();
      console.log('readstream finished');
    });
  } catch (error) {
    console.error(error);
  }
}

startDownload(uri)

Result on ARMv7:

Data corruption failure: received less data than required and reached maxRetires limitation. Received data offset: 1473222655, data needed offset: 10063157247, retries: 5, max retries: 5

Result on x64:

readstream finished
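
Since the issue also reproduces with node-fetch directly (as noted above), here is a minimal sketch of an equivalent repro; it assumes node-fetch v2, and the URL and file path are placeholders:

// Minimal node-fetch repro sketch: stream a large response body to disk.
// Assumes node-fetch v2; URL and file path are placeholders.
const fetch = require('node-fetch');
const fs = require('fs');

async function download(url, filePath) {
  const res = await fetch(url); // response headers/body are parsed by llhttp
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  await new Promise((resolve, reject) => {
    const writeStream = fs.createWriteStream(filePath);
    res.body.pipe(writeStream); // res.body is a Node readable stream in v2
    writeStream.on('finish', resolve);
    writeStream.on('error', reject);
    res.body.on('error', reject);
  });
}

download('placeUrlWithSasHere', '../repository/file.bin')
  .then(() => console.log('download finished'))
  .catch((e) => console.error(e.message));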

Chaphasilor commented 3 years ago

I'm also having this issue. Check out the original issue I created (https://github.com/nodejs/node/issues/37053), which includes a basic setup for reproducing it :)

indutny commented 3 years ago

Thanks to the undici issue https://github.com/nodejs/undici/issues/803, I finally know what's going on there: a truncating conversion from uint64_t to size_t in consume in llparse. Will work on the fix soon.
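
For context: on 32-bit platforms such as ARMv7, size_t is 32 bits, so narrowing a 64-bit length to size_t silently reduces it modulo 2^32. The offsets in the error message above are consistent with exactly that wrap, as this quick arithmetic sketch (not llparse code) shows:

// Sketch of the truncation: on ARMv7, size_t is 32 bits, so a uint64_t
// value is silently reduced modulo 2^32 when narrowed to size_t.
const needed = 10063157247n;            // "data needed offset" (~10 GB)
const truncated = needed & 0xFFFFFFFFn; // what a 32-bit size_t keeps
console.log(truncated);                 // 1473222655n
// 1473222655 is exactly the "received data offset" from the error above:
// 10063157247 mod 2^32 === 1473222655.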

indutny commented 3 years ago

Fix: https://github.com/nodejs/llparse/pull/44

indutny commented 3 years ago

PR for node.js: https://github.com/nodejs/node/pull/38665

simonhbw commented 3 years ago

For my specific problem, the issue is not fixed. I'm getting the following results when running the code from the first post with a large file (~10 GB):

Using node@12.22.1:

readstream finished

Using node@14.x:

Data corruption failure: received less data than required and reached maxRetires limitation. Received data offset: 1473222655, data needed offset: 10063157247, retries: 5, max retries: 5

Using node@16.3.0:

The download stops at a random point between 5 GB and 8 GB without any log message. Note that the "Data corruption failure" does not occur anymore.

Chaphasilor commented 3 years ago

Using node@16.3.0:

The download stops at a random point between 5 GB and 8 GB without any log message. Note that the "Data corruption failure" does not occur anymore.

Can you provide some demo code or the link it happens with? I'd love to try it out, given that it worked for me with a 12+ GB file on Node 16.3.0...
Also, Node 16 never gave an error message; it just closed the connection :)

simonhbw commented 3 years ago

Using node@16.3.0:

The download stops at a random point between 5 GB and 8 GB without any log message. Note that the "Data corruption failure" does not occur anymore.

Can you provide some demo code or the link it happens with? I'd love to try it out, given that it worked for me with a 12+ GB file on Node 16.3.0... Also, Node 16 never gave an error message; it just closed the connection :)

You can use the code from my first post. Because the file is non-public, I could send you the URL by email. Is that OK for you?

Chaphasilor commented 3 years ago

You can use the code from my first post. Because the file is non-public, I could send you the URL by email. Is that OK for you?

Maybe you could try reducing your code further. Is there a way to use that Azure library to get a (temporary) download URL for that file?

If yes, you could try downloading the file from that URL using the built-in http(s) module (see the sketch below).
This would make sure that the issue isn't in the Azure package :)

That is, unless you already tried that ^^
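
A minimal sketch of that approach, using only the built-in https module (URL and file path are placeholders):

// Download a file with the built-in https module only, to rule out the
// Azure package. URL and file path are placeholders.
const https = require('https');
const fs = require('fs');

function download(url, filePath) {
  return new Promise((resolve, reject) => {
    https.get(url, (res) => {
      if (res.statusCode !== 200) {
        res.resume(); // drain the response so the socket is freed
        reject(new Error(`HTTP ${res.statusCode}`));
        return;
      }
      const writeStream = fs.createWriteStream(filePath);
      res.pipe(writeStream); // the response is still parsed by llhttp
      writeStream.on('finish', resolve);
      writeStream.on('error', reject);
      res.on('error', reject);
    }).on('error', reject);
  });
}

download('placeUrlWithSasHere', '../repository/file.bin')
  .then(() => console.log('download finished'))
  .catch((e) => console.error(e.message));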