mafintosh / tar-stream

tar-stream is a streaming tar parser and generator.
MIT License

Error: Invalid tar header. Maybe the tar is corrupted or it needs to be gunzipped? #68

Closed: evilpacket closed this issue 7 years ago

evilpacket commented 7 years ago

I have a tarball from the registry (and others) that consistently fails with invalid tar header errors but is a valid archive.

To replicate:

  1. wget https://registry.npmjs.org/eslint-config-metashop/-/eslint-config-metashop-1.5.0.tgz

  2. Use the following test case:

    const Tar = require('tar-stream');
    const Gunzip = require('gunzip-maybe');
    const Fs = require('fs');

    const gunzip = Gunzip();
    const extract = Tar.extract();
    const inputFile = Fs.createReadStream(process.argv[2]);

    extract.on('error', function (err) {
      console.log(err);
    });

    extract.on('entry', function (header, stream, callback) {
      stream.on('end', function () {
        return callback();
      });
      stream.resume();
    });

    inputFile.on('error', function (err) {
      console.log(err);
    });

    inputFile.pipe(gunzip).pipe(extract);

  3. node test.js eslint-config-metashop-1.5.0.tgz

The error I get is something like:

    Error: Invalid tar header. Maybe the tar is corrupted or it needs to be gunzipped?
        at Object.exports.decode (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/headers.js:265:40)
        at Extract.onheader [as _onparse] (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/extract.js:124:39)
        at Extract._write (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/extract.js:248:8)
        at Extract._continue (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/extract.js:212:28)
        at oncontinue (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/extract.js:65:10)
        at Extract.onheader [as _onparse] (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/extract.js:132:7)
        at Extract._write (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/extract.js:248:8)
        at Extract._continue (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/extract.js:212:28)
        at oncontinue (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/extract.js:65:10)
        at Extract.onheader [as _onparse] (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/extract.js:132:7)



Doing some debugging, it appears that the stream has advanced too far into the archive, and the data it's trying to parse as a header is actually file contents. That's as far as we've gotten so far.

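One way to see this misalignment directly (a debugging sketch, not code from the original report): walk the raw decompressed tar in 512-byte steps and print every block that carries the ustar magic. The real entry headers show up at their true offsets, which can then be compared against where the parser reports the failure.

    const Fs = require('fs');
    const Zlib = require('zlib');

    // Decompress the whole .tgz in memory (fine for a small package tarball).
    const tar = Zlib.gunzipSync(Fs.readFileSync(process.argv[2]));

    // A ustar header stores the magic string "ustar" at byte offset 257;
    // the name field is bytes 0-99 and the size field is 12 bytes of
    // octal text at offset 124.
    for (let offset = 0; offset + 512 <= tar.length; offset += 512) {
      const block = tar.slice(offset, offset + 512);
      if (block.toString('utf8', 257, 262) === 'ustar') {
        const name = block.toString('utf8', 0, 100).replace(/\0.*$/, '');
        const size = parseInt(block.toString('utf8', 124, 136), 8) || 0;
        console.log(offset, name, size);
      }
    }
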
nlf commented 7 years ago

i did a little more digging here and it looks like what's going on is that the header for the package/rules directory is being parsed with a size of 306. according to the tar internals documentation:

On systems where disk allocation is performed on a directory basis, the size field will contain the maximum number of bytes (which may be rounded to the nearest disk block allocation unit) which the directory may hold. A size field of zero indicates no such limiting. Systems which do not support limiting in this manner should ignore the size field.

as a quick test, i patched headers.js so that if (type === 'directory') size = 0 and the errors disappeared and the tarball extracted appropriately.

it sounds like for directories the size field should probably only be informative for this module, with an empty stream always returned. since we don't know what the end user is doing with the stream, we can't really enforce byte limits on an empty directory even if we wanted to.
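For concreteness, here is a minimal self-contained sketch of where such a clamp would sit. This is not tar-stream's actual headers.js code; it only decodes the few ustar fields needed to show the idea. The mechanism behind the error: entry data is padded to whole 512-byte blocks, so a directory header claiming size 306 makes an extractor skip one full block that actually holds the next entry's header, and every header read after that is misaligned.

    // Sketch only: NOT tar-stream's headers.js. Decodes a 512-byte
    // ustar header block far enough to show the directory-size clamp.
    function decodeUstarSketch (block /* Buffer, 512 bytes */) {
      const name = block.toString('utf8', 0, 100).replace(/\0.*$/, '');
      // size: 12 bytes of octal text at offset 124; parseInt stops at
      // the trailing NUL, so this handles the usual padding.
      let size = parseInt(block.toString('utf8', 124, 136), 8) || 0;
      const type = block[156] === 0x35 ? 'directory' : 'file'; // '5' = dir
      // the workaround from this thread: directories report size 0 so
      // the extractor never skips past bytes that belong to the next entry
      if (type === 'directory') size = 0;
      return { name, type, size };
    }
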

piranna commented 7 years ago

i patched headers.js so that if (type === 'directory') size = 0 and the errors disappeared and the tarball extracted appropriately

Good catch :-) Could you open a pull request?

it sounds like for directories the size flag should probably only be informative for this module and an empty stream always returned. since we don't know what the end user is doing with the stream, we can't really enforce byte limits in an empty directory even if we wanted to.

Absolutely; in fact, I don't see any sense in directories having a size at all... it seems mostly related to the size on disk, as your quote indicates, similar to why the ls command shows 4 KB for directories on ext4 filesystems, where 4 KB is the default block size on regular desktop Linux systems.

nlf commented 7 years ago

i only suggested passing it along in an informational sense for the sake of being thorough. you never know if someone might want to use this module to write files to disk on a filesystem that supports these limits :)

i'll send over a pull request with a proposed fix shortly