mafintosh / tar-stream

tar-stream is a streaming tar parser and generator.
MIT License

Fails to pack an archive with non-Latin characters in file names #90

Closed shelart closed 5 years ago

shelart commented 5 years ago

I've written some simple code that just copies a .tar to another one, logging the file names:

const tar = require('tar-stream');
const pack = tar.pack();
const extract = tar.extract();
const path = require('path');
const fs = require('fs');

extract.on('entry', (header, stream, next) => {
    console.log(header.name);
    stream.pipe(pack.entry(header, next));
});

extract.on('finish', () => {
    // all entries done - let's finalize it
    pack.finalize();
});

const tarPath = './example.tar';
const tarPathParsed = path.parse(tarPath);
const outputPath = `${tarPathParsed.dir}/${tarPathParsed.name}.new${tarPathParsed.ext}`;

let oldTarballStream = fs.createReadStream(tarPath);
let newTarballStream = fs.createWriteStream(outputPath);

// pipe the old tarball to the extractor
oldTarballStream.pipe(extract);

newTarballStream.on('close', () => {
    console.log(`${outputPath} has been written`);
});

// pipe the new tarball to another stream
pack.pipe(newTarballStream);

I've also created an example.tar with a single file named Тестовый файл.txt (Cyrillic characters in the file name). When I ran the code above, I got an example.new.tar with 2 files, both named Pax Header. One of them contains:

38 path=Тестовый файл.txt

The other Pax Header contains the full content of Тестовый файл.txt.

Moreover, when I re-ran the code on example.new.tar (the one with those 2 PaxHeaders), I again got a tarball with 2 PaxHeaders, but this time one of them was:

38 path=Тестовый файл.txt
38 path=Тестовый файл.txt

The other, again, was exactly my original Тестовый файл.txt.

I believe it's a bug in pack().

alenhu commented 5 years ago

use iconv

mafintosh commented 5 years ago

@shelart do you mind turning this into a test case and PR'ing it?

shelart commented 5 years ago

@mafintosh I will try. I'll comment here about results

shelart commented 5 years ago

@mafintosh I'm sorry for the inconvenience. I trusted 7-Zip when I opened the issue. It shows the created tarball as containing two PaxHeader files. A detailed investigation showed that this is a problem with 7-Zip, which does not seem to support the PaxHeader TAR extension. I re-checked the created tarball with WinRAR and with Linux tar, and they work as expected. So I'm closing the issue.