kshetline closed this issue 1 year ago
If I use `allowUnknownFormat: true`, I get `Error: Unexpected end of data`, but all of the data I care about has been extracted before the error occurs, so I've got a workaround for now:
extract.on('error', err => {
  // Tolerate the truncation error once at least 11 sources have been collected.
  if (/unexpected end of data/i.test(err.message) && Object.keys(result.sources).length >= 11)
    resolve(result);
  else
    reject(makeError(err));
});
I ran into the same issue with https://data.iana.org/time-zones/releases/tzdata1999f.tar.gz, except this time the error I get is `Error: Invalid tar header. Maybe the tar is corrupted or it needs to be gunzipped?`. I can still extract the data I need before the error occurs.
These archives seem to have malformed headers. If you run `gzip -c -d tzdata1997b.tar.gz > tzdata1997b.tar`, you get the tar file underneath, which you can then inspect with a hex editor.
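The same inspection can be done from Node without a hex editor. This is only a rough sketch using the built-in `zlib` module; `firstTarHeader` and `hexDump` are hypothetical helper names, not part of tar-stream, and the dump layout just imitates the listings in this comment.

```javascript
const zlib = require('node:zlib');

// Decompress a gzipped tar held in memory and return its first 512-byte
// header block, which holds the entry name and (if present) the magic field.
function firstTarHeader(gzippedBuffer) {
  const tar = zlib.gunzipSync(gzippedBuffer);
  return tar.subarray(0, 512);
}

// Render bytes as "offset: hex pairs ascii", 16 bytes per row.
function hexDump(bytes, start = 0, length = bytes.length) {
  const rows = [];
  const end = Math.min(start + length, bytes.length);
  for (let off = start; off < end; off += 16) {
    const row = bytes.subarray(off, Math.min(off + 16, end));
    const hex = row.toString('hex').match(/.{1,4}/g).join(' ');
    const ascii = [...row]
      .map(b => (b >= 0x20 && b < 0x7f ? String.fromCharCode(b) : '.'))
      .join('');
    rows.push(`${off.toString(16).padStart(8, '0')}: ${hex} ${ascii}`);
  }
  return rows;
}
```

Assuming the archive has been read with `fs.readFileSync`, something like `hexDump(firstTarHeader(data), 0x100, 16)` would print the row where the magic normally sits.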
Here's what these archives look like:
00000000: 6166 7269 6361 0000 0000 0000 0000 0000 africa..........
<binary nonsense>
00000100: 0000 0000 0000 0000 0000 0000 0000 0000 ................
And here's what a good tar archive should look like:
00000000: 6e6f 6465 5f6d 6f64 756c 6573 2f00 0000 node_modules/...
<binary nonsense>
00000100: 0075 7374 6172 0030 3069 6c6c 7269 6768 .ustar.00illrigh
The difference is in the last line, namely that it contains `.ustar.00`. This is the "magic" string that programs use to recognize tar archives. It's missing from the IANA archives (for whatever reason), which is why `allowUnknownFormat: true` is needed to bypass the header check.
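To make the check concrete, here is a minimal sketch of testing a header block for the magic. In the ustar layout the magic field starts at byte offset 257 (hex 0x101), which matches the dump above. `hasUstarMagic` is a hypothetical helper, and this is not necessarily how tar-stream performs its own check.

```javascript
// Returns true when a 512-byte tar header block carries the "ustar" magic.
// Only the first five bytes of the field are compared, so both the POSIX
// form ("ustar\0" followed by "00") and the GNU form ("ustar  \0") match.
function hasUstarMagic(headerBlock) {
  if (headerBlock.length < 262) return false;
  return headerBlock.toString('latin1', 257, 262) === 'ustar';
}
```

A header like the `africa` one above, with zeros at that offset, would fail this test even though the name and the rest of the header are readable.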
The other issue, the `Error: Unexpected end of data`, occurs because the archive contains the header for the file `yearistype.sh` twice: the first time it is followed by the actual script text, but the second time the archive just abruptly ends. All in all, these archives are terribly malformed. The reason `tar` and the like can consume them without problems is probably that they aggressively silence errors and try to make the best of the file. Your hacky solution is probably the best one for these kinds of files.
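For anyone who wants to confirm this kind of damage themselves, a tar file is a sequence of 512-byte blocks: a header (name at offset 0, octal size at offset 124) followed by the data padded up to a block boundary. The sketch below walks those blocks and reports where the data runs out. `listEntries` and `parseOctal` are hypothetical names, the code ignores checksums and long-name extensions, and it is not how tar-stream parses archives.

```javascript
// Parse an octal number field that may be padded with NULs or spaces.
function parseOctal(block, offset, length) {
  const text = block.toString('latin1', offset, offset + length).replace(/[\0 ]/g, '');
  return parseInt(text, 8) || 0;
}

// Walk an uncompressed tar buffer and list each entry's name and size.
// `truncated` is true when the buffer ends before an entry's data does,
// or when a partial header block is left over at the end.
function listEntries(tar) {
  const entries = [];
  let offset = 0;
  while (offset + 512 <= tar.length) {
    const header = tar.subarray(offset, offset + 512);
    // An all-zero block marks the normal end of the archive.
    if (header.every(b => b === 0)) return { entries, truncated: false };
    const nul = header.indexOf(0);
    const name = header.toString('latin1', 0, nul >= 0 && nul < 100 ? nul : 100);
    const size = parseOctal(header, 124, 12);
    const dataStart = offset + 512;
    const truncated = dataStart + size > tar.length;
    entries.push({ name, size, truncated });
    if (truncated) return { entries, truncated: true };
    // Entry data is padded to the next 512-byte boundary.
    offset = dataStart + Math.ceil(size / 512) * 512;
  }
  return { entries, truncated: offset < tar.length };
}
```

On an archive damaged in the way described above, one would expect the same name to appear twice in `entries`, with the last entry flagged as truncated.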
send a test if still relevant
Sorry I didn't respond sooner. It doesn't surprise me that the problem files are the real problem. It would be useful, if you care to add such an option, to allow more lenient decoding to get around issues like these.
I'm creating a tool to automatically extract and decode timezone data. It's working well except for a couple of odd cases where my macOS command line and desktop tools can untar these files without a hitch, but tar-stream can't decode them for some reason.
Here are the problem cases:
https://data.iana.org/time-zones/releases/tzdata1997b.tar.gz
https://data.iana.org/time-zones/releases/tzdata1997c.tar.gz
The gunzip step of the process works fine; it's just the untarring that fails.
I'm not sure what's so special about these particular archives that makes tar-stream choke.