Open Djoop opened 3 years ago
Weird. I have never seen that happen before. I think this is an upstream bug in 7zip. Can you see if you can reproduce it with 7zip alone?
As a workaround, you can add this to the registration block:
post_fetch_method = compressed_filename -> run(`gunzip -l ...`)
Indeed, it seems to be an upstream bug (actually, I don't know if the bug is from 7zip or from gunzip…). Here is what I get with the 7zip packaged with my distribution: the same archive yields two different file names with gunzip and 7z:
$ 7z l kddcup.data_10_percent.gz
7-Zip [64] 17.03 : Copyright (c) 1999-2020 Igor Pavlov : 2017-08-28
p7zip Version 17.03 (locale=fr_FR.UTF-8,Utf16=on,HugeFiles=on,64 bits,12 CPUs x64)
Scanning the drive for archives:
1 file, 2144903 bytes (2095 KiB)
Listing archive: kddcup.data_10_percent.gz
--
Path = kddcup.data_10_percent.gz
Type = gzip
Headers Size = 43
Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
2007-06-08 04:35:37 ..... 74889749 2144903 kddcup.data_10_percent_corrected
------------------- ----- ------------ ------------ ------------------------
2007-06-08 04:35:37 74889749 2144903 1 files
------------------------------------------------------------------------------------------------
$ gunzip -l kddcup.data_10_percent.gz
compressed uncompressed ratio uncompressed_name
2144903 74889749 97.1% kddcup.data_10_percent
I don't know if there is anything special with this archive as I did not create it, yet this is surprising. Thanks for the workaround, I guess it works only if there is a single file in the archive?
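One possible explanation, sketched here assuming GNU gzip: a gzip header can store the original file name, and gzip -l reports by default a name derived from the archive's own file name, whereas adding -N makes it use the stored one. If an archive is renamed after compression, the two names differ, and 7z appears to report the stored one. The file names sample and sample_corrected are made up for illustration, and whether this is what happened to the KDD archive is only an assumption.

```shell
#!/bin/sh
# Sketch (assumes GNU gzip): build a throwaway archive whose stored name
# differs from its file name. All names here are hypothetical.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

printf 'hello\n' > sample_corrected
gzip sample_corrected               # creates sample_corrected.gz; the header stores "sample_corrected"
mv sample_corrected.gz sample.gz    # rename the archive only; the header is untouched

# Without -N, the listed name is derived from the archive's file name.
name_from_filename=$(gzip -l sample.gz | awk 'NR==2 {print $NF}')
# With -N, the listed name is the one stored in the gzip header.
name_from_header=$(gzip -lN sample.gz | awk 'NR==2 {print $NF}')

echo "from file name: $name_from_filename"
echo "from header:    $name_from_header"

# Decompressing with -N restores the stored name instead of the derived one.
gzip -dN sample.gz
ls
```

Running gunzip -lN on the real archive would show whether its header carries the _corrected name.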
Well, you can run whatever you want. E.g. tar -xzf ... will do gzipped tarballs.
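As an illustration of the kind of command that could be run here, a minimal sketch that decompresses to an explicitly chosen output name, so the result depends neither on the name stored in the gzip header nor on the tool's naming rule. The file names data and data_corrected are hypothetical, and this is plain shell: how to wire it into post_fetch_method is not shown.

```shell
#!/bin/sh
# Sketch: decompress a single-file gzip archive to a name we pick ourselves.
# All file names are hypothetical.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Build a small archive whose stored name differs from its file name.
printf 'a,b\n1,2\n' > data_corrected
gzip data_corrected
mv data_corrected.gz data.gz

# -d decompresses, -c writes to stdout, so the output name is
# whatever we redirect to, independent of the header contents.
gzip -dc data.gz > data
rm data.gz

ls
```

This only covers archives holding a single file, which is all the gzip format itself supports; multi-file archives are tarballs and need tar.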
I have some code using the unpack function which fails with DataDeps 7.7 (apparently there were some changes to use 7zip on all platforms; I am not sure exactly when the breaking change happened). I have a wrapper for the following dataset: http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz (not sure if the file has something to do with it or if this is a generic error), which contains a file called "kddcup.data_10_percent" (as can be seen e.g. using gunzip -l …), yet unpack creates a file called kddcup.data_10_percent_corrected (with some other files it ended up in .corrected). Unpacking runs without an error, but it is inconvenient, as I was expecting it to respect the file names (and this was the behavior with previous versions of DataDeps). Or is there a special function to use in order to obtain the path of the unpacked files?