Open mpetroff opened 3 years ago
@mpetroff This looks reasonable, but I need to take a closer look. I'll probably have a few, mostly stylistic, issues.
Could you remove all the libtool wrappers from the PR (like test/sie_get_little_zip) and instead list them in the .gitignore file?
Those were added accidentally when I converted the existing patch into the Git commit. I just removed them and added them to the .gitignore file. I squashed and force-pushed this change to remove the files from the branch history.
@mpetroff I'm working through this, and hope to have it ready some time next week. Among other things, I've added a ./configure option to enable/disable the feature and improved error propagation.
Would you prefer I push the changes to your fork or copy your branch here and do it locally?
Also, would you have time to test the changes?
Pushing to my fork's branch is fine. It might take me a week or two to get to it, but I'll have time to test the changes. Thanks for working on getting this merged.
Just to keep you up to date: I was hoping to get this finished up before releasing v0.11.0, but I think it needs more work, and it's pointing me to some changes that need fixing within the encoding framework, so it'll have to wait for the next release, which I'm hoping won't be too long from now. (Pushing GetData-0.11.0 out the door has laid bare some things that really do need some work.)
This PR adds read-only support for Dirfiles stored in uncompressed Zip files. Development of the patch was motivated by a need to reduce the total file count for FLAC-encoded Dirfiles, to alleviate the backup and data transfer overheads that result from having a very large number of small files. CLASS has been using these changes for more than a year at this point. The PR is identical to the patch attached to my 2020-02-28 post to the getdata-devel mailing list, except without the documentation (since it isn't part of this Git repository). The original version of the patch dates back to 2018.
Documentation
Separate from the Dirfile encoding scheme, GetData will read Dirfiles contained in uncompressed Zip files. This functionality is meant for reading archival data, so writing to these Zip files is not supported. Using the Info-ZIP zip utility, a Zip file can be created by running zip -r0 ../dirfile.zip * from within the root of an existing Dirfile. All encoding schemes are supported by this functionality except for the two encoding schemes that already use Zip files, zzip and zzslim. The encoding scheme must be specified using the /ENCODING directive, even if the Dirfile is unencoded. For /INCLUDE directives and LINTERP field look up table files, only relative paths are supported and only without ./ and ../ syntax.
Although Zip files are most commonly created using Deflate compression, the Zip standard (ISO/IEC 21320-1) also supports Store compression, i.e., no compression at all. GetData's Zip file support requires Store compression for all data files, although either Store compression or Deflate compression can be used for any format files or any LINTERP field look up table files. With Store compression, a Zip file effectively concatenates a Dirfile's individual files together into a single file. Since a Zip file contains an offset table, unlike a tarball, random reads are supported without the need to load the entire file from disk.
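To illustrate why Store compression permits random reads, here is a minimal Python sketch using only the standard library. It is not how GetData implements the feature (the patch itself is C); the function names make_stored_zip and read_stored_range are hypothetical and exist only for this example. The first function mirrors what zip -r0 does, and the second seeks straight to a byte range of a stored member using the offset recorded in the Zip central directory.

```python
import os
import struct
import zipfile


def make_stored_zip(dirfile_root, zip_path):
    """Archive every file under dirfile_root with the Store method (no compression)."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for folder, _subdirs, files in os.walk(dirfile_root):
            for name in sorted(files):
                full = os.path.join(folder, name)
                # Store paths relative to the Dirfile root, as `zip -r0 ../dirfile.zip *` would.
                zf.write(full, arcname=os.path.relpath(full, dirfile_root))


def read_stored_range(zip_path, member, start, length):
    """Return up to `length` bytes starting at byte `start` of a stored member."""
    with zipfile.ZipFile(zip_path) as zf:
        info = zf.getinfo(member)  # central directory entry (the "offset table")
    if info.compress_type != zipfile.ZIP_STORED:
        raise ValueError(member + " is not stored uncompressed")
    with open(zip_path, "rb") as fh:
        fh.seek(info.header_offset)
        header = fh.read(30)  # fixed-size part of the local file header
        # Bytes 26-29 hold the file name length and extra field length.
        name_len, extra_len = struct.unpack("<HH", header[26:30])
        data_start = info.header_offset + 30 + name_len + extra_len
        fh.seek(data_start + start)
        # Clamp the read so it never runs past the end of the member.
        return fh.read(max(0, min(length, info.file_size - start)))
```

Because a stored member's bytes sit unmodified inside the archive, reading a sample range costs one seek and one read, regardless of how large the Zip file is. A Deflate-compressed member would instead have to be decompressed from its start to reach the same offset.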
Documentation patch