Open eddelbuettel opened 11 years ago
When trying to open a compressed npz/npy (generated by np.savez_compressed()) with cnpy I indeed got an exception: cnpy.cpp:92: void cnpy::parse_npy_header(FILE, unsigned int&, unsigned int&, unsigned int&, bool&): Assertion `littleEndian' failed. @eddelbuettel: do you mean you have a patch that would allow cnpy to open/read compressed npz/npy files? Kind
Not npz, but we handle npy -- as we can just rely on libz to read from a compressed stream. It's all in the repo I linked to, including some test cases. It is set up for R users, so you may not be able to read everything easily, but there is e.g. a tests/ directory.
You could call R (i.e. via Rscript) to have RcppCNPy read the compressed file and write it uncompressed...
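Since the thread distinguishes plain npy, gzip-compressed npy, and zip-based npz files, a reader can tell them apart from the leading magic bytes before handing the data to a header parser. The following is only a minimal sketch: `Container` and `sniff_container` are hypothetical names that appear in neither cnpy nor RcppCNPy, and the sketch assumes the start of the file has already been read into memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper, not part of cnpy or RcppCNPy.
enum class Container { Npy, Gzip, Zip, Unknown };

// Guess the container type from the first bytes of a file.
inline Container sniff_container(const std::vector<std::uint8_t>& buf) {
    // Plain NPY files start with the byte 0x93 followed by "NUMPY".
    static const std::uint8_t npy_magic[]  = {0x93, 'N', 'U', 'M', 'P', 'Y'};
    // gzip streams start with the two bytes 0x1f 0x8b.
    static const std::uint8_t gzip_magic[] = {0x1f, 0x8b};
    // A zip archive (such as an NPZ file) normally starts with a
    // local file header whose signature is "PK\x03\x04".
    static const std::uint8_t zip_magic[]  = {'P', 'K', 0x03, 0x04};

    auto starts_with = [&buf](const std::uint8_t* magic, std::size_t n) {
        return buf.size() >= n && std::memcmp(buf.data(), magic, n) == 0;
    };

    if (starts_with(npy_magic, sizeof npy_magic))   return Container::Npy;
    if (starts_with(gzip_magic, sizeof gzip_magic)) return Container::Gzip;
    if (starts_with(zip_magic, sizeof zip_magic))   return Container::Zip;
    return Container::Unknown;
}
```

A caller could use the result to choose between parsing the NPY header directly, reading through a gzip stream first, or walking the zip entries.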
Hi Dirk, thanks. Carl has found an easy way to support compressed npz/npy and should patch cnpy soon. To be continued. Kind
I pushed some code to read npz files using libzip, which is now a dependency. William do you mind giving it a go and see if it works for you?
It'd be preferable to use either zlib or libzip, not both. I'll have to look into this a little later.
I don't get to vote on what you do with CNPy ... but I have a weak preference for zlib because R already uses it. Else users of RcppCNPy will have to get libzip-dev to build. Which is probably only a minor nuisance as that should be common, but still "zlib is free" in our use case.
Dirk, you totally get a vote :) universal preference seems in favor of no extra dependencies, so I'll see if I can retool this to work with zlib
Out of curiosity, is one of the two "easier" or "cheaper" for you? On Linux it doesn't really matter ...
Hi Carl, thank you. I've just tested and it seems to work. Now, like @eddelbuettel, I would also prefer removing the dependency on libzip. Have you considered using one of the self-contained zlib-style solutions, for example "miniz" (single self-contained header, no library, no dependency): https://github.com/richgel999/miniz Cheers W.
@Dirk: I thought libzip would be much easier, but zlib wasn't too bad because there was already code to manually parse the zip headers/footers. I'd agree that fewer and more common deps are preferable, for the reasons you outlined.
I put up some new code that will load using zlib, no more dependencies on libzip. I tried a few examples with npy_compressed and it seemed ok. It is inefficient in how it allocates memory, but I will fix that later.
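To illustrate what manually parsing a zip header involves, here is a minimal sketch that reads the fixed 30-byte zip local file header and reports whether the entry is stored or deflated. It is not the code that was pushed to cnpy: `ZipLocalHeader`, `read_le16`, `read_le32` and `parse_local_header` are hypothetical names, and the sketch assumes the header bytes are already in memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical struct, not taken from cnpy: the fields of a zip local
// file header that matter when deciding how to read an NPZ entry.
struct ZipLocalHeader {
    std::uint16_t compression_method; // 0 = stored, 8 = deflate
    std::uint32_t compressed_size;
    std::uint32_t uncompressed_size;
    std::uint16_t name_length;
    std::uint16_t extra_length;

    // Offset of the entry data, counted from the start of the header.
    std::size_t data_offset() const {
        return 30u + name_length + extra_length;
    }
};

// Zip stores multi-byte integers in little-endian order.
inline std::uint16_t read_le16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

inline std::uint32_t read_le32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Returns false if buf is too short or lacks the "PK\x03\x04" signature.
// Note: if bit 3 of the general-purpose flag is set, the sizes here may
// be zero and the real values follow the data; this sketch ignores that.
inline bool parse_local_header(const std::vector<std::uint8_t>& buf,
                               ZipLocalHeader& out) {
    const std::size_t fixed_size = 30;
    if (buf.size() < fixed_size) return false;
    if (read_le32(buf.data()) != 0x04034b50u) return false;

    out.compression_method = read_le16(buf.data() + 8);
    out.compressed_size    = read_le32(buf.data() + 18);
    out.uncompressed_size  = read_le32(buf.data() + 22);
    out.name_length        = read_le16(buf.data() + 26);
    out.extra_length       = read_le16(buf.data() + 28);
    return true;
}
```

With compression method 8, the `compressed_size` bytes starting at `data_offset()` would be a raw deflate stream to pass to an inflate routine; with method 0 they are the NPY bytes themselves.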
I just used your code as source of inspiration to read/write NPZ files for xtensor: https://github.com/wolfv/xtensor-io/blob/master/include/xtensor-io/xnpz.hpp
It's got the added ability to compress the ZIP NPY files, and the zip contents modification date is initialized correctly :) With a bit of work it could be adapted for a non-xtensor use case, I guess ...
@rogersce Hi Carl, I've just tested and it seems to work. Thank you. In order to track upcoming changes/enhancements, what about adding a version in the sources, for example: const versionMajor = 0; const versionMinor = 1; or creating a 1.0 branch? Kind
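One possible shape for the version constants suggested above is sketched below. The namespace `cnpy_version_sketch` and the helpers `version_string` and `at_least` are hypothetical and not part of cnpy; only the values 0 and 1 come from the suggestion.

```cpp
#include <string>

// Hypothetical namespace, not part of cnpy.
namespace cnpy_version_sketch {

constexpr int kVersionMajor = 0;
constexpr int kVersionMinor = 1;

// Human-readable form, e.g. for logging or error messages.
inline std::string version_string() {
    return std::to_string(kVersionMajor) + "." + std::to_string(kVersionMinor);
}

// True if this version is at least major.minor.
constexpr bool at_least(int major, int minor) {
    return kVersionMajor > major ||
           (kVersionMajor == major && kVersionMinor >= minor);
}

} // namespace cnpy_version_sketch
```

Downstream code could then guard features with something like `static_assert(cnpy_version_sketch::at_least(0, 1), "...")`.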
@wolfv That's great! Glad it was useful. Writing out compressed NPY zips from C++ would be an awesome feature, when I get a chance I'll take a look at your code in depth and try to add that back into cnpy. Of course, if you think it could be done easily and want to submit a pull request, I wouldn't say no to that, either, haha :)
I wrapped CNPY into RcppCNPy, an R package to read/write NumPy files. I put one extension in: the ability to read and write gzip-compressed files (but I only do npy files, not npz files).
If there was interest, I'd be happy to provide a pull request.
My code is now in this github repo.