rogersce / cnpy

library to read/write .npy and .npz files in C/C++
MIT License

Read/write gzip compressed files #6

Open eddelbuettel opened 11 years ago

eddelbuettel commented 11 years ago

I wrapped CNPY into RcppCNPy, an R package to read/write NumPy files. I put one extension in: the ability to read and write gzip-compressed files (but I only do npy files, not npz files).

If there was interest, I'd be happy to provide a pull request.

My code is now in this github repo.

WilliamTambellini commented 7 years ago

When I tried to open a compressed npz/npy file (generated by np.savez_compressed()) with cnpy, I did indeed get an exception:

cnpy.cpp:92: void cnpy::parse_npy_header(FILE, unsigned int&, unsigned int&, unsigned int&, bool&): Assertion `littleEndian' failed.

@eddelbuettel: do you mean you have a patch that would allow cnpy to open/read compressed npz/npy files?

Kind

eddelbuettel commented 7 years ago

Not npz, but we handle npy, as we can just rely on libz to read from a compressed stream. It's all in the repo I linked to, including some test cases. It is set up for R users, so you may not be able to read everything easily, but there is, e.g., a tests/ directory.

You could call R (ie via Rscript) to have RcppCNPy read the compressed file and write it uncompressed...
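A caller that wants to choose between a plain reader and a libz-backed one can look at the first few bytes of the file. The sketch below is only an illustration of that idea: `FileKind`, `classify` and `classify_file` are hypothetical names that are not part of cnpy or RcppCNPy. It relies on three well-known signatures: .npy files begin with the byte 0x93 followed by "NUMPY", gzip streams begin with 0x1f 0x8b, and ZIP archives (hence .npz) normally begin with "PK\x03\x04".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical names for illustration; not part of cnpy or RcppCNPy.
enum class FileKind { Npy, Gzip, Zip, Unknown };

// Classify a buffer by its leading "magic" bytes.
FileKind classify(const unsigned char* buf, std::size_t n) {
    assert(buf != nullptr || n == 0);
    // .npy: 0x93 followed by the ASCII text "NUMPY".
    static const unsigned char npy_magic[6] = {0x93, 'N', 'U', 'M', 'P', 'Y'};
    if (n >= 6 && std::memcmp(buf, npy_magic, 6) == 0) return FileKind::Npy;
    // gzip: 0x1f 0x8b.
    if (n >= 2 && buf[0] == 0x1f && buf[1] == 0x8b) return FileKind::Gzip;
    // ZIP local file header signature "PK\x03\x04" (an .npz is a ZIP archive).
    if (n >= 4 && buf[0] == 'P' && buf[1] == 'K' && buf[2] == 0x03 && buf[3] == 0x04)
        return FileKind::Zip;
    return FileKind::Unknown;
}

// Convenience wrapper: read the first bytes of a file and classify them.
// Returns Unknown if the file cannot be opened.
FileKind classify_file(const char* path) {
    std::FILE* fp = std::fopen(path, "rb");
    if (!fp) return FileKind::Unknown;
    unsigned char buf[6];
    std::size_t n = std::fread(buf, 1, sizeof buf, fp);
    std::fclose(fp);
    return classify(buf, n);
}
```

With zlib itself this check is often unnecessary for the gzip case, since `gzopen`/`gzread` also pass through files that are not in gzip format; the sketch mainly helps to reject or route .npz input early.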

WilliamTambellini commented 7 years ago

Hi Dirk, thanks. Carl has found an easy way to support compressed npz/npy and should patch cnpy soon. To be continued. Kind

rogersce commented 7 years ago

I pushed some code to read npz files using libzip, which is now a dependency. William, do you mind giving it a go and seeing if it works for you?

It'd be preferable to use either zlib or libzip, not both; I'll have to look into this a little later.

eddelbuettel commented 7 years ago

I don't get to vote on what you do with CNPy ... but I have a weak preference for zlib because R already uses it. Else users of RcppCNPy will have to get libzip-dev to build. Which is probably only a minor nuisance as that should be common, but still "zlib is free" in our use case.

rogersce commented 7 years ago

Dirk, you totally get a vote :) universal preference seems in favor of no extra dependencies, so I'll see if I can retool this to work with zlib

eddelbuettel commented 7 years ago

Out of curiosity, is one of the two "easier" or "cheaper" for you? On Linux it doesn't really matter ...

WilliamTambellini commented 7 years ago

Hi Carl, thank you. I've just tested it and it seems to work. Like @eddelbuettel, I would also prefer to remove the dependency on libzip. Have you considered using one of the pure zlib solutions, for example "miniz" (a single self-contained header, no library, no dependency): https://github.com/richgel999/miniz Cheers W.

rogersce commented 7 years ago

@Dirk: I thought libzip would be much easier, but zlib wasn't too bad because there was already code to manually parse the zip headers/footers. I agree that fewer and more common dependencies are preferable, for the reasons you outlined.

I put up some new code that loads using zlib, with no more dependency on libzip. I tried a few examples with npy_compressed and it seemed OK. It is inefficient in how it allocates memory, but I will fix that later.
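To make the "manually parse the zip headers" step concrete, here is a minimal sketch of reading one ZIP local file header. It is not cnpy's actual code: `ZipLocalHeader`, `le16`, `le32` and `parse_local_header` are hypothetical names. The field offsets follow the ZIP format's fixed 30-byte local header, in which compression method 0 means stored and 8 means deflate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical struct for illustration; field names are not cnpy's.
struct ZipLocalHeader {
    std::uint16_t method = 0;             // 0 = stored, 8 = deflate
    std::uint32_t compressed_size = 0;    // may be 0 if a data descriptor is used
    std::uint32_t uncompressed_size = 0;  // may be 0 if a data descriptor is used
    std::uint16_t name_len = 0;
    std::uint16_t extra_len = 0;
    // Offset of the entry's data, relative to the start of the header.
    std::size_t data_offset() const { return 30u + name_len + extra_len; }
};

// ZIP stores all multi-byte integers in little-endian order.
static std::uint16_t le16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
static std::uint32_t le32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Parse the fixed 30-byte part of a local file header.
// Returns false if the buffer is too short or the signature does not match.
bool parse_local_header(const unsigned char* buf, std::size_t n, ZipLocalHeader& out) {
    assert(buf != nullptr || n == 0);
    if (n < 30) return false;
    if (le32(buf) != 0x04034b50u) return false;  // "PK\x03\x04"
    out.method = le16(buf + 8);
    out.compressed_size = le32(buf + 18);
    out.uncompressed_size = le32(buf + 22);
    out.name_len = le16(buf + 26);
    out.extra_len = le16(buf + 28);
    return true;
}
```

When `method` is 8, the bytes at `data_offset()` are a raw deflate stream, which zlib can inflate if `inflateInit2` is called with a negative `windowBits` value; when it is 0, the bytes are the .npy file as-is. Knowing `uncompressed_size` up front also allows the output buffer to be allocated once.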

wolfv commented 7 years ago

I just used your code as source of inspiration to read/write NPZ files for xtensor: https://github.com/wolfv/xtensor-io/blob/master/include/xtensor-io/xnpz.hpp

It's got the added ability to compress the ZIP NPY files, and the zip contents modification date is initialized correctly :) With a bit of work it could be adapted for a non-xtensor use case, I guess ...
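The modification date presumably refers to the last-modified time and date fields in each ZIP entry header, which use the MS-DOS packed format. A minimal sketch of that encoding follows; `dos_time` and `dos_date` are hypothetical helper names and are not taken from xtensor-io or cnpy.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; not taken from xtensor-io or cnpy.

// MS-DOS time: bits 15-11 hour, bits 10-5 minute, bits 4-0 second / 2.
std::uint16_t dos_time(int hour, int minute, int second) {
    assert(hour >= 0 && hour < 24 && minute >= 0 && minute < 60);
    assert(second >= 0 && second < 60);
    return static_cast<std::uint16_t>((hour << 11) | (minute << 5) | (second / 2));
}

// MS-DOS date: bits 15-9 years since 1980, bits 8-5 month, bits 4-0 day.
std::uint16_t dos_date(int year, int month, int day) {
    assert(year >= 1980 && year <= 2107);
    assert(month >= 1 && month <= 12 && day >= 1 && day <= 31);
    return static_cast<std::uint16_t>(((year - 1980) << 9) | (month << 5) | day);
}
```

Seconds are stored with two-second resolution, so odd seconds are rounded down. A header whose date field is left at zero does not encode a valid month or day.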

WilliamTambellini commented 7 years ago

@rogersce Hi Carl, I've just tested it and it seems to work. Thank you. To track upcoming changes/enhancements, what about adding a version to the sources, for example: const versionMajor = 0; const versionMinor = 1; or creating a 1.0 branch? Kind
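As a sketch of what such version constants could look like in a header, assuming a dedicated namespace (the namespace `cnpy_version_sketch` and the helpers `version_string` and `at_least` are hypothetical and do not exist in cnpy):

```cpp
#include <string>

// Hypothetical namespace and helpers; cnpy does not define these.
namespace cnpy_version_sketch {

constexpr int versionMajor = 0;
constexpr int versionMinor = 1;

// Human-readable form, e.g. for log messages.
inline std::string version_string() {
    return std::to_string(versionMajor) + "." + std::to_string(versionMinor);
}

// Compile-time check that the library is at least the given version.
constexpr bool at_least(int major, int minor) {
    return versionMajor > major ||
           (versionMajor == major && versionMinor >= minor);
}

}  // namespace cnpy_version_sketch
```

Downstream code could then guard features with `static_assert(cnpy_version_sketch::at_least(0, 1), "...")` instead of probing for individual functions.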

rogersce commented 7 years ago

@wolfv That's great! Glad it was useful. Writing out compressed NPY zips from C++ would be an awesome feature. When I get a chance, I'll take a look at your code in depth and try to add that back into cnpy. Of course, if you think it could be done easily and want to submit a pull request, I wouldn't say no to that either, haha :)