llohse / libnpy

C++ library for reading and writing of numpy's .npy files
MIT License
361 stars 72 forks source link

Support for reading/writing npz #27

Open WilliamTambellini opened 2 years ago

WilliamTambellini commented 2 years ago

Hello @llohse Would you accept a PR in order to support reading/writing npz (zip of npy files) ?

llohse commented 2 years ago

Thanks for getting in touch and offering to contribute. Generally, I would be happy to add such functionality. However, libnpy currently is fairly compact and also does not have any external dependencies and I would like to keep it that way.

Do you see a way to add npz support without adding too much more code? As for the dependencies, it would need proper preprocessor checks so that npy support does not depend on any zip library.

In any case, feel free to open a PR so that I can have a look.

WilliamTambellini commented 2 years ago

Hi @llohse Tks for the reply. The only realistic way to read/write zip files is to use libz ( " -lz " on Linux) which is a standard lib available on all OS. Usually found with: find_package(ZLIB) eg https://github.com/rogersce/cnpy/blob/4e8810b1a8637695171ed346ce68f6984e585ef4/CMakeLists.txt#L12

Do you see a way to add npz support without adding too much more code?

I dont think it would be "too much" more code depending on how you define "too much": eg: https://github.com/rogersce/cnpy/blob/4e8810b1a8637695171ed346ce68f6984e585ef4/cnpy.cpp#L230

it would need proper preprocessor checks so that npy support does not depend on any zip library

Adding "ifdef HAVE_ZLIB" is ofcourse doable inorder to support npz only if zlib has been found.

feel free to open a PR so that I can have a look.

On our side, we cannot offer to prepare a PR before we are 100% sure you ll accept it so let's discuss further first. Tks.

llohse commented 2 years ago

Hi @WilliamTambellini Thank you for the additional information and thanks again for offering your contribution. I would be glad to make this work.

libz seems "standard" enough. Yet, as far as I can see, it is a C-style library. I try to maintain a modern C++ codebase for libnpy, currently requiring C++14. Adding static analysis to the CI is high on my to-do list. The challenge will be to use zlib without regressions in terms of modern code style.

Moreover, I would like to keep libnpy as lightweight as possible and avoid forcing any hard dependencies, build system, or extra linker arguments by default. How do you feel about an ifdef WITH_NPZ? This could be defined depending on whether ZLIB is found or not by the build system or set manually if not using any build system.

The code examples look simple enough to make it work in a few 100 LOC. That would be fine for me.

What functionality/API do you have in mind? For me, the mvp should support

Regardless, I was planning to add a new API to libnpy, changing the parameters and return values to work with npy_data as defined here https://github.com/llohse/libnpy/blob/master/tests/test-read.cpp#L29. This could be used for npz support, too.

WilliamTambellini commented 1 year ago

Hi @llohse

llohse commented 1 year ago
* C vs C++: there is a cpp api to play with zip archives using boost:
  https://stackoverflow.com/questions/61081287/build-boostiostreams-with-zlib
  but I doubt you d want to add a dep with boost

Yes, I would not want to depend on boost.

* How do you feel about an ifdef WITH_NPZ?
  I proposed "ifdef HAVE_ZLIB" which is required to play with zip files anyway and autoset by many build systems

Sounds good.

* What functionality/API do you have in mind?
  MVP to read npz first, then later to write ?

Sounds good.