marcoheisig / numpy-file-format

Read and write Numpy .npy and .npz files.
MIT License
16 stars 4 forks source link

Allow loading/storing arrays to streams, and not just files. #8

Open sirherrbatka opened 2 years ago

sirherrbatka commented 2 years ago

As in the title. Streams are useful for network and stuff.

marcoheisig commented 2 years ago

Yes, good point. I'll work on this once I have some free time to spare. Thanks for the suggestion!

sirherrbatka commented 2 years ago

Do you accept pull requests?

marcoheisig commented 2 years ago

Sure, absolutely :)

sirherrbatka commented 2 years ago

Ok, so I took a look at how this code actually operates. This bit looks somewhat weird to me: ;; We actually open the file twice, once to read the metadata - one byte ;; at a time, and once to read the array contents with a suitable element ;; type (e.g. (unsigned-byte 32) for single precision floating-point ;; numbers). load-array-metadata seems to read only the file header, so it should leave the input stream at the position when the content of the array actually begins. No need to open the file twice, or rewind the stream for that matter. This is important because by opening file only once we will allow named pipes on UNIX os work fine which is rather handy for python/lisp interop (reduces IO overhead). I will rewrite this and test with numpy.

sirherrbatka commented 2 years ago

Yeah, it seems to work fine. I made a commit that implements required changes for the load-array https://github.com/sirherrbatka/numpy-file-format/commit/340c36ddfdef54c9619dab259371745de1e1bd8b notice that I swapped your custom code for nibbles. Nibbles happens to already have handling for byte-orders which is a good thing I belive. Let me know if this is acceptable, and if so, I will proceed further with signed/unsigned bytes and store side of the system.

marcoheisig commented 2 years ago

Yes, nibbles is definitely an improvement for what I did. I tried to avoid additional dependencies because I also use this code in cl4py, which doesn't necessarily have access to Quicklisp. But cl4py has its own copy of numpy-file-format, so we can (and should) use nibbles here.

sirherrbatka commented 2 years ago

Very well. I've opened pull request. It should be possible to also account for different endianness in the store-array (I think that the old code was invalid, btw, it would not function right on the big endian machines).

sirherrbatka commented 2 years ago

Hey, could you please take a look at my merge request now?