jturner314 / ndarray-npy

.npy and .npz file format support for ndarray
https://docs.rs/ndarray-npy
Apache License 2.0

Writing scalars #64

Closed by dabreegster 2 years ago

dabreegster commented 2 years ago

Hi,

Thanks for the crate; it's made a port from a Python codebase to Rust much easier!

I'm trying to write a .npy file containing just a scalar u32 to a .npz. https://github.com/dabreegster/ndarray-npy/commit/01b49241510213b4a01de0c7753ae7569b720b19 is my really weak attempt to decipher the numpy format, but this doesn't quite work. On the Python end, it reads in as array(61832, dtype=uint32). The shape is correctly a scalar (), but it's still a numpy.ndarray instead of a numpy.uint32. I'm hardly surprised, since I made total guesses about what the format is.

I'm having trouble reverse engineering the Python code. It looks like it turns everything into something array-like: https://github.com/numpy/numpy/blob/4adc87dff15a247e417d50f10cc4def8e1c17a03/numpy/lib/npyio.py#L713 And then calls: https://github.com/numpy/numpy/blob/4adc87dff15a247e417d50f10cc4def8e1c17a03/numpy/lib/format.py#L671

I'll keep puzzling through this, but in case you happen to have a known answer, it'd be much appreciated. Thanks!

jturner314 commented 2 years ago

The file format is described here. .npy files only represent arrays; there's no way in the .npy format to distinguish between a zero-dimensional array and a scalar.
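To make the point about zero-dimensional arrays concrete, here is a minimal std-only sketch of what a version 1.0 .npy file holding a single little-endian u32 could look like on disk. It is not how ndarray-npy writes files; the function name zero_dim_u32_npy is hypothetical, and the layout (magic string, version bytes, 2-byte little-endian header length, space-padded header dictionary ending in a newline, then raw data) is assumed from the NumPy format documentation. The only thing that encodes "scalar-like" here is the empty shape tuple `()`, which is why NumPy loads it back as a zero-dimensional array rather than a numpy.uint32.

```rust
/// Build the bytes of a version 1.0 .npy file that holds a zero-dimensional
/// little-endian u32 array. Hypothetical helper, for illustration only.
fn zero_dim_u32_npy(value: u32) -> Vec<u8> {
    // Header dictionary as a Python literal. The empty tuple `()` is the
    // shape of a zero-dimensional array; there is no separate "scalar" flag.
    let mut header =
        String::from("{'descr': '<u4', 'fortran_order': False, 'shape': (), }");

    // Magic string (6 bytes) + version (2 bytes) + header length (2 bytes).
    const PREFIX_LEN: usize = 10;

    // Pad with spaces so that prefix + header + trailing newline is a
    // multiple of 64 bytes, which keeps the data section aligned.
    while (PREFIX_LEN + header.len() + 1) % 64 != 0 {
        header.push(' ');
    }
    header.push('\n');

    let mut bytes = Vec::new();
    bytes.extend_from_slice(b"\x93NUMPY"); // magic string
    bytes.extend_from_slice(&[1, 0]); // format version 1.0
    bytes.extend_from_slice(&(header.len() as u16).to_le_bytes());
    bytes.extend_from_slice(header.as_bytes());
    bytes.extend_from_slice(&value.to_le_bytes()); // the single element
    bytes
}
```

Writing the result of zero_dim_u32_npy(61832) to a file and loading it with numpy.load should give the same kind of zero-dimensional array described in the original post, since the format has no other way to represent a lone value.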

On the Rust side, you can create a zero-dimensional array or view with arr0() or aview0(), and then write it like any other array using ndarray-npy. This is equivalent to creating an array with numpy.array(5) and then using numpy.save/numpy.savez.

On the Python side, as far as I know, numpy.load always returns an array (for .npy files) or a container of arrays (for .npz files). To convert a zero-dimensional NumPy array into a Python scalar, you can use the .item() method. To convert it into a numpy.uint32, you can use np.uint32(array).

dabreegster commented 2 years ago

Thank you so much, that did the trick -- https://github.com/dabreegster/rampfs/commit/548585925387782d51facafeba292cac6bfdcf15

How would you feel about a tiny addition to the docstring of add_array to the tune of "If you're trying to write a scalar value, see [ndarray::arr]"? I totally missed this when reading the ndarray docs, and I imagine I won't be the last person to get confused about this. I'll send a PR if you think it's a reasonable addition.

jturner314 commented 2 years ago

Fwiw, I'd suggest using aview0() instead of arr0() in this case, to avoid the unnecessary allocation involved in creating an owned array. For example, you could write:

npz.add_array("name", &aview0(&value))?;

I suppose the difference is pretty negligible, though.

> How would you feel about a tiny addition to the docstring of add_array to the tune of "If you're trying to write a scalar value, see [ndarray::arr]"? I totally missed this when reading the ndarray docs, and I imagine I won't be the last person to get confused about this. I'll send a PR if you think it's a reasonable addition.

Sure, a reference in the docs to arr0 and aview0 would be reasonable, especially since those functions aren't used very often for other things. It could be added as a note on the add_array method, and a line using aview0 could be added to the example at the top of the NpzWriter docs.