ndarray to/from numpy bytes

ntakouris commented 2 years ago

Consider the following message definition:

#[pyclass]
#[derive(Debug, PartialEq, Deserialize, Serialize, Clone)]
pub struct Image {
    #[pyo3(get, set)]
    #[serde(with = "serde_bytes")]
    pub ndarray: Vec<u8>,

    #[pyo3(get, set)]
    #[serde(with = "serde_bytes")]
    pub shape: Vec<u8>,
}

From Python, it can be used to store and read images like this:

import numpy as np

image = Image(
    ndarray=np.random.randint(
        0, 255, size=(1920, 1080, 3), dtype=np.uint8
    ).tobytes(),
    shape=np.array([1920, 1080, 3], dtype=np.uint32).tobytes(),
)

# Decode the shape first; it is needed to reshape the flat pixel buffer.
shape_arr = np.frombuffer(bytearray(image.shape), dtype=np.uint32)
image_arr = np.frombuffer(bytearray(image.ndarray), dtype=np.uint8).reshape(shape_arr)

After some googling and searching through the docs and issues, I can't find a way to parse those byte arrays / Vec<u8>s as an ndarray.

jturner314 commented 2 years ago

To go from Vec<A> to Array<A, D> with a given shape, you can use Array::from_shape_vec().

To go from Vec<u8> to Vec<T>, you can use something like the byteorder crate.
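As a rough illustration of the Vec<u8> to Vec<T> step, here is a minimal sketch that uses only the standard library instead of the byteorder crate (a swapped-in technique, not the crate's API). The helper name decode_u32_le is hypothetical, and the sketch assumes the bytes hold little-endian u32 values, as NumPy would produce with dtype "<u4".

```rust
/// Hypothetical helper: decodes a byte buffer holding little-endian `u32`
/// values. Returns `None` if the length is not a multiple of 4.
fn decode_u32_le(bytes: &[u8]) -> Option<Vec<u32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            // Each chunk is exactly 4 bytes long, so the indexing cannot panic.
            .map(|chunk| u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
            .collect(),
    )
}
```

Calling decode_u32_le(&[2, 0, 0, 0, 3, 0, 0, 0]) yields Some(vec![2, 3]). For native-endian input, u32::from_ne_bytes could be used in place of u32::from_le_bytes.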

If you just want files which are easily interoperable between NumPy and ndarray, you may be interested in numpy.save()/numpy.load() and the ndarray-npy crate.

ntakouris commented 2 years ago

@jturner314 Thanks for the quick response.

Is there no way of loading bytes saved in a Vec<u8> Rust field? I don't save to a file, and I can definitely avoid the whole numpy.save('x.npy') function. Instead, I am using .tobytes() on an HWC image array in Python (np.uint8) and also saving the shape as another NumPy array (np.array([h, w, c], dtype=np.uint32)), storing its bytes in a Vec<u8>. This works fine from Python: np.frombuffer(arr_bytes, dtype=np.uint8) for the image, and dtype=np.uint32 for the shape.

Is there any functionality to do just that? You can assume that arrays are C-style contiguous.
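To make the C-contiguous assumption concrete: in a row-major HWC buffer, the last axis (channel) varies fastest. The sketch below shows where element (i, j, k) sits in the flat byte buffer. The function name hwc_offset is hypothetical, and the sketch assumes the offset arithmetic does not overflow usize.

```rust
/// Hypothetical helper: flat offset of element `[i, j, k]` in a C-contiguous
/// (row-major) array with shape `[height, width, channels]`.
/// Returns `None` if the index is out of bounds.
fn hwc_offset(shape: [usize; 3], index: [usize; 3]) -> Option<usize> {
    let [h, w, c] = shape;
    let [i, j, k] = index;
    if i >= h || j >= w || k >= c {
        return None;
    }
    // Row-major layout: stepping along axis 0 skips w * c elements,
    // stepping along axis 1 skips c elements, and axis 2 is contiguous.
    Some((i * w + j) * c + k)
}
```

For a shape of [2, 2, 3], index [1, 0, 2] maps to offset 8, which is the same ordering np.frombuffer(...).reshape(...) uses on the Python side.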

jturner314 commented 2 years ago

Fwiw, the numpy crate may be of interest, especially since it looks like you're using PyO3 already anyway. I've used it in a few projects for calling back-and-forth between Rust and Python with NumPy/ndarray arrays. With the numpy crate, you can write Rust functions which accept PyReadOnlyArray* (NumPy array) parameters and return PyArray* (NumPy array) values, with easy conversion to/from ndarray arrays.

Back to your specific question – something like the make_array_3d function below is one way to do it:

use std::mem;
use byteorder::{ByteOrder, NativeEndian};
use ndarray::prelude::*;

/// Assumes that `shape_bytes` is the shape encoded with native endianness.
///
/// # Panics
///
/// Panics if `shape_bytes.len() != 12` or if any of the axis lengths
/// overflows `usize`.
///
/// # Errors
///
/// Errors if the shape does not correspond to the number of elements in `data`
/// or if the shape/strides would result in overflowing `isize`.
fn make_array_3d(data: Vec<u8>, shape_bytes: Vec<u8>) -> Result<Array3<u8>, ndarray::ShapeError> {
    let shape: [usize; 3] = {
        let mut shape_u32: [u32; 3] = [0; 3];
        NativeEndian::read_u32_into(&shape_bytes, &mut shape_u32);
        shape_u32.map(|axis_len| usize::try_from(axis_len).expect("overflow converting axis length to usize"))
    };
    Array3::from_shape_vec(shape, data)
}

fn main() {
    let data: Vec<u8> = vec![0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11];
    let shape: Vec<u32> = vec![2, 2, 3];
    let mut shape_bytes: Vec<u8> = vec![0; shape.len() * mem::size_of::<u32>()];
    NativeEndian::write_u32_into(&shape, &mut shape_bytes);

    let arr = make_array_3d(data, shape_bytes).unwrap();
    assert_eq!(
        arr,
        array![
            [[0, 1, 2], [3, 4, 5]],
            [[6, 7, 8], [9, 10, 11]],
        ],
    );
}

The tricky part is converting the bytes encoding the shape to [usize; 3] – it's important to consider the endianness. (I assumed that the bytes represented native-endian u32 values in the example above.)
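If the byte order is known in advance, the shape can also be decoded with an explicit endianness and validated before the array is built. The following is a sketch using only the standard library. The names ShapeDecodeError and decode_shape_3d_le are hypothetical, and it assumes the shape was written as three little-endian u32 values (NumPy dtype "<u4").

```rust
use std::convert::TryFrom;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ShapeDecodeError {
    WrongByteLength(usize),
    ElementCountMismatch { expected: usize, actual: usize },
    Overflow,
}

/// Decodes three little-endian `u32` axis lengths and checks that their
/// product matches `data_len`, returning an error instead of panicking.
fn decode_shape_3d_le(
    shape_bytes: &[u8],
    data_len: usize,
) -> Result<[usize; 3], ShapeDecodeError> {
    if shape_bytes.len() != 12 {
        return Err(ShapeDecodeError::WrongByteLength(shape_bytes.len()));
    }
    let mut shape = [0usize; 3];
    for (axis, chunk) in shape.iter_mut().zip(shape_bytes.chunks_exact(4)) {
        let len = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        *axis = usize::try_from(len).map_err(|_| ShapeDecodeError::Overflow)?;
    }
    // Checked multiplication so that a corrupt shape cannot wrap around.
    let expected = shape
        .iter()
        .try_fold(1usize, |acc, &len| acc.checked_mul(len))
        .ok_or(ShapeDecodeError::Overflow)?;
    if expected != data_len {
        return Err(ShapeDecodeError::ElementCountMismatch {
            expected,
            actual: data_len,
        });
    }
    Ok(shape)
}
```

The resulting [usize; 3] can then be passed to Array3::from_shape_vec together with the data vector, as in make_array_3d above.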

Fwiw, I do think it would be worth adding methods to ndarray to simplify conversions to/from byte buffers.

ntakouris commented 2 years ago

And the same read_XX_into needs to happen if the dtype of the data array is different. It would be handy indeed. Most of the machines I deal with (x86, x64, Jetson (ARM), CUDA) are little-endian by default, while some ARM processors are also bi-endian, so native endianness might never be a big issue for many users.
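For a non-u8 dtype, the data buffer needs the same kind of decoding as the shape. Below is a minimal standard-library sketch for float32 data. The function names decode_f32_le and native_is_little_endian are hypothetical, and it assumes the bytes were produced by .tobytes() on a little-endian float32 array (NumPy dtype "<f4").

```rust
/// Hypothetical helper: decodes little-endian `f32` values.
/// Returns `None` if the length is not a multiple of 4.
fn decode_f32_le(bytes: &[u8]) -> Option<Vec<f32>> {
    if bytes.len() % std::mem::size_of::<f32>() != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(std::mem::size_of::<f32>())
            .map(|chunk| f32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
            .collect(),
    )
}

/// Hypothetical helper: true when compiled for a little-endian target,
/// where native-endian and little-endian decoding give the same result.
fn native_is_little_endian() -> bool {
    cfg!(target_endian = "little")
}
```

On little-endian targets, f32::from_ne_bytes would decode the same values; spelling out from_le_bytes keeps the result the same on big-endian targets too, as long as the producer wrote little-endian bytes.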

rust-ndarray / ndarray

ndarray to/from numpy bytes #1147