Closed RunDevelopment closed 7 months ago
By "indexed", do you mean "not the complete array, just a portion of it"?
In general, iter
is slow in ndarray. You might want to test to_owned
then to_vec
and tell us about the performance. In all cases, 3s seems really high!
By "indexed", do you mean "not the complete array, just a portion of it"?
Sorry, I meant a sliced array. E.g. the cropped image array from my "benchmark" was created in python like this: img[:, 1:, ...]
.
You might want to test
to_owned
thento_vec
and tell us about the performance.
Could you please explain this in more detail? Changing my above code to:
let flat: Vec<f32> = ndarray.to_owned().to_vec().unwrap();
Gives me a NotContiguousError
at runtime.
So I tried using the to_vec
method from ArrayBase
. I also had to reshape the array since I'm working with 2D/3D images and to_vec
is only available for 1D arrays. The following code did work correctly:
let flat: Vec<f32> = ndarray
.as_array()
.to_owned()
.into_shape(ndarray.shape().iter().product::<usize>())
.unwrap()
.to_vec();
This clocked in at 1.1sec, which is still 6x slower than slice.to_vec()
. The break-down is that to_owned
takes 0.95sec and to_vec
takes ~0.15sec. So copying from contiguous memory into a vec is as fast as ever, but to_owned
is very slow.
I did some further testing and the main reason to_owned
is so slow seems to be that my array uses IxDyn
. Using .into_dimensionality()
to "cast" the array to use Ix3
speeds up to_owned
from 0.95sec to 0.25sec (4x faster). Doing this "cast" also sped up my old .iter().copied().collect()
to take 0.4sec, so iteration is 8-9x faster.
So arrayBase.iter()
should probably do something similar as an internal optimization.
I also found the .into_raw_vec()
method which gets rid of the copying .to_vec()
does. With the into_dimensionality()
optimization, this brings it down to 0.25sec total, which seems to be the best that is possible right now.
If you do assume a dimensionality of 3, you could also change your Python API to expect PyReadonlyArray3<f32>
instead.
If you do assume a dimensionality of 3
I don't. I just simplified my findings for the sake of brevity. In my code, I match
on array.shape().len()
to support 2D and 3D arrays (we use 2D arrays for grayscale images in the software I work with).
You might still be better of accepting an enum of PyReadonlyArray2<f32>
or PyReadonlyArray3<f32>
, c.f. https://github.com/PyO3/rust-numpy/blob/main/examples/simple/src/lib.rs#L113
Since I figured out the currently fastest way to get a vec, I consider this issue closed now.
Should I make a new issue for the IxDyn
performance problem?
Should I make a new issue for the IxDyn performance problem?
Personally, I am not sure I consider it a problem as the iterator is most likely not the only place where IxDyn
is measurably faster than Ix?
and I would like to avoid trying to cushion all of those places out with small-dimensionality fast paths. I think the main learning is to use the fixed-dimensionality types if that is at all possible.
Should I make a new issue for the IxDyn performance problem?
If you do, please explain clearly and with at least one code example.
I think the main learning is to use the fixed-dimensionality types if that is at all possible.
If you want to teach people how to use ndarrays most efficiently, I would suggest writing documentation or changing the API to make IxDyn
less convenient to use. Right now, IxDyn
is the closest thing to how ndarrays behave in python, so I would argue that people will naturally gravitate towards it. I certainly did.
Hi! I wanted to know what's the fastest way to convert a
PyReadonlyArrayDyn<f32>
into aVec<f32>
.I'm making python bindings for an image processing programs, and converting numpy arrays into a contiguous memory format is the first step. I'm already using
ndarray.as_slice()
which allows me to skip the conversion to vec. This issue is about indexed ndarrays. I'm currently using the following code for the conversion:Compared to a
slice.to_vec()
, this is more than 10x slower. Running this code on a (4320, 8468, 4) ndarray takes 0.2sec if the array is a slice (so it runsslice.to_vec()
) and 3sec when the array is indexed (I cropped off 1px on the left side).This is a huge performance cliff for me, so I would like to know what I can do to fix this.