rust-ndarray / ndarray

ndarray: an N-dimensional array with array views, multidimensional slicing, and efficient operations
https://docs.rs/ndarray/
Apache License 2.0
3.62k stars 307 forks source link

Best Practice: ArrayView vs Pointer #1059

Open JP-Ellis opened 3 years ago

JP-Ellis commented 3 years ago

I was wondering whether there was a best practive when writing functions that expect arrays, specifically when they are not taking ownership of the array. I couldn't see this being addressed in the documentation or in an existing issue so here's a few things I think could be worth thinking about:

  1. Should functions prefer to take ArrayView over the equivalent Array, and what about using a pointer vs by value? For example, which of the following would be 'best'?

    fn sum(a: ArrayView1<f64>) -> f64;
    fn sum(a: &ArrayView1<f64>) -> f64;
    fn sum(a: &Array1<f64>) -> f64;
  2. When muteable data is expected, is there a preference to using ArrayViewMut vs &mut Array?

    fn increment(mut a: ArrayViewMut1<f64>) -> f64;
    fn increment(a: &mut Array1<f64>) -> f64;

I would expect that for most applications, the ArrayView and ArrayViewMut are preferred as they are more general.

Is the fact that they are views into an array mean that they behave more like pointers and thus &ArrayView is redundant?

multimeric commented 3 years ago

I'm just a user, not a dev, but here's my take.

Firstly you should understand that an ArrayView is not just a pointer to an array. It's a pointer to some data (which may or may not be a full Array), but possibly using a subset of that data, and possibly with a different stride length, meaning that it could be a view over say every second element of the array. This information is stored within the ArrayBase struct.

My second general point is that, since all array subtypes are just ArrayBase with different type signatures, you can and should make your functions generic using these and almost never require any particular array type.

Now to your questions:

Should functions prefer to take ArrayView over the equivalent Array

Neither. As mentioned above, if your function doesn't care what type of array it takes, it should be made generic over ArrayBase<S, D>. I can't think of any functions that should require an ArrayView, especially because I believe all array types implement view() to convert them in a recommended way to an ArrayView if you somehow need one (e.g. the split_at() function is only implemented for ArrayView for some reason).

what about using a pointer vs by value

Follow Rust best practices here, and use the lowest level of ownership you can. If your function doesn't mutate the array, just accept a borrowed ArrayBase. In your example this would be:

fn sum<A, S, D>(a: &ArrayBase<S, D>) -> A
where
    S: Data<Elem = A>,
    D: Dimension;

And indeed this is basically how it's implemented in ndarray, albeit inside an impl block: https://github.com/rust-ndarray/ndarray/blob/87506b8bec7cc9a265289beaa56d0b1b58ffad4f/src/numeric/impl_numeric.rs#L19-L49

When muteable data is expected, is there a preference to using ArrayViewMut vs &mut Array?

Remember, if your function takes an ArrayViewMut, the caller has to transfer ownership of the whole View struct to your function, and can no longer use it. This means throwing away all the subset/stride information. Again, make your functions generic over &ArrayBase. If only the contents of the array need to be mutated, make it generic over ArrayBase<S, D> where S: DataMut. Only use a mutable reference if you need to edit the array itself, which I can't think of many use cases for. Maybe if you want to reshape it?

I would expect that for most applications, the ArrayView and ArrayViewMut are preferred as they are more general.

As explained, no, making it generic over ArrayBase with the appropriate trait is preferred as it's more general again.

Is the fact that they are views into an array mean that they behave more like pointers and thus &ArrayView is redundant?

No. See my first paragraph.

JP-Ellis commented 3 years ago

Thanks for your detailed answer :smile: I will refactor some of my code to be a bit more general and use ArrayBase.

Perhaps there should be some part of the documentation that addresses this to make it clearer? Or if that exists somewhere, I didn't see it.

Note that whether I pass in ArrayViewMut into a function or pass a mutable pointer to it, Rust should prevent other parts of the code from accessing it elsewhere (even for reading). So that's why I thought it wouldn't matter too much. I agree that the mutable reference would only ever be needed if I wanted to chnage the underlying array and generally modifying data should only require a mutable view into the array (through DataMut).

multimeric commented 3 years ago

Note that whether I pass in ArrayViewMut into a function or pass a mutable pointer to it, Rust should prevent other parts of the code from accessing it elsewhere (even for reading). So that's why I thought it wouldn't matter too much.

This is true, it's not the safety that makes this unadvisable, it's the fact that you're requesting more ownership in your function than you need, and users will be annoyed that they have to re-create the view in their code since your function consumes it.

jturner314 commented 3 years ago

879 has some discussion about this question and a proposal to improve ndarray's API in this respect. (See "Good way to express function parameters" under "Motivation".) With ndarray's current API, I pretty much agree with @multimeric, with the caveat that monomorphization bloat is also a concern, as mentioned in #879. To avoid this issue, it's probably best to use &ArrayBase<S, D> as the parameter type for the public function, but have the body of the public function immediately call a private, inner function which operates on ArrayView<A, D>. That gets pretty tedious to write, though. As a result, for my own private code, I often have functions take ArrayView/Mut parameters just to keep things simple.

JP-Ellis commented 3 years ago

@jturner314 Thanks for the link to the RFC! This looks like a very promising change for the next major release of ndarray.

akern40 commented 1 month ago

This is a great question and, as jturner314 had pointed out, will be addressed by resolving #879. As a result, this will be closed by #1440 when it lands.