rust-ndarray / ndarray

ndarray: an N-dimensional array with array views, multidimensional slicing, and efficient operations
https://docs.rs/ndarray/
Apache License 2.0

Better error message when accidentally using an array instead of a tuple for into_shape? #489

Open drewm1980 opened 6 years ago

drewm1980 commented 6 years ago

I just lost an hour to the following incorrect code:

    #[test]
    fn test_ndarray_eq()
    {
        let a = ndarray::arr3(&[[[true]]]);
        let a2 = a.into_shape([1,1]).unwrap();
        let a3 = a2.into_shape([1,1,1]).unwrap();
        assert!(a == a3,"Arrays are not equal!");
    }

    error[E0308]: mismatched types
       --> src/lib.rs:156:20
        |
    156 |         assert!(a==a2, "Arrays do not match!");
        |                    ^^ expected array of 3 elements, found struct ndarray::IxDynImpl
        |
        = note: expected type `ndarray::ArrayBase<_, ndarray::Dim<[usize; 3]>>`
                   found type `ndarray::ArrayBase<ndarray::OwnedRepr<bool>, ndarray::Dim<ndarray::IxDynImpl>>`

The source of the problem has nothing to do with "==", nothing to do with IxDynImpl, and isn't even on the line where the error is reported. If you print all of the arrays using println!, they all look like they have the correct shape, strides, and contained values.

This happened in my original code because I passed the output of .shape() into into_shape. It happened in the above minimized example because I passed the dimensions as arrays "[...]" instead of as tuples "(...)".

If there's anything that can be done to smooth out this sharp edge in the API, consider this a feature request. Failing that, maybe put big warnings about this footgun in the into_shape() documentation (which says nothing about the type of the input, since it's generic over E), and also in the part of the "ndarray for numpy users" page where you at least mention the ugly existence of three different representations of the shape of an array, each with a different method to get it.

By the way, thanks for writing ndarray; I would not have even considered Rust without it.

jturner314 commented 6 years ago

I'm sorry to hear about your frustrating experience.

Just for clarity, the solution is to use .dim() or .raw_dim() instead of .shape() when creating an array that should have the same shape as an existing array. (See the np.zeros_like example in ndarray_for_numpy_users for a similar case.)

A few notes regarding the specific example provided in this issue:

Improvement ideas for ndarray:

drewm1980 commented 6 years ago

Hello @jturner314 , thanks for the thorough response!

> You seem to be confused about the difference between slices ([T]) and fixed-size arrays ([T; n]). The reason why .into_shape() produces a dynamic-dimensional array when given the output of .shape() is that .shape() returns a slice (type &[usize], or equivalently &[Ix]), and the IntoDimension implementation for slices produces IxDyn.

I was aware of the differences between slices and fixed size arrays, but I am indeed new to the mental overhead of keeping track of static vs. dynamic not only for the array dimensions but for the number of dimensions.
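For anyone else reading along, the mechanism described in the quote can be modeled without ndarray at all. The sketch below is a simplified stand-in, not the library's real code: the names `IntoDim`, `Fixed`, and `Dynamic` are hypothetical and only mirror the idea of `IntoDimension` choosing its output type from the input type.

```rust
// Simplified model of a conversion trait with an associated output type.
// All names here are hypothetical; ndarray's real trait is `IntoDimension`.

/// Dimension whose number of axes is known at compile time.
#[derive(Debug, PartialEq)]
struct Fixed<const N: usize>([usize; N]);

/// Dimension whose number of axes is known only at run time.
#[derive(Debug, PartialEq)]
struct Dynamic(Vec<usize>);

trait IntoDim {
    type Dim;
    fn into_dim(self) -> Self::Dim;
}

// A fixed-size array carries its length in the type, so the axis count survives.
impl<const N: usize> IntoDim for [usize; N] {
    type Dim = Fixed<N>;
    fn into_dim(self) -> Fixed<N> {
        Fixed(self)
    }
}

// A tuple also has its length fixed by its type.
impl IntoDim for (usize, usize, usize) {
    type Dim = Fixed<3>;
    fn into_dim(self) -> Fixed<3> {
        Fixed([self.0, self.1, self.2])
    }
}

// A slice has no length in its type, so only a dynamic dimension is possible.
impl<'a> IntoDim for &'a [usize] {
    type Dim = Dynamic;
    fn into_dim(self) -> Dynamic {
        Dynamic(self.to_vec())
    }
}
```

In this model, the same three numbers give `Fixed<3>` when passed as an array or tuple and `Dynamic` when passed as a slice, which is the static vs. dynamic split that surprised me.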

I think the statement:

> In ndarray, you can create fixed-dimension arrays, such as Array2. This takes advantage of the type system to help you write correct code and also avoids small heap allocations for the shape and strides.

in:

https://docs.rs/ndarray/0.11/ndarray/doc/ndarray_for_numpy_users/

downplays the amount of mental overhead new ndarray users should be prepared for. Since filing this, I have already been bitten again by the static vs. dynamic number of dimensions issue while trying to use data from numpy via rust-numpy (which returns arrays with a dynamic number of dimensions in the only online example I found).

https://github.com/rust-numpy/rust-numpy/issues/59

Maybe users should go through a tutorial that goes through examples of arrays with static vs. dynamic number of dimensions (and I guess eventually static vs. dynamic dimensions as well?), before trying to attempt ~anything more advanced...

Or maybe make a diagram that shows the full matrix (graph?) of ndarray types with static vs. dynamic dimensions, owned vs. view, iterators, safe and unsafe casting of the contained type (we're currently blocked trying to convert an array from to , for example) and what functions you need to go between them. "into_dimensionality" and "into_dyn" would be arrows in opposite directions between two types in your graph. This might be one way of making those more discoverable than scanning through every function in the API. The user can say, "OK, I want to go from this type to this type", find them in the graph, and follow the arrows to figure out what sequence of function calls they need to chain.

Thanks!