Better error message when I accidentally use an array instead of a tuple for into_shape?

drewm1980 commented 6 years ago

I just lost an hour to the following incorrect code:

    #[test]
    fn test_ndarray_eq()
    {
        let a = ndarray::arr3(&[[[true]]]);
        let a2 = a.into_shape([1,1]).unwrap();
        let a3 = a2.into_shape([1,1,1]).unwrap();
        assert!(a == a3,"Arrays are not equal!");
    }

    error[E0308]: mismatched types
       --> src/lib.rs:156:20
        |
    156 |         assert!(a==a2, "Arrays do not match!");
        |                    ^^ expected array of 3 elements, found struct `ndarray::IxDynImpl`
        |
        = note: expected type `ndarray::ArrayBase<_, ndarray::Dim<[usize; 3]>>`
                   found type `ndarray::ArrayBase<ndarray::OwnedRepr<bool>, ndarray::Dim<ndarray::IxDynImpl>>`

The source of the problem has nothing to do with "==", nothing to do with IxDynImpl, and isn't even on the line where the error is reported. If you print all of the arrays using println!, they all look like they have the correct shape, strides, and contained values.

This happened in my original code because I passed the output of .shape() into into_shape. It happened in the above minimized example because I passed the dimensions as arrays "[...]" instead of as tuples "(...)".

If there's anything that can be done to smooth out this sharp edge in the API, consider this a feature request. Failing that, maybe add big warnings about this footgun to the into_shape() documentation (which says nothing about the type of its input, since it's generic over E), and also to the "ndarray for NumPy users" guide, in the section that at least mentions the ugly existence of three different representations of an array's shape, each obtained with a different method.

By the way, thanks for writing ndarray; I would not have even considered Rust without it.

jturner314 commented 6 years ago

I'm sorry to hear about your frustrating experience.

Just for clarity, the solution is to use .dim() or .raw_dim() instead of .shape() when creating an array that should have the same shape as an existing array. (See the np.zeros_like example in ndarray_for_numpy_users for a similar case.)

A few notes regarding the specific example provided in this issue:

- test_ndarray_eq doesn't compile, but that's because a is moved into .into_shape() when creating a2 (so it can't be used in the assert!), not because any of the arrays are dynamic-dimensional. In other words, the provided error message doesn't correspond to the provided example code.
- All of the arrays in the example are fixed-dimensional, since arr3() produces a fixed-dimensional array and each of the .into_shape() calls is passed a fixed-size array as the shape. You seem to be confused about the difference between slices ([T]) and fixed-size arrays ([T; n]). The reason why .into_shape() produces a dynamic-dimensional array when given the output of .shape() is that .shape() returns a slice (type &[usize], or equivalently &[Ix]), and the IntoDimension implementation for slices produces IxDyn. In contrast, the example uses fixed-size arrays of type [usize; 2] and [usize; 3] in its .into_shape() calls.
- A little Rust tip: assert_eq!(a, b) and assert_ne!(a, b) are more convenient than assert!(a == b) or assert!(a != b) because they print the debug representations of a and b when the assertion fails instead of just indicating whether the assertion fails or not.
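The dispatch described in the second note, and the reason the .raw_dim() fix works, can be modeled in a few lines of std-only Rust. This is a sketch under stated assumptions, not ndarray's code: `IntoDim`, `FixedDim`, `DynDim`, `Array3Like` and `reshape` are hypothetical stand-ins for ndarray's `IntoDimension` trait, its fixed and dynamic dimension types, an `Array3`, and a shape-taking method such as `.into_shape()`. The real trait has more impls (tuples, for example), but the principle is the same: the type of the shape argument, not its contents, picks the output dimension type.

```rust
// Hypothetical stand-ins for ndarray's fixed (`Dim<[usize; N]>`) and dynamic
// (`IxDyn`) dimension types. This is a model, not the real ndarray API.
#[derive(Debug, PartialEq)]
struct FixedDim<const N: usize>([usize; N]);

#[derive(Debug, PartialEq)]
struct DynDim(Vec<usize>);

// Mimics `IntoDimension`: the *type* of the shape argument decides
// which dimension type comes out.
trait IntoDim {
    type Dim;
    fn into_dim(self) -> Self::Dim;
}

// A fixed-size array such as `[1, 1]` carries its length in its type.
impl<const N: usize> IntoDim for [usize; N] {
    type Dim = FixedDim<N>;
    fn into_dim(self) -> FixedDim<N> {
        FixedDim(self)
    }
}

// A slice reference (`&[usize]`, what `.shape()` returns) has no length in
// its type, so the only possible result is a dynamic dimension.
impl<'a> IntoDim for &'a [usize] {
    type Dim = DynDim;
    fn into_dim(self) -> DynDim {
        DynDim(self.to_vec())
    }
}

// An existing fixed dimension (what `.raw_dim()` returns) passes through unchanged.
impl<const N: usize> IntoDim for FixedDim<N> {
    type Dim = FixedDim<N>;
    fn into_dim(self) -> FixedDim<N> {
        self
    }
}

// Hypothetical 3-D array that only tracks its shape.
struct Array3Like {
    dim: [usize; 3],
}

impl Array3Like {
    fn shape(&self) -> &[usize] {
        &self.dim
    }
    fn raw_dim(&self) -> FixedDim<3> {
        FixedDim(self.dim)
    }
}

// Stand-in for a reshape-style method that is generic over its shape argument.
fn reshape<E: IntoDim>(shape: E) -> E::Dim {
    shape.into_dim()
}

fn main() {
    let a = Array3Like { dim: [1, 1, 1] };
    let from_literal = reshape([1, 1, 1]); // FixedDim<3>
    let from_raw_dim = reshape(a.raw_dim()); // FixedDim<3>
    let from_shape = reshape(a.shape()); // DynDim
    assert_eq!(from_literal, from_raw_dim);
    // `from_literal == from_shape` would not compile: same numbers,
    // different types, the same kind of E0308 mismatch quoted in the thread.
    println!("{:?} / {:?} / {:?}", from_literal, from_raw_dim, from_shape);
}
```

Because the two results have different types, comparing them is a compile-time error rather than a runtime `false`, which is why the problem surfaced at the `==` instead of at the call that produced the dynamic-dimensional value.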

Improvement ideas for ndarray:

- Make the Debug implementation of ArrayBase indicate whether the dimension is fixed-size or dynamic so that printing with println!("{:?}", arr) clearly shows the difference.
- Add warnings to the documentation (in the .shape() docs as well as ndarray_for_numpy_users). Users coming from NumPy need to be especially aware that ndarray has both fixed-dimensional and dynamic-dimensional arrays. We can add a warning to .into_shape() as well, but note that many methods have the same behavior (taking a shape with IntoDimension), not just .into_shape().
- Make .shape() return a reference to an associated type of Dimension, so arr3.shape() would return &[usize; 3], while arrdyn.shape() would return &[usize]. I've wanted to add Shape and Strides associated types to the Dimension trait for a while but just haven't gotten around to it yet. (This would be a fairly major change that would touch lots of places in the code.) I'd like for the types that implement Dimension to be zero-size, and use Shape/Strides/Index associated types on Dimension for any representations. This would also mean that we'd no longer need the .raw_dim() and .dim() methods; .shape() would be sufficient.
- Even without making the change in the previous item, we can remove the .dim() method and Dimension::Pattern associated type, now that Rust (since 1.26) can pattern-match on slices as well as tuples.
- We can implement PartialEq for more pairs of types, including fixed-dimensional to dynamic-dimensional arrays.
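The shape-related ideas above can be sketched in plain Rust. `describe` takes the `&[usize]` that `.shape()` returns and destructures it with slice patterns, the Rust 1.26 feature mentioned above, in place of a `.dim()` tuple. The trait `DimensionSketch` and the types `Fixed2` and `Dyn` are hypothetical and only illustrate what an associated `Shape` type could look like; they are not ndarray's `Dimension` trait.

```rust
// Today: `.shape()` returns `&[usize]`. Slice patterns can destructure it
// directly, though a fallback arm is needed because the length is unknown.
fn describe(shape: &[usize]) -> String {
    match shape {
        [] => "0-dimensional (scalar)".to_string(),
        [len] => format!("vector of length {}", len),
        [rows, cols] => format!("{} x {} matrix", rows, cols),
        _ => format!("{}-dimensional array", shape.len()),
    }
}

// Hypothetical version of the proposal: each dimension type names its own
// shape representation through an associated type.
trait DimensionSketch {
    type Shape: ?Sized + AsRef<[usize]>;
    fn shape(&self) -> &Self::Shape;
}

// Hypothetical fixed 2-D dimension: its shape is a fixed-size array.
struct Fixed2([usize; 2]);

impl DimensionSketch for Fixed2 {
    type Shape = [usize; 2];
    fn shape(&self) -> &[usize; 2] {
        &self.0
    }
}

// Hypothetical dynamic dimension: its shape is a slice of runtime length.
struct Dyn(Vec<usize>);

impl DimensionSketch for Dyn {
    type Shape = [usize];
    fn shape(&self) -> &[usize] {
        &self.0
    }
}

// Generic code can still view any shape as a slice through the `AsRef` bound.
fn ndim<D: DimensionSketch>(d: &D) -> usize {
    d.shape().as_ref().len()
}

fn main() {
    let fixed = Fixed2([3, 4]);
    // Irrefutable: the type guarantees exactly two entries, so no fallback arm.
    let [rows, cols] = *fixed.shape();
    println!("{} rows, {} cols", rows, cols);

    // A dynamic shape still goes through the refutable match in `describe`.
    let dynamic = Dyn(vec![2, 3, 4]);
    println!("{}", describe(dynamic.shape()));
    println!("ndim: {} and {}", ndim(&fixed), ndim(&dynamic));
}
```

With a fixed shape type, the compiler rather than a runtime match knows how many axes there are, which is the property that would make a separate `.dim()` unnecessary.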

drewm1980 commented 6 years ago

Hello @jturner314, thanks for the thorough response!

> You seem to be confused about the difference between slices ([T]) and fixed-size arrays ([T; n]). The reason why .into_shape() produces a dynamic-dimensional array when given the output of .shape() is that .shape() returns a slice (type &[usize], or equivalently &[Ix]), and the IntoDimension implementation for slices produces IxDyn.

I was aware of the differences between slices and fixed-size arrays, but I am indeed new to the mental overhead of tracking static vs. dynamic not only for the array dimensions but also for the number of dimensions.

I think the statement:

> In ndarray, you can create fixed-dimension arrays, such as Array2. This takes advantage of the type system to help you write correct code and also avoids small heap allocations for the shape and strides.

in:

https://docs.rs/ndarray/0.11/ndarray/doc/ndarray_for_numpy_users/

downplays the amount of mental overhead new ndarray users should be prepared for. Since filing this, I've already been bitten again by static vs. dynamic number-of-dimensions issues while trying to use data from NumPy via rust-numpy, which returns arrays with a dynamic number of dimensions in the only online example I found.

https://github.com/rust-numpy/rust-numpy/issues/59

Maybe users should work through a tutorial with examples of arrays with a static vs. dynamic number of dimensions (and eventually, I guess, static vs. dynamic dimensions as well) before attempting almost anything more advanced...

Or maybe make a diagram that shows the full matrix (graph?) of ndarray types: static vs. dynamic dimensions, owned vs. view, iterators, safe and unsafe casting of the contained type (we're currently blocked trying to convert an array from to , for example), and which functions you need to go between them. "into_dimensionality" and "into_dyn" would be arrows in opposite directions between two types in your graph. This might make those functions more discoverable than scanning through every function in the API. The user can say, "OK, I want to go from this type to this type", find both types in the graph, and follow the arrows to figure out which sequence of function calls to chain.
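The graph idea above can be prototyped as a small search over type-to-type conversions. This is a minimal sketch assuming a hand-written edge list: the node labels are ndarray type aliases (`Array2`, `ArrayD`, `ArrayView2`, `ArrayViewD`) used as plain strings, the edge labels are `ArrayBase` methods (`into_dyn()`, `into_dimensionality::<Ix2>()`, `view()`, `to_owned()`), and `EDGES` and `find_chain` are hypothetical. A breadth-first search returns the shortest chain of calls between two types.

```rust
use std::collections::{HashMap, VecDeque};

// Hypothetical, hand-written subset of the proposed "type graph".
// Nodes are informal names for ndarray types; each edge is (from, to, method).
// A trailing `?` marks a conversion that returns a `Result` and can fail.
const EDGES: &[(&str, &str, &str)] = &[
    ("Array2", "ArrayD", "into_dyn()"),
    ("ArrayD", "Array2", "into_dimensionality::<Ix2>()?"),
    ("Array2", "ArrayView2", "view()"),
    ("ArrayView2", "Array2", "to_owned()"),
    ("ArrayD", "ArrayViewD", "view()"),
    ("ArrayViewD", "ArrayD", "to_owned()"),
    ("ArrayView2", "ArrayViewD", "into_dyn()"),
    ("ArrayViewD", "ArrayView2", "into_dimensionality::<Ix2>()?"),
];

/// Breadth-first search for the shortest chain of method calls from `from` to `to`.
fn find_chain(from: &str, to: &str) -> Option<Vec<&'static str>> {
    // For each reached node: the node it was reached from and the method used.
    let mut came_from: HashMap<&str, (&str, &'static str)> = HashMap::new();
    let mut queue: VecDeque<&str> = VecDeque::new();
    queue.push_back(from);
    while let Some(node) = queue.pop_front() {
        if node == to {
            // Walk back to `from`, collecting method names, then reverse them.
            let mut chain = Vec::new();
            let mut cur = node;
            while cur != from {
                let (parent, method) = came_from[cur];
                chain.push(method);
                cur = parent;
            }
            chain.reverse();
            return Some(chain);
        }
        for &(src, dst, method) in EDGES {
            if src == node && dst != from && !came_from.contains_key(dst) {
                came_from.insert(dst, (node, method));
                queue.push_back(dst);
            }
        }
    }
    None
}

fn main() {
    // E.g. starting from a dynamic-dimensional view and wanting an owned 2-D array.
    match find_chain("ArrayViewD", "Array2") {
        Some(chain) => println!("ArrayViewD -> Array2: .{}", chain.join(".")),
        None => println!("no conversion path"),
    }
}
```

Two equally short routes exist in this example (change the dimensionality first, or copy first); the search returns whichever its edge order reaches first. A real diagram would need many more nodes, such as other dimensionalities and element-type casts.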

Thanks!

rust-ndarray / ndarray #489