Slow iteration because of `IxDyn`

As I described in https://github.com/rust-ndarray/ndarray/issues/1339, iterating over an array with `IxDyn` can be up to 10x slower than iterating over an equivalent array with a fixed-size index. This has wide-reaching implications: many pixel-wise operations become substantially slower.

Example: Let `n` be an ndarray view with the shape (4320, 8468, 4).

```rust
use ndarray::{ArrayView3, ArrayViewD};

let n: ArrayViewD<f32>; // uses IxDyn; initialization omitted

// iter()

// slow: takes 3 s on my machine
let _: Vec<f32> = n.iter().cloned().collect();

// fast: takes 0.4 s on my machine
// (`n.view()` re-borrows so that `n` is not moved by the conversion)
let n3: ArrayView3<f32> = n.view().into_dimensionality().unwrap();
let _: Vec<f32> = n3.iter().cloned().collect();

// to_owned()

// slow: takes 0.95 s on my machine
let _ = n.to_owned();

// fast: takes 0.25 s on my machine
let n3: ArrayView3<f32> = n.view().into_dimensionality().unwrap();
let _ = n3.to_owned();
```

To improve the performance of arrays using `IxDyn`, I suggest optimizing iteration for these arrays. Since fixed-size indices are substantially faster, I suggest internally "casting" the array to a fixed-size index (or similar) before iteration whenever the number of dimensions allows it.

Slow iteration because of `IxDyn` #1340

#979 helps for this case, even though it was developed with the general case (any iterator) in mind. ndarray also already has fast paths for contiguous arrays in several cases, even when they are dynamic-dimensional.