rust-ndarray / ndarray

ndarray: an N-dimensional array with array views, multidimensional slicing, and efficient operations
https://docs.rs/ndarray/
Apache License 2.0
3.63k stars 307 forks source link

Improved arithmetic operations for arrays #699

Open bluss opened 5 years ago

bluss commented 5 years ago

(This is a collaborative issue, please edit and add points, if applicable, or join the discussion in the issue below)

For context, please read the ArrayBase documentation on Arithmetic Operations first.

Goals

Non-Goals

Parallelization and multithreading is not in scope for this issue

Prioritization

Known Problems in Current Implementation

Expanding the Number of Implementations

Which solution is better for compile time?

A. Make impl blocks more generic (admitting more array kinds per impl, for example admitting both &A and &mut A) B. Expand the number of impls to cover all cases (for example, one for each combination of &A/&mut A)

Consider both plain ndarray "cargo build" compile time, and compile time when ndarray is used in a project and compiles to non-generic code.

Co-broadcasting for Dynamic dimensionality

For static dimensionality array operations we use right hand side broadcasting: in A @ B, we can attempt to broadcast B to the shape of A.

For dynamic dimensionality operations, we can improve this to co-broadcasting so that A @ B can result in an array with a shape that's neither that of A or B.

Note: Co-broadcasting only ever expands the number of arrays that are compatible in operations, it does not change the result of operations that are already permitted by the right hand side broadcasting rule.

Related issues:

bluss commented 5 years ago

About compile time

Remember: When working on the crate, compile time of the ndarray crate itself can seem “pretty good”. It's mostly generic code, we don't generate the machine code here, and we avoid the slowest part of compiling a rust program — feeding all the code to LLVM! The compile time when all the generic functionality is instantiated in the user's project might be quite something different. (This also happens in our tests).

There are some strategies to keep in mind for codegen compilation time:

  1. Split code so that the main parts of it depend on as few generic parameters as possible (with few possible instantiations)
  2. Reduce the space of possible type parameter values up front. a. Rely on coercions to support multiple types while implementing for a lesser number of them b. If using argument conversion traits, convert and then call a less generic function/method with the result of the conversion. (Example: AsRef etc).

But then there's the question of "Rust" focused compilation time:

  1. As asked before - is it better with a few more general impls or many more specific ones?

Looking at ndarray's -Ztime-passes without incremental compilation, the following items use the most time:

  time: 0.338; rss: 176MB       coherence checking
  time: 8.565; rss: 180MB       wf checking
  time: 0.238; rss: 180MB       item-types checking
  time: 4.654; rss: 198MB       item-bodies checking
    time: 0.489; rss: 198MB     rvalue promotion + match checking
    time: 0.044; rss: 198MB     liveness checking + intrinsic checking
  time: 0.532; rss: 198MB       misc checking 2
  time: 0.000; rss: 198MB       borrow checking
  time: 4.346; rss: 229MB       MIR borrow checking
  time: 0.952; rss: 245MB       metadata encoding and writing
  time: 0.508; rss: 291MB       codegen
  time: 0.497; rss: 291MB       LLVM passes
time: 21.672; rss: 258MB                total

So it is indeed a mostly "Rust" bound compilation with little time in code generation. And we have a possibility that the Rust bound compilation passes will be proportional to the amount of new methods we define. I'd wager it would be good with fewer, more generic trait impls for this reason?

(And then apply tips 1 and 2 to decrease compile time in the end user application.)

LukeMathWalker commented 5 years ago

I agree on the goals and the outlined pain points/strategy.

A couple of points I'd like to add to the discussion:

bluss commented 5 years ago

478 is related because it contains the idea of treating an array of Cell<T> elements as something that's mutable, for example that you can have on the left hand side of the += operator.

While this is a load of factors to consider, we should not fix this in one pull request! We should understand the landscape, talk about plans and make gradual fixes.

jturner314 commented 5 years ago

In #744, I've proposed relaxing all S: DataOwned constraints to S: Data in the arithmetic ops and using .into_owned() to get an owned array to work on. This approach is no more expensive than the current version of ndarray. However, it does have the same "Excessive copying of the whole array." issue @bluss mentioned in the case where the LHS is not uniquely owned. Once we have a solution for that issue for &A @ &A, we can reuse it in the A @ &A impl. (We can add an .is_uniquely_owned() method to Data, which will determine whether to mutate the LHS in-place or create a brand new array without copying the LHS.)

jturner314 commented 5 years ago

One other comment on this issue -- we can implement co-broadcasting for arbitrary dimension types, not just IxDyn, with something like this:

pub trait PartialOrdDim<Rhs: Dimension>: Dimension {
    type Max: Dimension;
    type Min: Dimension;
    // possibly other useful stuff
}

impl<D: Dimension> PartialOrdDim<Ix3> for Ix2 {
    type Max = Ix3;
    type Min = Ix2;
}

impl<D: Dimension> PartialOrdDim<IxDyn> for Ix2 {
    type Max = IxDyn;
    type Min = IxDyn;
}

// ...

impl<'a, A, B, S, S2, D, E> $trt<&'a ArrayBase<S2, E>> for &'a ArrayBase<S, D>
where
    A: Clone + $trt<B>,
    B: Clone,
    S: Data<Elem=A>,
    S2: Data<Elem=B>,
    D: Dimension + PartialOrdDim<E>,
    E: Dimension,
{
    type Output = Array<<A as $trt<B>>::Output, <D as PartialOrdDim<E>>::Max>;
    fn $mth(self, rhs: &'a ArrayBase<S2, E>) -> Self::Output {
        // ...
    }
}
djc commented 4 years ago

FWIW, I'm just getting started with ndarray and basically didn't figure out how to do AddAssign with two 1-dimensional f32 arrays:

error[E0271]: type mismatch resolving `<ViewRepr<&mut f32> as RawData>::Elem == ArrayBase<ViewRepr<&f32>, Dim<[usize; 1]>>`
  --> tools/src/bin/build-index.rs:58:26
   |
58 |                 sentence += ArrayView1::from(&state.buf);
   |                          ^^ expected `f32`, found struct `ArrayBase`
   |
   = note: expected type `f32`
            found struct `ArrayBase<ViewRepr<&f32>, Dim<[usize; 1]>>`
   = note: required because of the requirements on the impl of `AddAssign<ArrayBase<ViewRepr<&f32>, Dim<[usize; 1]>>>` for `ArrayBase<ViewRepr<&mut f32>, Dim<[usize; 1]>>`

error[E0277]: the trait bound `ArrayBase<ViewRepr<&f32>, Dim<[usize; 1]>>: ScalarOperand` is not satisfied
  --> tools/src/bin/build-index.rs:58:26
   |
58 |                 sentence += ArrayView1::from(&state.buf);
   |                          ^^ the trait `ScalarOperand` is not implemented for `ArrayBase<ViewRepr<&f32>, Dim<[usize; 1]>>`
   |
   = note: required because of the requirements on the impl of `AddAssign<ArrayBase<ViewRepr<&f32>, Dim<[usize; 1]>>>` for `ArrayBase<ViewRepr<&mut f32>, Dim<[usize; 1]>>`

error[E0271]: type mismatch resolving `<ViewRepr<&f32> as RawData>::Elem == ArrayBase<ViewRepr<&f32>, Dim<[usize; 1]>>`
  --> tools/src/bin/build-index.rs:58:26
   |
58 |                 sentence += ArrayView1::from(&state.buf);
   |                          ^^ expected `f32`, found struct `ArrayBase`
   |
   = note: expected type `f32`
            found struct `ArrayBase<ViewRepr<&f32>, Dim<[usize; 1]>>`
   = note: required because of the requirements on the impl of `AddAssign` for `ArrayBase<ViewRepr<&f32>, Dim<[usize; 1]>>`
   = note: required because of the requirements on the impl of `AddAssign<ArrayBase<ViewRepr<&f32>, Dim<[usize; 1]>>>` for `ArrayBase<ViewRepr<&mut f32>, Dim<[usize; 1]>>`

error[E0277]: the trait bound `ViewRepr<&f32>: DataMut` is not satisfied
  --> tools/src/bin/build-index.rs:58:26
   |
58 |                 sentence += ArrayView1::from(&state.buf);
   |                          ^^ the trait `DataMut` is not implemented for `ViewRepr<&f32>`
   |
   = help: the following implementations were found:
             <ViewRepr<&'a mut A> as DataMut>
   = note: required because of the requirements on the impl of `AddAssign` for `ArrayBase<ViewRepr<&f32>, Dim<[usize; 1]>>`
   = note: required because of the requirements on the impl of `AddAssign<ArrayBase<ViewRepr<&f32>, Dim<[usize; 1]>>>` for `ArrayBase<ViewRepr<&mut f32>, Dim<[usize; 1]>>`

Not sure if that's related to this issue, but I looked for examples doing this kind of thing and a bunch of the documentation on ArrayBase but failed to figure it out for now (and fell back to simple loops).

mulimoen commented 4 years ago

@djc You might need to make a reference to ArrayView:

sentence += &ArrayView1::from(&state.buf)
bluss commented 3 years ago

The docs cover this in the designated section, with examples https://docs.rs/ndarray/0.14.0/ndarray/struct.ArrayBase.html#arithmetic-operations