rust-ndarray / ndarray

ndarray: an N-dimensional array with array views, multidimensional slicing, and efficient operations
https://docs.rs/ndarray/
Apache License 2.0

Is indexing with a variable number of indices supported? #865

Open TheButlah opened 3 years ago

TheButlah commented 3 years ago

I'd like to index a tensor along all but the last axis, getting as a result a 1d view of the innermost axis. I thought I'd be able to use ArrayBase::slice() for this, but I need to create the index with the s! macro, and I can't figure out how to do that with a variable number of indices (for example, when the user provides the coordinates of the spatial dimensions in a tensor and I want to slice on them, leaving a 1d view of the feature dimension, which is the innermost dimension).

I asked a Stack Overflow question about this here. If it is indeed supported, then perhaps we could add a Stack Overflow answer there? If not, when would support be added, and is there a workaround in the meantime?

The equivalent numpy feature, which is actually slightly more general than my specific use case since I only need a view on the innermost axis, is here

jturner314 commented 3 years ago

At the moment, the best way to accomplish this would be to call index_axis_move or index_axis_inplace for each of the axes except for the one you want to keep.

This seems like a common enough task that it would be worth adding .lane(), .lane_mut(), and .lane_move() methods to ArrayBase, like this:

    pub fn lane<I>(&self, axis: Axis, index: I) -> ArrayView1<'_, A>
    where
        S: Data,
        D: RemoveAxis,
        I: NdIndex<D::Smaller>,
    {
        self.view().lane_move(axis, index)
    }

    pub fn lane_mut<I>(&mut self, axis: Axis, index: I) -> ArrayViewMut1<'_, A>
    where
        S: DataMut,
        D: RemoveAxis,
        I: NdIndex<D::Smaller>,
    {
        self.view_mut().lane_move(axis, index)
    }

    pub fn lane_move<I>(self, axis: Axis, index: I) -> ArrayBase<S, Ix1>
    where
        D: RemoveAxis,
        I: NdIndex<D::Smaller>,
    {
        let ax = axis.index();
        let removed_dim = self.dim.remove_axis(axis);
        let removed_strides = self.strides.remove_axis(axis);
        // FIXME: Add better panic handling than `.unwrap()`.
        let offset = index.index_checked(&removed_dim, &removed_strides).unwrap();
        ArrayBase {
            ptr: unsafe { self.ptr.offset(offset) },
            data: self.data,
            dim: Ix1(self.dim[ax]),
            strides: Ix1(self.strides[ax]),
        }
    }

Maybe someone could finish this up (fix the .unwrap() and add docs and tests) and create a PR?

Long term, I'd like to add NumPy's "ellipsis" and "new axis" slicing functionality to ndarray. I haven't worked on it in a while, and it'll be a long time before I have enough time to make significant contributions to ndarray again; I think this is my latest work on it. Someone else could do it, though.

TheButlah commented 3 years ago

Wouldn't index_axis_move cause a copy each time? And I'd need to iterate over the spatial axes and do that once per axis? That seems prohibitively slow. I also need to not destroy the shape of the original array. I need to be able to get a mutable view on it without consuming the original.

jturner314 commented 3 years ago

index_axis_move and index_axis_inplace do not copy the data in the array; they just alter the shape, strides, and pointer to the first element. Making multiple calls to index_axis_move/inplace would be slower than a specially-written implementation like the lane_move method I proposed in my previous comment, simply because the alterations to the shape/strides/pointer would be performed somewhat inefficiently, but unless you're doing this in an inner loop, it shouldn't be too bad. If you want to keep the original array, you could call .view_mut() on the original array to get a mutable view, then call .index_axis_move()/.index_axis_inplace() on the mutable view to index it down to just the 1-D lane you need.
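To make the "no copy" point concrete, here is a toy model in plain Rust with no dependencies. `ToyView`, its fields, and `demo_toy_lane` are made up for illustration and are not ndarray's actual types or implementation; the sketch only shows the idea that indexing an axis adjusts an offset and drops one entry from the shape and strides, assuming a row-major buffer.

```rust
/// Toy strided view over a flat buffer. NOT ndarray's real type, just the same idea.
#[derive(Debug, Clone, PartialEq)]
struct ToyView<'a> {
    data: &'a [f64],
    offset: usize,       // position of the view's first element in `data`
    shape: Vec<usize>,   // length of each axis
    strides: Vec<usize>, // elements to skip per step along each axis
}

impl<'a> ToyView<'a> {
    /// Row-major view over the whole buffer.
    fn new(data: &'a [f64], shape: &[usize]) -> Self {
        assert_eq!(shape.iter().product::<usize>(), data.len());
        let mut strides = vec![1; shape.len()];
        for i in (0..shape.len().saturating_sub(1)).rev() {
            strides[i] = strides[i + 1] * shape[i + 1];
        }
        ToyView { data, offset: 0, shape: shape.to_vec(), strides }
    }

    /// Fix `axis` at `index`: bump the offset and drop that axis.
    /// No element of `data` is copied or moved.
    fn index_axis(mut self, axis: usize, index: usize) -> Self {
        assert!(index < self.shape[axis], "index out of bounds");
        self.offset += index * self.strides[axis];
        self.shape.remove(axis);
        self.strides.remove(axis);
        self
    }

    /// Read out the elements of a 1-D view.
    fn to_vec_1d(&self) -> Vec<f64> {
        assert_eq!(self.shape.len(), 1);
        (0..self.shape[0])
            .map(|i| self.data[self.offset + i * self.strides[0]])
            .collect()
    }
}

/// Index away the two outer axes of a 2x3x4 buffer, keeping the innermost lane.
fn demo_toy_lane() -> (usize, Vec<f64>) {
    let buf: Vec<f64> = (0..24).map(|x| x as f64).collect();
    let lane = ToyView::new(&buf, &[2, 3, 4]).index_axis(0, 1).index_axis(0, 2);
    (lane.offset, lane.to_vec_1d())
}
```

Each `index_axis` call here costs a few integer operations regardless of how large the buffer is, which is the sense in which repeated indexing is "not too bad" outside an inner loop.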

Edit: If you do need to loop over the lanes rather than just index to get a single one, then you should use lanes_mut or genrows_mut to get a producer that's fast to loop over.

bluss commented 3 years ago

This sounds a bit like you'd like to be able to index a lanes producer, and that could be a reasonable feature request in itself.

If it's an iteration - does using Zip with a lanes producer work for this use case? Or what are the limitations there?

In some ways, we need to keep some of our numpy intuitions with this crate - keep operations "vectorized", avoid indexing. Indexing is bounds checked in Rust, and multidimensional bounds checking has some overhead. Ndarray tries to provide traversal features like Zip and the lanes producers, and they should be the best.

TheButlah commented 3 years ago

It wouldn't be iteration over all lanes, only over a single lane (or maybe a subset of lanes). In my specific case (which again is more specific than the actual numpy feature I linked), for a tensor with ndim dimensions, I have a list of indices where each index has ndim-1 components. For every index in that list, I want to index the tensor on the outermost dimensions with that index, producing a view onto the innermost axis at that particular coordinate. So conceptually yes, an indexable lanes producer would accomplish what I'm looking for (assuming it's performant if I only want to do a single index or a subset of indices), although the more straightforward way in my opinion would be to implement the functionality described in the numpy documentation I linked.

Here's the workaround code I'm using now which is incredibly hacky and probably very inefficient:

    for coord in event_coords {
        use ndarray::ArrayViewMut1;
        assert_eq!(coord.len(), self.accumulator.ndim() - 1);

        // TODO: Avoid this match by figuring out how to index on a variable
        // number of dimensions
        let filter_outputs: ArrayViewMut1<DType> = match self.accumulator.ndim() {
            1 => self.accumulator.slice_mut(s![..]),
            2 => self.accumulator.slice_mut(s![coord[0], ..]),
            3 => self.accumulator.slice_mut(s![coord[0], coord[1], ..]),
            4 => self
                .accumulator
                .slice_mut(s![coord[0], coord[1], coord[2], ..]),
            _ => todo!(
                "We currently switch on ndims at runtime, but we \
                ultimately want to not have to do this and instead use \
                slicing with variable numbers of indices. For this reason, \
                only common shapes are supported. See \
                https://github.com/rust-ndarray/ndarray/issues/865"
            ),
        };

        todo!("do something with `filter_outputs`")
    }