rust-ndarray / ndarray

ndarray: an N-dimensional array with array views, multidimensional slicing, and efficient operations
https://docs.rs/ndarray/
Apache License 2.0

Equivalent for np.pad #823

Closed Dimev closed 1 year ago

Dimev commented 4 years ago

Is there an equivalent for np.pad?

I'm trying to make a filter for arrays using windows, however the output array is smaller than the input array, which is something I want to avoid. I think this can be solved by padding the input array to be a bit larger before using it.

My code:

let input = Array2::<f32>::zeros((300, 200));
let kernel = array![[0.0625, 0.125, 0.0625], [0.125, 0.25, 0.125], [0.0625, 0.125, 0.0625]];
let output = ndarray::Zip::from(input.windows(kernel.raw_dim()))
    .apply_collect(|window| {
        (&window * &kernel).sum()
    });

assert_eq!(input.shape(), output.shape()); // this fails

I'm new to both rust and ndarray, so there might be an obvious solution I'm missing.
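
For readers following along, the shape mismatch comes from how sliding windows work: with no padding, a window of length k fits at only n - k + 1 positions along an axis of length n. The sketch below uses plain Rust rather than ndarray, and the helper names `valid_output_len` and `same_padding` are hypothetical, chosen only for illustration. It shows how much padding would bring the output back to the input size.

```rust
/// Output length along one axis when a window of `kernel_len` slides over
/// `input_len` elements without padding. Returns `None` if no window fits.
/// (Hypothetical helper, not part of ndarray.)
fn valid_output_len(input_len: usize, kernel_len: usize) -> Option<usize> {
    if kernel_len == 0 || kernel_len > input_len {
        None
    } else {
        Some(input_len - kernel_len + 1)
    }
}

/// Padding (before, after) along one axis so that the unpadded-window output
/// has the same length as the original input. For even kernel lengths the
/// extra element is placed after the data, which is an arbitrary choice here.
fn same_padding(kernel_len: usize) -> (usize, usize) {
    let total = kernel_len.saturating_sub(1);
    let before = total / 2;
    (before, total - before)
}

fn main() {
    let (rows, cols, k) = (300usize, 200usize, 3usize);

    // Without padding, a 3x3 window over 300x200 yields 298x198 positions.
    println!("{:?} x {:?}", valid_output_len(rows, k), valid_output_len(cols, k));

    // Padding each axis by (1, 1) restores the original 300x200 shape.
    let (lo, hi) = same_padding(k);
    println!(
        "{:?} x {:?}",
        valid_output_len(rows + lo + hi, k),
        valid_output_len(cols + lo + hi, k)
    );
}
```

With a 3x3 kernel this gives one element of padding on every side, which matches the idea of making the input "a bit larger" before windowing.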

imjasonmiller commented 4 years ago

Hi, did you figure this out? I'm looking for this as well and I feel as if I might not be looking hard enough.

Although I am not that familiar with Python, I've taken a look at the numpy implementation and that seems doable at first glance.

If no such functionality exists and is desired, I'd like to give implementing it a shot, despite being quite new to Rust as well. Would this be welcome @bluss or @jturner314?

Dimev commented 4 years ago

I haven't found anything so far. Another fix I can do is cropping the output a bit so that it has the correct size, but that won't be the perfect solution. Another thing I could do is make a new array the size of the original + padding and somehow copy the data from one to another. I'm not sure how to do that efficiently using ndarray features.

imjasonmiller commented 4 years ago

I think I have a somewhat similar use case. I was looking for this due to edge handling of my input, e.g. clamp, mirror, wrap and so on. I thought trading an increase in space complexity for a decrease in time complexity by adding padding with values for those edge handling methods would be an interesting trade-off while learning about and trying to speed up my convolutions.

I'm not sure if there is another way to do so, but I'd love to hear about it! I'm currently thinking about implementing your second suggestion. I also wonder if there isn't a more efficient way of handling this.
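
As a rough illustration of the edge-handling modes mentioned above, each one can be viewed as a rule that maps an out-of-range index back to a valid source index. The sketch below is plain Rust, not ndarray API; `EdgeMode` and `source_index` are hypothetical names, and `Mirror` is assumed to mean reflection that does not repeat the edge element (what numpy calls `reflect`).

```rust
/// Edge-handling modes (hypothetical enum, for illustration only).
#[derive(Clone, Copy, Debug)]
enum EdgeMode {
    /// Repeat the nearest edge element.
    Clamp,
    /// Reflect about the edge without repeating the edge element.
    Mirror,
    /// Continue from the opposite end of the axis.
    Wrap,
}

/// Map a possibly out-of-range index `i` to a valid index in `0..len`.
fn source_index(i: isize, len: usize, mode: EdgeMode) -> usize {
    assert!(len > 0, "axis must be non-empty");
    let n = len as isize;
    let idx = match mode {
        EdgeMode::Clamp => i.clamp(0, n - 1),
        EdgeMode::Wrap => i.rem_euclid(n),
        EdgeMode::Mirror => {
            if n == 1 {
                0
            } else {
                // The reflected sequence repeats with period 2 * (n - 1).
                let period = 2 * (n - 1);
                let m = i.rem_euclid(period);
                if m < n { m } else { period - m }
            }
        }
    };
    idx as usize
}

fn main() {
    // Indices -2..6 over an axis of length 4, i.e. two elements of padding
    // on each side.
    for mode in [EdgeMode::Clamp, EdgeMode::Mirror, EdgeMode::Wrap] {
        let mapped: Vec<usize> = (-2..6).map(|i| source_index(i, 4, mode)).collect();
        println!("{:?}: {:?}", mode, mapped);
    }
}
```

Padding up front amounts to evaluating this mapping once per padded element, whereas handling edges inside the convolution loop evaluates it on every out-of-range access. That is the space-for-time trade-off described above.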

Dimev commented 4 years ago

I figured out the second suggestion. First, make a new array (I'm using zeros) that's slightly bigger:

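// No padding along axis 0; one extra column on each side of axis 1.
// `mut` is needed because the array is written to in the next step.
let mut padded = Array2::<f32>::zeros((to_pad.shape()[0], to_pad.shape()[1] + 2));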

then assign the array to be padded to a part of the new array:

// All rows, and every column except the first and last.
padded.slice_mut(s![.., 1..-1]).assign(&to_pad);

now you can apply convolution, in my case, edge detection:

let kernel = array![[-1.0, 0.0, 1.0]];
let gradient_x = ndarray::Zip::from(padded.windows(kernel.raw_dim()))
    .apply_collect(|window| {
        (&kernel * &window).sum()
    });

Edit: should work for other cases too. I'm not yet skilled enough to turn this into a reusable function
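
The same allocate-then-copy idea can be checked without ndarray. The following is a dependency-free sketch using nested `Vec`s; `pad_columns` and `correlate_rows` are hypothetical helper names, and the kernel is applied as written (no flipping), mirroring the element-wise multiply-and-sum used above.

```rust
/// Return a copy of `rows` with `pad` zeros added at both ends of every row.
fn pad_columns(rows: &[Vec<f32>], pad: usize) -> Vec<Vec<f32>> {
    rows.iter()
        .map(|row| {
            let mut out = vec![0.0; row.len() + 2 * pad];
            // Copy the original data into the middle of the zeroed row.
            out[pad..pad + row.len()].copy_from_slice(row);
            out
        })
        .collect()
}

/// Slide a 1-D kernel along every row with no padding, multiplying
/// element-wise and summing each window.
fn correlate_rows(rows: &[Vec<f32>], kernel: &[f32]) -> Vec<Vec<f32>> {
    assert!(!kernel.is_empty(), "kernel must be non-empty");
    rows.iter()
        .map(|row| {
            row.windows(kernel.len())
                .map(|w| w.iter().zip(kernel).map(|(a, k)| a * k).sum::<f32>())
                .collect()
        })
        .collect()
}

fn main() {
    let rows: Vec<Vec<f32>> = vec![vec![1.0, 2.0, 4.0, 7.0]];
    let kernel = [-1.0, 0.0, 1.0];

    // Padding by one column per side keeps the output as wide as the input.
    let padded = pad_columns(&rows, 1);
    let gradient_x = correlate_rows(&padded, &kernel);
    println!("{:?}", gradient_x); // one row of four values
}
```

Note that zero padding produces artificial gradients at the borders (the last value here is negative only because the padded neighbour is 0), which is one reason the other edge modes discussed earlier are often preferred for filters.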

jturner314 commented 4 years ago

Fwiw, here's a function to pad with zeros for arrays of arbitrarily many dimensions:

use ndarray::{Array, ArrayBase, Axis, Data, Dimension, Slice};
use num_traits::Zero;

/// Pad the edges of an array with zeros.
///
/// `pad_width` specifies the length of the padding at the beginning
/// and end of each axis.
///
/// **Panics** if `arr.ndim() != pad_width.len()`.
fn pad_with_zeros<A, S, D>(arr: &ArrayBase<S, D>, pad_width: Vec<[usize; 2]>) -> Array<A, D>
where
    A: Clone + Zero,
    S: Data<Elem = A>,
    D: Dimension,
{
    assert_eq!(
        arr.ndim(),
        pad_width.len(),
        "Array ndim must match length of `pad_width`."
    );

    // Compute shape of final padded array.
    let mut padded_shape = arr.raw_dim();
    for (ax, (&ax_len, &[pad_lo, pad_hi])) in arr.shape().iter().zip(&pad_width).enumerate() {
        padded_shape[ax] = ax_len + pad_lo + pad_hi;
    }

    let mut padded = Array::zeros(padded_shape);
    {
        // Select portion of padded array that needs to be copied from the
        // original array.
        let mut orig_portion = padded.view_mut();
        for (ax, &[pad_lo, pad_hi]) in pad_width.iter().enumerate() {
            // FIXME: This has a bug when `pad_hi` is 0. See @fzyzcjy's comment below.
            orig_portion
                .slice_axis_inplace(Axis(ax), Slice::from(pad_lo as isize..-(pad_hi as isize)));
        }
        // Copy the data from the original array.
        orig_portion.assign(arr);
    }
    padded
}

fn main() {
    use ndarray::Array2;
    let to_pad = Array2::<f32>::ones((3, 4));
    let padded = pad_with_zeros(&to_pad, vec![[1, 2], [3, 1]]);
    println!("{}", padded);
}
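
To make the shape arithmetic in the function above concrete, here is a small std-only sketch (the name `padded_layout` is hypothetical). For each axis it computes the padded length and the half-open range the original data occupies, written as `lo..lo + len` so that the end of the range does not depend on a negative index.

```rust
use std::ops::Range;

/// For each axis, return the padded length and the range inside the padded
/// axis that holds the original data.
///
/// Panics if `shape.len() != pad_width.len()`.
fn padded_layout(shape: &[usize], pad_width: &[[usize; 2]]) -> Vec<(usize, Range<usize>)> {
    assert_eq!(
        shape.len(),
        pad_width.len(),
        "number of axes must match length of `pad_width`"
    );
    shape
        .iter()
        .zip(pad_width)
        .map(|(&len, &[lo, hi])| (lo + len + hi, lo..lo + len))
        .collect()
}

fn main() {
    // Same numbers as the `main` above: a 3x4 array padded by [1, 2] and [3, 1].
    let layout = padded_layout(&[3, 4], &[[1, 2], [3, 1]]);
    for (ax, (padded_len, region)) in layout.iter().enumerate() {
        println!("axis {}: padded length {}, original data at {:?}", ax, padded_len, region);
    }
}
```

For the example in `main`, this gives a 6x8 result with the original ones occupying rows 1..4 and columns 3..7.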

I wouldn't mind including something simple like pad_with_zeros in ndarray. However, numpy.pad has lots of options/variants which make its interface complicated. This makes me hesitant to include a full equivalent of numpy.pad directly in ndarray.

There are a number of functions like this which are useful but don't really fit within the minimal API of ndarray; I would think placing them in another crate would make more sense. That's the reason behind keeping the ndarray-stats crate separate, for example. I do think something like numpy.pad would be valuable in another crate. For example, padding with the wrap variant is tricky to implement but useful in practice, which is the type of functionality well-suited for inclusion in a crate.

Dimev commented 4 years ago

Thanks! I understand this not being included in the main crate, although I find it difficult to implement some things that are one liners in numpy, probably because I'm new to both ndarray and rust. Should I close the issue?

imjasonmiller commented 4 years ago

Thanks @jturner314 for the great example and detailed answer! I agree with your reasoning; it would be much better served in a separate crate.

ZuseZ4 commented 3 years ago

@imjasonmiller How do you feel about your ndarray-pad repo?
You mentioned that you would like to achieve feature-parity with numpy.pad (and maybe publish it as a crate). Is that still accurate? I also implemented basic padding for my convolution layer, but I guess it would be nice to have that in a central space. So having it as a separate crate but linking there from the ndarray docs might be a good solution?

imjasonmiller commented 3 years ago

> You mentioned that you would like to achieve feature-parity with numpy.pad (and maybe publish it as a crate). Is that still accurate?

Hi @ZuseZ4! Great to hear that there's some interest! To answer your question: yes, absolutely! I had to deal with some other things, but I'd love to get back to it. I can likely do so next week.

I had converted most of the algorithm that's implemented in numpy.pad in my local repo. Off the top of my head, I noticed that having multi-axis slices would be nice to have, although implementing padding without it for now would probably be fine; it's just a couple of extra lines of code if I recall correctly.

If you'd like to merge efforts or something akin to that, I'd be open to that as well!

nilgoyette commented 3 years ago

I have ported some np.pad calls in our private codebase, namely the reflect, symmetric and wrap modes. Did you already have those in your local repo?

nilgoyette commented 2 years ago

I just pushed the first version of ndarray-ndimage. It aims to be a Rust port of scipy.ndimage with some other tools.

This is very much a work in progress! For example, I added the pad function but it only supports 3D images and the following modes: reflect, symmetric and wrap. All other functions are missing bits and pieces and are not as generic as they should be. This is expected, as I only ported what we are using at my company. I'll try to enhance this crate in the coming months, so it can be of use to more people.

I'm aware that this can't be called professional work right now (!) but if you think ndarray-ndimage is a good idea and should exist, do not hesitate to contribute!

fzyzcjy commented 2 years ago

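The solution provided by @jturner314 has a bug: it panics when the trailing padding (`pad_hi`) on any axis is zero, because the slice end `-(pad_hi as isize)` then becomes `0` instead of meaning "the end of the axis". The code below fixes the problem. It also pads with a caller-supplied `const_value` instead of zeros.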

    /// Pad the edges of an array with `const_value`.
    ///
    /// `pad_width` specifies the length of the padding at the beginning
    /// and end of each axis.
    ///
    /// **Panics** if `arr.ndim() != pad_width.len()`.
    pub fn pad(&self, pad_width: Vec<[usize; 2]>, const_value: A) -> Array<A, D>
    where
        A: Clone,
        S: Data<Elem = A>,
    {
        assert_eq!(
            self.ndim(),
            pad_width.len(),
            "Array ndim must match length of `pad_width`."
        );

        // Compute shape of final padded array.
        let mut padded_shape = self.raw_dim();
        for (ax, (&ax_len, &[pad_lo, pad_hi])) in self.shape().iter().zip(&pad_width).enumerate() {
            padded_shape[ax] = ax_len + pad_lo + pad_hi;
        }

        let mut padded = Array::from_elem(padded_shape, const_value);
        let padded_dim = padded.raw_dim();
        {
            // Select portion of padded array that needs to be copied from the
            // original array.
            let mut orig_portion = padded.view_mut();
            for (ax, &[pad_lo, pad_hi]) in pad_width.iter().enumerate() {
                orig_portion.slice_axis_inplace(
                    Axis(ax),
                    Slice::from(pad_lo as isize..padded_dim[ax] as isize - (pad_hi as isize)),
                );
            }
            // Copy the data from the original array.
            orig_portion.assign(self);
        }
        padded
    }

jturner314 commented 2 years ago

@fzyzcjy Good catch. Thanks for the fix.
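
For anyone landing here later, the `pad_hi == 0` problem can be reproduced with a small model of the "negative indices count from the end" convention. This is a std-only sketch with hypothetical helper names, not ndarray's actual slicing code; bounds checks are omitted, and it only reports how many elements each form of the range would select.

```rust
/// Turn a possibly negative index into an absolute one: negative values
/// count back from `len`, non-negative values are used as-is.
fn abs_index(len: usize, i: isize) -> isize {
    if i < 0 { i + len as isize } else { i }
}

/// Number of elements selected by `start..end` under that convention
/// (zero when the resolved end is not past the resolved start).
fn selected_len(len: usize, start: isize, end: isize) -> usize {
    let (s, e) = (abs_index(len, start), abs_index(len, end));
    (e - s).max(0) as usize
}

/// Original form: `pad_lo..-(pad_hi)`. With `pad_hi == 0` the end is `-0`,
/// which is just `0`, i.e. the *start* of the axis.
fn original_len(orig_len: usize, pad_lo: usize, pad_hi: usize) -> usize {
    let padded = orig_len + pad_lo + pad_hi;
    selected_len(padded, pad_lo as isize, -(pad_hi as isize))
}

/// Fixed form: `pad_lo..(padded_len - pad_hi)`, an absolute end index.
fn fixed_len(orig_len: usize, pad_lo: usize, pad_hi: usize) -> usize {
    let padded = orig_len + pad_lo + pad_hi;
    selected_len(padded, pad_lo as isize, padded as isize - pad_hi as isize)
}

fn main() {
    // Both forms agree when there is padding at the end of the axis.
    println!("{} {}", original_len(3, 1, 2), fixed_len(3, 1, 2));
    // With no trailing padding, the original form selects nothing, so
    // copying the 3 original elements into that region cannot succeed.
    println!("{} {}", original_len(3, 1, 0), fixed_len(3, 1, 0));
}
```

The selected region must have the same length as the original axis for the copy to work. In this model the fixed form always satisfies that, while the original form selects an empty region whenever `pad_hi` is 0.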

fzyzcjy commented 2 years ago

@jturner314 You are welcome!

nilgoyette commented 1 year ago

Closing issue because