Closed NilsKrause closed 3 years ago
I got an implementation with CuPy working, but it turned out to be slower than the existing NumPy implementation, and I am in the dark as to why exactly it is slower.
So basically, I rewrote the pyxelate._reduce function to work with CuPy as follows:
import cupy as cp
from skimage.filters import median
from skimage.morphology import square
from skimage.util import view_as_blocks
from skimage.color.adapt_rgb import adapt_rgb, each_channel

def _reduce(self, image):
    """Apply convolutions on image ITER times and generate a smaller image
    based on the highest magnitude of gradients"""

    # self is visible to the decorated function
    @adapt_rgb(each_channel)
    def _wrapper(dim):
        # apply median filter for noise reduction
        dim = median(dim, square(4))
        for _ in range(self.ITER):
            h, w = dim.shape
            h, w = h // 2, w // 2
            new_image = cp.zeros((h * w)).astype("int")
            view = view_as_blocks(dim, (2, 2))
            flatten = cp.asarray(view).reshape(-1, 2, 2)
            # bottleneck: Python-level loop over every 2x2 block
            for i, f in enumerate(flatten):
                conv = cp.abs(cp.sum(cp.multiply(self.CONVOLUTIONS, f.reshape(-1, 2, 2)).reshape(-1, 4), axis=1))
                new_image[i] = cp.mean(f[self.SOLUTIONS[cp.argmax(conv)]])
            new_image = new_image.reshape((h, w))
            # copy back to the host so skimage can handle it in the next iteration
            dim = cp.asnumpy(new_image.copy())
        return new_image
I also had to define the CONVOLUTIONS and SOLUTIONS arrays as cupy.ndarray.
My guess as to why it's slower is that either there weren't enough operations to make up for the time it took to copy the array to the GPU, or the operations need to be restructured in some way to actually make use of the GPU. (I tested the speed with multiple smaller images and one very large image; in both scenarios the CuPy variant was significantly slower.)
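Both guesses point at the same structural issue: the inner loop issues one tiny array operation per 2x2 block, and on the GPU every such call also pays a fixed kernel-launch/synchronization cost. The effect is visible even in pure NumPy, where per-call overhead is smaller but still dominates tiny operations. A minimal benchmark sketch (the data and gradient-style kernels here are made up for illustration):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
blocks = rng.integers(0, 256, size=(20000, 2, 2))   # fake 2x2 image patches
# four simple gradient-style kernels (invented for this benchmark)
kernels = np.array([[[ 1, -1], [ 1, -1]],
                    [[ 1,  1], [-1, -1]],
                    [[ 1, -1], [-1,  1]],
                    [[ 1,  0], [ 0, -1]]])

# looped: one tiny operation per block; on the GPU every one of these
# calls would additionally pay a fixed kernel-launch cost
t0 = time.perf_counter()
looped = np.array([np.abs((kernels * f).sum(axis=(1, 2))).argmax() for f in blocks])
t_loop = time.perf_counter() - t0

# batched: the same arithmetic as one large array operation
t0 = time.perf_counter()
batched = np.abs(np.einsum('kij,nij->nk', kernels, blocks)).argmax(axis=1)
t_batch = time.perf_counter() - t0

assert np.array_equal(looped, batched)
print(f"looped: {t_loop:.3f}s  batched: {t_batch:.3f}s")
```

On typical hardware the batched version is orders of magnitude faster, and the gap only widens on a GPU, where launch overhead per call is larger.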
Currently I am trying to wrap my head around what the function actually does so that I could rewrite it to use the GPU more effectively. But it's probably going to take a while until I understand everything, as it's my first time working with convolutional filters and scikit-learn/scikit-image, for that matter.
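For what it's worth, the per-block work can be expressed without the Python loop at all. This is a hedged sketch in NumPy (the same calls exist in CuPy, so it should port directly): it assumes CONVOLUTIONS is a (K, 2, 2) stack of kernels and SOLUTIONS a matching (K, 2, 2) stack of boolean masks, as the indexing `f[self.SOLUTIONS[...]]` suggests; the name `reduce_blocks` is mine.

```python
import numpy as np

def reduce_blocks(blocks, convolutions, solutions):
    """One output pixel per 2x2 block, with no Python-level loop.

    blocks:       (N, 2, 2) image patches
    convolutions: (K, 2, 2) gradient kernels (assumed shape)
    solutions:    (K, 2, 2) boolean masks picking the pixels to average
    """
    # (N, K): |kernel response| for every block/kernel pair
    conv = np.abs(np.einsum('nij,kij->nk', blocks, convolutions))
    masks = solutions[conv.argmax(axis=1)]      # (N, 2, 2) winning mask per block
    # mean over only the masked pixels of each block
    sums = (blocks * masks).sum(axis=(1, 2))
    counts = masks.sum(axis=(1, 2))
    return sums / counts

# toy example: two blocks, two made-up kernels/masks
blocks = np.array([[[4., 0.], [4., 0.]],
                   [[1., 1.], [5., 5.]]])
convolutions = np.array([[[1, -1], [1, -1]],    # vertical edge
                         [[1, 1], [-1, -1]]])   # horizontal edge
solutions = np.array([[[True, False], [True, False]],
                      [[True, True], [False, False]]])
out = reduce_blocks(blocks, convolutions, solutions)   # -> [4.0, 1.0]
```

The first block has a strong vertical gradient, so the vertical kernel wins and its mask averages the left column (mean 4); the second block is horizontal, so the top row is averaged (mean 1).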
Any help and infos to that matter are appreciated.
Thank you for your help! Let me know if you figure out anything.
If there were a way to move

for i, f in enumerate(flatten):

into CuPy, there would be a huge speed boost. Right now it still has to fall back to Python for each iteration, and every call into either NumPy or CuPy adds function-call overhead.
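A related source of overhead is the `dim = cp.asnumpy(...)` round-trip, which exists only because skimage's `view_as_blocks` needs a host array. For the fixed 2x2 case the blocking can be written as a plain reshape/transpose, which runs unchanged on a `cupy.ndarray`, so intermediate results could stay on the GPU between iterations. A sketch (the function name is mine; it assumes even height and width, which the original guarantees by halving with `// 2`):

```python
import numpy as np

def blocks_2x2(a):
    """Split a (H, W) array into (H//2 * W//2, 2, 2) non-overlapping blocks.

    Pure reshape/transpose, so the same code works on NumPy and CuPy
    arrays alike -- no skimage call, no copy back to the host.
    """
    h, w = a.shape
    return a.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3).reshape(-1, 2, 2)

a = np.arange(16).reshape(4, 4)
print(blocks_2x2(a)[0])   # top-left 2x2 block: rows [0, 1] and [4, 5]
```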
Still, this is great news, I will look into the library as well!
So, I looked around online a bit and stumbled across cupy, a library that basically wraps NumPy functionality so it runs on the GPU, performing highly concurrent calculations faster. I tinkered around a bit but didn't really get to a state where I could test it effectively, mainly because I am an absolute Python scrub and also have no clue about image computation whatsoever. But I am hoping that someone else can implement it into the code, just to see if it gives any performance upgrade on larger images.
Currently it's not that trivial to set up an environment for it, but I got it running on my Arch Linux machine with a GeForce 1050 Ti. See the cupy GitHub page and the cupy installation instructions.