pydata / sparse

Sparse multi-dimensional arrays for the PyData ecosystem
https://sparse.pydata.org
BSD 3-Clause "New" or "Revised" License
604 stars 127 forks source link

Enh: Support for sparse nd-convolutions #762

Open adonath opened 2 months ago

adonath commented 2 months ago

Please describe the purpose of the new feature or describe the problem to solve.

I would be great to have support for ND convolution of sparse arrays. Basically an implementation of scipy.signal.convolve, with support for sparse arrays.

Suggest a solution if possible.

I have played around a bit with implementing a convolution via sparse matrix multiplication:

import sparse as sp
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import convolve2d
from scipy.linalg import circulant

random_state = np.random.RandomState(7236)

coords = random_state.randint(0, 100, size=(2, 100))
data = random_state.gamma(10, size=100)
data_sparse = sp.COO(coords, data, shape=(100, 100))

def gaussian_kernel(sigma, size=None):
    """Get 2d Gaussian kernel"""
    if size is None:
        size = int(3 * sigma) * 2 + 1

    x = np.linspace(-size, size, 2*size+1)
    y = np.linspace(-size, size, 2*size+1)
    x, y = np.meshgrid(x, y)
    kernel = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return kernel / kernel.sum()

def convolve2d_sparse(data, kernel):
    """Convolve sparse 2d data with 2d kernel"""
    shape_diff = np.array(data.shape) - np.array(kernel.shape)
    kernel_pad = np.pad(kernel, pad_width=((shape_diff[0], 0), (shape_diff[1], 0)), mode='constant', constant_values=0)

    kernel_matrix = sp.COO.from_numpy(circulant(kernel_pad))
    result =  (kernel_matrix @ data.flatten()).reshape(data.shape)
    return sp.roll(result, shift=(kernel.shape[0]//2, kernel.shape[1]//2 + 1), axis=(0, 1))

kernel = gaussian_kernel(1.5)

result_sparse = convolve2d_sparse(data_sparse, kernel)

result_dense = convolve2d(data_sparse.todense(), kernel, mode='same', boundary='wrap')

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

diff = result_dense - result_sparse.todense()

axes[0].imshow(result_sparse.todense(), cmap='viridis')
axes[0].set_title('Sparse convolution')
axes[1].imshow(result_dense, cmap='viridis')
axes[1].set_title('Dense convolution')
axes[2].imshow(diff, vmin=-0.5, vmax=0.5, cmap="RdBu")
axes[2].set_title('Diff')

image

But performance wise this is really bad:

image

Also boundary handling becomes really complex. And I am sure the experts have much better ideas on how to implement this.

If you have tried alternatives, please describe them below.

No response

Additional information that may help us understand your needs.

No response

hameerabbasi commented 2 months ago

Hi! We are indeed considering sparse convolution; but it will probably be part of the new backend, see the announcement at #618.