Do maximum_filter with cupy instead of scipy

https://github.com/worldveil/dejavu/blob/d2b8761eb39f8e2479503f936e9f9948addea8ea/dejavu/fingerprint.py#L98

Replace the line linked above with the following code.

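import cupy as cp
from cupyx.scipy.ndimage import maximum_filter as cp_maximum_filter

# Copy the spectrogram to the GPU, filter it there, then copy the boolean mask back to the host.
array = cp.asarray(arr2D)
local_max = cp.asnumpy(cp_maximum_filter(array, footprint=cp.asarray(neighborhood)) == array)
del array  # drop the reference to the GPU copy so cupy can reuse that memory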
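For machines without cupy or a usable GPU, the same replacement can be wrapped so that it falls back to scipy. This is only a sketch: the helper name `local_max_mask` and the broad `except Exception` fallback are assumptions made for illustration and are not part of dejavu or of this patch.

```python
import numpy as np
from scipy.ndimage import maximum_filter as sp_maximum_filter

try:
    import cupy as cp
    from cupyx.scipy.ndimage import maximum_filter as cp_maximum_filter
    HAVE_CUPY = True
except ImportError:
    HAVE_CUPY = False


def local_max_mask(arr2D, neighborhood):
    """Return a boolean mask that is True where arr2D equals its neighborhood maximum.

    Hypothetical helper: tries the GPU path first and uses scipy otherwise.
    """
    if HAVE_CUPY:
        try:
            gpu_arr = cp.asarray(arr2D)
            gpu_mask = cp_maximum_filter(gpu_arr, footprint=cp.asarray(neighborhood)) == gpu_arr
            return cp.asnumpy(gpu_mask)
        except Exception:
            # Assumption: any GPU-side failure (no device, out of video memory)
            # is handled by silently falling back to the CPU path below.
            pass
    return sp_maximum_filter(arr2D, footprint=neighborhood) == arr2D
```

With this in place, the linked line would become `local_max = local_max_mask(arr2D, neighborhood)`. The fallback keeps the result identical on machines that lack a GPU, at the cost of the scipy timings shown below.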

Single-channel length	Audio duration	cupy time	scipy time
62622441	24 min	45.92s	49.69s
31311220	12 min	10.6s	25.68s
15655610	6 min	1.56s	12.62s
7827805	3 min	0.72s	6.18s
3913902	1.5 min	0.34s	3.09s

Environment: Ryzen 7 5800H, RTX 3060 Mobile (6 GB), Windows 10 21H2, Python 3.7.9, CUDA 11.0, cupy-cuda110

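This means cupy saves about 10 seconds when processing 3 minutes of dual-channel audio (6.18 s versus 0.72 s per channel). The gain shrinks for long files: at 24 minutes cupy takes 45.92 s against 49.69 s for scipy. The trade-off is video memory, up to 1.5 GB for a 24-minute file. cupy: https://cupy.dev/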
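The figures above can be checked with a few lines of arithmetic. The sketch below copies the timings from the table and assumes the filter runs once per channel, so the saving for dual-channel audio is twice the per-channel difference. The function name `summarize` is only illustrative.

```python
# (audio duration in minutes, cupy seconds, scipy seconds), copied from the table
timings = [
    (24, 45.92, 49.69),
    (12, 10.6, 25.68),
    (6, 1.56, 12.62),
    (3, 0.72, 6.18),
    (1.5, 0.34, 3.09),
]


def summarize(rows, channels=2):
    """Return (minutes, speedup factor, seconds saved across all channels) per row."""
    out = []
    for minutes, cupy_s, scipy_s in rows:
        speedup = scipy_s / cupy_s
        # Assumption: one filter call per channel, so savings scale with channel count.
        saved = (scipy_s - cupy_s) * channels
        out.append((minutes, round(speedup, 2), round(saved, 2)))
    return out


for minutes, speedup, saved in summarize(timings):
    print(f"{minutes:>4} min: {speedup:.2f}x faster, {saved:.2f} s saved for 2 channels")
```

For the 3-minute row this gives a speedup of roughly 8.6x and a saving of roughly 10.9 s for two channels, which matches the "about 10 seconds" estimate.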

Do maximum_filter with cupy instead of scipy #285