[FEA] Multi-GPU support for cupyimg

rapidsai / cucim

cuCIM - RAPIDS GPU-accelerated image processing library

https://docs.rapids.ai/api/cucim/stable/

Apache License 2.0


[FEA] Multi-GPU support for cupyimg #10

Open b2jia opened 3 years ago

b2jia commented 3 years ago

This is a question migrated over from cupyimg: will there be multi-GPU support for GPU-enabled scikit-image functions?

phase_cross_correlation and affine_transform are the two functions I have in mind. Both are widely used for image registration in biomedical imaging.

The difficulty is that in some applications image resolution is key and down-sampling is not possible (e.g., single-molecule spatial transcriptomics), so there is a need to register large 3D volumes that do not fit on a single GPU. Being able to "chunk" the image, much as Dask does, would be extremely helpful. I realize this is easier said than done!

I'm very keen to learn how to implement this, both for personal growth and to benefit the community, but I need some guidance. Would the authors be able to comment on this?

jakirkham commented 3 years ago

Dask-image has an affine_transform, which uses CuPy under the hood. So maybe that can be used as-is.

There isn't currently a phase_cross_correlation implementation for multiple GPUs.

If the problem can be solved in smaller chunks, using map_blocks or map_overlap may make sense. This can then be followed by some computation on the aggregate result. For example, one might identify the shifts needed for smaller chunks of the image and then average these over the whole image, or similar. This could give a rough answer that may be good enough.
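To make the chunk-then-aggregate idea concrete, here is a minimal NumPy-only sketch, not a Dask or cuCIM implementation. The helper names `integer_shift` and `mean_tile_shift` are hypothetical, the tile size is arbitrary, and only whole-pixel shifts are estimated (no `upsample_factor` refinement). In a real pipeline the per-tile step is the part one would hand to map_blocks or map_overlap with CuPy-backed blocks.

```python
import numpy as np


def integer_shift(reference, moving):
    """Whole-pixel shift that registers `moving` onto `reference` (phase correlation)."""
    cross_power = np.fft.fftn(reference) * np.conj(np.fft.fftn(moving))
    cross_power /= np.maximum(np.abs(cross_power), 1e-12)  # keep only the phase
    correlation = np.fft.ifftn(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peaks past the midpoint are negative shifts because the FFT wraps around.
    return np.array(
        [p - n if p > n // 2 else p for p, n in zip(peak, correlation.shape)],
        dtype=float,
    )


def mean_tile_shift(reference, moving, tile=64):
    """Estimate one shift per non-overlapping 2D tile, then average the estimates."""
    shifts = []
    for r in range(0, reference.shape[0] - tile + 1, tile):
        for c in range(0, reference.shape[1] - tile + 1, tile):
            window = (slice(r, r + tile), slice(c, c + tile))
            shifts.append(integer_shift(reference[window], moving[window]))
    return np.mean(shifts, axis=0)


# Synthetic test data: `moving` is `reference` circularly displaced by (4, -3) pixels.
rng = np.random.default_rng(0)
reference = rng.random((256, 256))
moving = np.roll(reference, shift=(4, -3), axis=(0, 1))

estimated = mean_tile_shift(reference, moving)
print("Mean per-tile shift:", estimated)
```

The result follows the sign convention of returning the shift that maps the moving image back onto the reference, so it is the negative of the displacement applied above. Averaging assumes every tile contains enough structure to produce a reliable peak; on real data, tiles with little signal may need to be down-weighted or discarded.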

Alternatively, Dask has its own FFTs and Dask-image has fourier_shift. So it might be possible to roll your own registration algorithm using these building blocks.
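As an illustration of the correction half of such a hand-rolled algorithm, the sketch below applies a translation in the Fourier domain by multiplying the spectrum by a phase ramp, which is the principle a Fourier shift relies on. It uses NumPy as a stand-in for Dask's FFTs, and `fourier_domain_shift` is a hypothetical helper written for this example, not a function from Dask-image or cuCIM.

```python
import numpy as np


def fourier_domain_shift(image, shift):
    """Translate `image` by `shift` pixels per axis using a phase ramp in Fourier space."""
    spectrum = np.fft.fftn(image)
    for axis, (delta, size) in enumerate(zip(shift, image.shape)):
        freqs = np.fft.fftfreq(size)  # frequencies in cycles per pixel along this axis
        ramp = np.exp(-2j * np.pi * delta * freqs)
        # Reshape so the ramp broadcasts along the current axis only.
        broadcast_shape = [1] * image.ndim
        broadcast_shape[axis] = size
        spectrum = spectrum * ramp.reshape(broadcast_shape)
    return np.fft.ifftn(spectrum).real


# Synthetic 3D volume; the shift values are arbitrary.
rng = np.random.default_rng(0)
volume = rng.random((8, 32, 32))
shifted = fourier_domain_shift(volume, (1, 3, -5))

# For whole-pixel shifts the result should agree with a circular roll.
print(np.allclose(shifted, np.roll(volume, (1, 3, -5), axis=(0, 1, 2))))
```

The shift here is circular, so content leaving one edge re-enters at the opposite edge. Fractional shifts are also accepted by the phase ramp, but the comparison against `np.roll` only applies to whole-pixel values.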

b2jia commented 1 year ago

@gigony (and perhaps @caryr35) I'm following up, maybe too late. I'm happy to see phase_cross_correlation with GPU support, and I gave cucim another try. The results of cucim.skimage.registration.phase_cross_correlation often do not match those computed on the CPU (skimage.registration.phase_cross_correlation). Is there a reason why?

cuCIM version: 23.02.00
skimage version: 0.18.3

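import cupy as cp  # needed for cp.asarray; omitted from the original snippet
from cucim.skimage.registration import phase_cross_correlation as pcc_gpu
from skimage.registration import phase_cross_correlation as pcc_cpu

# img_stack_v1 and img_stack_v2 are 3D NumPy image stacks (loading not shown).
# Register the same 512x512 crop on the GPU (cuCIM) and on the CPU (scikit-image).
drift_gpu = pcc_gpu(cp.asarray(img_stack_v1[:, :512, :512]),
                    cp.asarray(img_stack_v2[:, :512, :512]),
                    upsample_factor=100)
drift_cpu = pcc_cpu(img_stack_v1[:, :512, :512],
                    img_stack_v2[:, :512, :512],
                    upsample_factor=100)
# Element 0 of each result is the estimated shift.
print("Drift gpu", drift_gpu[0])
print("Drift cpu", drift_cpu[0])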
>>>Drift gpu (0.85, 0.5, -0.11)
>>>Drift cpu [ 0.03  0.21 -0.04]

jakirkham commented 1 year ago

cc @grlee77 (who may know more)