pytroll / satpy

Python package for earth-observing satellite data processing
http://satpy.readthedocs.org/en/latest/
GNU General Public License v3.0

CUDA support possible? #2461

Open xiaoqianWX opened 1 year ago

xiaoqianWX commented 1 year ago

Feature Request

Describe the solution you'd like As we get more and higher-resolution satellites, processing speed becomes a concern. Satpy already uses Dask for parallel processing, but the CPU has its own limits. Adding CUDA support is probably an easier and more effective way to improve speed. Please consider this as an option for future development, thank you very much!

Describe any changes to existing user workflow Are there any backwards-compatibility concerns? Changes to the build process? Additional dependencies such as CuPy? Possible compatibility issues?

Additional context Satpy already uses Dask, but existing GPU-accelerated tools can't help here because Satpy itself doesn't support any GPU acceleration.

djhoese commented 1 year ago

Thank you for bringing up this topic for discussion. As with anything, we have to strongly weigh the time cost against the benefit of adding built-in CUDA/CuPy support to our algorithms. Theoretically (from the little experience I have mixing CuPy + dask + NumPy) there is no reason you couldn't convert our NumPy-based dask arrays into CuPy-based dask arrays and have some parts of Satpy "just work", since they only perform basic NumPy operations (a rough sketch of that conversion follows the list below). There are three important places where this doesn't apply though:

  1. NumPy functionality not implemented in CuPy. It is my understanding that, for complete compatibility, CuPy has to implement its own version of every NumPy function we might use. There may be cases where we use functions that aren't implemented.
  2. Resampling and other low-level algorithms: these would have to be completely rewritten to really take advantage of the GPU, and many of them are written in Cython. Additionally, some of these algorithms aren't actually that slow, but could perhaps benefit from better/smarter use of dask. Some of the newer resampling algorithms being worked on, like "gradient_search", can already do nearest-neighbor and bilinear resampling for gridded datasets (hopefully swaths in the future) very quickly.
  3. Input/Output: this is my main concern. I don't have any numbers to back this up, but my hunch is that much of Satpy's processing time is spent reading data files or writing output image or data files. Using CuPy/CUDA on a GPU won't help this at all and in some cases could even make it worse, since we would have to shuffle data from GPU memory to CPU memory and then to disk. There isn't much we can do here. Similarly, some operations simply take time to shuffle data around, like masking all bands of an RGB image with a mask generated from the OR of each channel's invalid pixels. Moving this to a GPU would just add time for transferring data in and out of the GPU.
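To make the "just work" idea above concrete, here is a minimal sketch of swapping the chunks of a NumPy-backed dask array for CuPy arrays while keeping the same dask graph. This is not Satpy API; the array shape and operations are placeholders, and it only illustrates the dask + CuPy interaction and the host/device transfer cost mentioned in point 3.

```python
import dask.array as da

try:
    import cupy as cp
except ImportError:
    cp = None  # no GPU / CuPy available; stay on NumPy

# Pretend this is a band loaded by a Satpy reader (NumPy-backed dask array).
band = da.random.random((5424, 5424), chunks=(1356, 1356))

if cp is not None:
    # Move each chunk to the GPU; subsequent array operations dispatch to CuPy
    # as long as every function used actually has a CuPy implementation (point 1).
    band = band.map_blocks(cp.asarray)

# Basic NumPy-style math works on either backend.
scaled = (band - band.min()) / (band.max() - band.min())

# Anything that must end up on disk (or in an image writer) has to be pulled
# back to host memory first, which is the transfer cost described above.
result = scaled.compute()
if cp is not None and isinstance(result, cp.ndarray):
    result = cp.asnumpy(result)
```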

So I (we; I'll speak for all Satpy maintainers) am not against implementing more GPU code, except where it makes Satpy harder to install or use. But I worry that we'd see very little benefit outside of a few small cases. If you @xiaoqianWX or anyone else has ideas where you know something is slow on the CPU but would be much faster on the GPU, please comment here.

xiaoqianWX commented 1 year ago

@djhoese I agree, CuPy can't really cover everything NumPy can do, so switching to it outright isn't the best idea. Ultimately, the CPU-heavy low-level algorithms like resampling, when run at very large scale, are where most of the processing-time pain comes from.

For I/O, I think that in most cases, if you have your full PCIe x16 bandwidth and your dataset isn't tiny (like processing ten numbers in an array), it shouldn't really be the bottleneck. Next-generation satellite data is only going to get bigger; take GF-5A as an example, with its 1500 km swath and 100 m IR resolution, the real bottleneck is likely to be downloading the data rather than processing it.

We've been tuning Dask a lot recently to reduce RAM usage and load, but I think we're close to as far as that can go. I haven't experimented much with the other resampling algorithms though; I really should try that out.
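For reference, a minimal sketch of trying the gradient search resampler mentioned above. The reader name, file paths, channel, and target area here are just placeholders; the only point of the example is passing resampler="gradient_search" and comparing wall-clock time against the default resampler on the same target area.

```python
from glob import glob
from satpy import Scene

# Placeholder reader and file paths; any gridded (e.g. geostationary) input works.
scn = Scene(reader="seviri_l1b_native", filenames=glob("/path/to/seviri/*.nat"))
scn.load(["IR_108"])  # placeholder channel name

# "euro4" is one of Satpy's builtin areas; swap in any target area of interest.
local = scn.resample("euro4", resampler="gradient_search")
local.save_datasets(base_dir="/tmp")
```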

In the end, CUDA (and the libraries built on it) is supposed to make working with large amounts of numbers faster, although it hasn't really been tested for processing satellite data. I'm not very good at working with CUDA, or even that good a programmer to begin with, so if someone else is also interested in this, please drop your thoughts here, thanks.

Lastly, thank you to the Pytroll & Satpy contributors for maintaining this amazing project.