Closed: ianthomas23 closed this pull request 1 year ago.
Merging #1196 (8f9ec81) into main (ed0d58e) will decrease coverage by 0.80%. The diff coverage is 32.00%.
```diff
@@            Coverage Diff             @@
##             main    #1196      +/-   ##
==========================================
- Coverage   85.48%   84.68%   -0.80%
==========================================
  Files          35       35
  Lines        8232     8345     +113
==========================================
+ Hits         7037     7067      +30
- Misses       1195     1278      +83
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| datashader/transfer_functions/_cuda_utils.py | 23.52% <17.77%> (-2.85%) | :arrow_down: |
| datashader/reductions.py | 83.11% <30.64%> (-3.06%) | :arrow_down: |
| datashader/compiler.py | 92.81% <72.22%> (-2.94%) | :arrow_down: |
Test failures are related to rioxarray, which released 0.14.0 yesterday.
This was a dependency issue in the conda-forge rioxarray build, which has now been fixed: https://github.com/conda-forge/rioxarray-feedstock/pull/70.
Nice. Thanks!
Closes #1177.
This adds support for `max_n` and `min_n` reductions on a GPU, both with and without `dask`. The key change is to add new CUDA mutex functionality to support CUDA `append` functions (i.e. individual pixel callbacks) that do more than a simple get/set operation. Because of the massively parallel nature of CUDA hardware, multiple threads can access the same `canvas` pixel at the same time, and up until now we have been restricted to CUDA atomic operations (https://numba.readthedocs.io/en/stable/cuda/intrinsics.html#supported-atomic-operations) in `append` functions. With the new mutex we can lock access to a particular pixel to a single thread at a time and thus perform more complicated operations, such as those for `max_n`, without any race conditions.

In implementation we need to get the mutex (a `cupy` array) to the CUDA `append` functions, and this is achieved within the `expand_aggs_and_cols` framework by appending the mutex array in the `make_info` function, which is where other arrays and/or dataframe columns are extracted and passed to `append` functions. This ensures that there is only ever a single shared mutex, even if multiple reductions need it.

This implementation is limited by what is currently available in `numba` 0.56, which means we can only lock/unlock the mutex as a whole rather than individual elements/pixels of it, so the performance will not be great. Numba PR https://github.com/numba/numba/pull/8790 will allow us to lock individual pixels, so when `numba` 0.57 is released I will write another PR that uses the fast route if it is available and otherwise drops back to this slower one.

There is no support yet for `where(max_n)` on CUDA, but this will follow in another PR soon.
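To illustrate the mutex pattern, here is a minimal CPU sketch (not the actual datashader/numba code): a per-pixel lock word is acquired via compare-and-swap before running a multi-step `append`, and released afterwards. The `PixelMutex` and `locked_append` names are hypothetical; on the GPU the atomicity would come from numba's `cuda.atomic.compare_and_swap` rather than the `threading.Lock` standing in for it here.

```python
import threading

import numpy as np


class PixelMutex:
    """Hypothetical sketch: one lock word per canvas pixel, 0 = free, 1 = held."""

    def __init__(self, shape):
        self.words = np.zeros(shape, dtype=np.int32)
        self._atomic = threading.Lock()  # stands in for hardware CAS atomicity

    def _compare_and_swap(self, idx, expected, new):
        # Atomically: if words[idx] == expected, set it to new; return old value.
        with self._atomic:
            old = self.words[idx]
            if old == expected:
                self.words[idx] = new
            return old

    def acquire(self, idx):
        # Spin until the CAS succeeds, as a CUDA thread would.
        while self._compare_and_swap(idx, 0, 1) != 0:
            pass

    def release(self, idx):
        self._compare_and_swap(idx, 1, 0)


def locked_append(mutex, agg, idx, value):
    # A read-modify-write update that is not a single atomic operation,
    # hence the need to hold the pixel's mutex around it.
    mutex.acquire(idx)
    try:
        if value > agg[idx]:
            agg[idx] = value
    finally:
        mutex.release(idx)
```

The point of the sketch is only the acquire/update/release shape; the real PR keeps the lock words in a single shared `cupy` array passed to the `append` functions.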
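For context on why `max_n` needs more than a single atomic, a sketch (hypothetical helper, not the PR's actual code) of what its per-pixel `append` must do: keep the n largest values seen so far in descending order, which is a multi-step shift-and-insert rather than one get/set.

```python
import numpy as np


def max_n_append(row, value):
    """Insert value into row, a descending-sorted array of the n largest
    values seen so far for one pixel, initialized to -inf.

    The shift-and-insert touches several elements, so under concurrent
    CUDA threads it needs the mutex rather than a single atomic op.
    """
    n = len(row)
    for i in range(n):
        if value > row[i]:
            # Shift smaller values down one slot, then insert.
            for j in range(n - 1, i, -1):
                row[j] = row[j - 1]
            row[i] = value
            break
```

For example, appending 2, 5, 1, 4 to a 3-slot row leaves it holding the three largest values in descending order.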