Closed: ianthomas23 closed this pull request 1 year ago.
Merging #1196 (8f9ec81) into main (ed0d58e) will decrease coverage by 0.80%. The diff coverage is 32.00%.
```diff
@@            Coverage Diff             @@
##             main    #1196      +/-   ##
==========================================
- Coverage   85.48%   84.68%   -0.80%
==========================================
  Files          35       35
  Lines        8232     8345     +113
==========================================
+ Hits         7037     7067      +30
- Misses       1195     1278      +83
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| datashader/transfer_functions/_cuda_utils.py | 23.52% <17.77%> (-2.85%) | :arrow_down: |
| datashader/reductions.py | 83.11% <30.64%> (-3.06%) | :arrow_down: |
| datashader/compiler.py | 92.81% <72.22%> (-2.94%) | :arrow_down: |
Test failures are related to rioxarray, which released 0.14.0 yesterday.
This was a dependency issue in the conda-forge rioxarray build, which has now been fixed: https://github.com/conda-forge/rioxarray-feedstock/pull/70.
Nice. Thanks!
Closes #1177.
This adds support for `max_n` and `min_n` reductions on a GPU, both with and without `dask`. The key change is to add new CUDA mutex functionality to support CUDA `append` functions (i.e. individual pixel callbacks) that do more than a simple get/set operation. Because of the massively parallel nature of CUDA hardware, multiple threads can access the same `canvas` pixel at the same time, and up until now we have been restricted to CUDA atomic operations (https://numba.readthedocs.io/en/stable/cuda/intrinsics.html#supported-atomic-operations) in `append` functions. With the new mutex we can lock access to a particular pixel to a single thread at a time and thus perform more complicated operations, such as those for `max_n`, without any race conditions.

In implementation we need to get the mutex (a `cupy` array) to the CUDA `append` functions, and this is achieved within the `expand_aggs_and_cols` framework by appending the mutex array in the `make_info` function, which is where other arrays and/or dataframe columns are extracted and passed to `append` functions. This ensures that there is only ever a single shared mutex, even if multiple reductions need it.

This implementation is limited by what is currently available in `numba` 0.56, which means we can only lock/unlock the mutex as a whole rather than individual elements/pixels of it, so the performance will not be great. Numba PR https://github.com/numba/numba/pull/8790 will allow us to lock individual pixels, so when `numba` 0.57 is released I will write another PR that uses the fast route if it is available and otherwise drops back to this slower one.

There is no support yet for `where(max_n)` on CUDA, but this will follow in another PR soon.
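To illustrate the mutex pattern, here is a minimal CPU sketch (not the actual datashader/numba code): a per-pixel lock word is acquired via compare-and-swap before running a multi-step `append`, and released afterwards. The `PixelMutex` and `locked_append` names are hypothetical; on the GPU the atomicity would come from numba's `cuda.atomic.compare_and_swap` rather than the `threading.Lock` standing in for it here.

```python
import threading

import numpy as np


class PixelMutex:
    """Hypothetical sketch: one lock word per canvas pixel, 0 = free, 1 = held."""

    def __init__(self, shape):
        self.words = np.zeros(shape, dtype=np.int32)
        self._atomic = threading.Lock()  # stands in for hardware CAS atomicity

    def _compare_and_swap(self, idx, expected, new):
        # Atomically: if words[idx] == expected, set it to new; return old value.
        with self._atomic:
            old = self.words[idx]
            if old == expected:
                self.words[idx] = new
            return old

    def acquire(self, idx):
        # Spin until the CAS succeeds, as a CUDA thread would.
        while self._compare_and_swap(idx, 0, 1) != 0:
            pass

    def release(self, idx):
        self._compare_and_swap(idx, 1, 0)


def locked_append(mutex, agg, idx, value):
    # A read-modify-write update that is not a single atomic operation,
    # hence the need to hold the pixel's mutex around it.
    mutex.acquire(idx)
    try:
        if value > agg[idx]:
            agg[idx] = value
    finally:
        mutex.release(idx)
```

The point of the sketch is only the acquire/update/release shape; the real PR keeps the lock words in a single shared `cupy` array passed to the `append` functions.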
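For context on why `max_n` needs more than a single atomic, a sketch (hypothetical helper, not the PR's actual code) of what its per-pixel `append` must do: keep the n largest values seen so far in descending order, which is a multi-step shift-and-insert rather than one get/set.

```python
import numpy as np


def max_n_append(row, value):
    """Insert value into row, a descending-sorted array of the n largest
    values seen so far for one pixel, initialized to -inf.

    The shift-and-insert touches several elements, so under concurrent
    CUDA threads it needs the mutex rather than a single atomic op.
    """
    n = len(row)
    for i in range(n):
        if value > row[i]:
            # Shift smaller values down one slot, then insert.
            for j in range(n - 1, i, -1):
                row[j] = row[j - 1]
            row[i] = value
            break
```

For example, appending 2, 5, 1, 4 to a 3-slot row leaves it holding the three largest values in descending order.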