Closed ianthomas23 closed 1 year ago
Merging #1214 (25bd257) into main (8092f4d) will increase coverage by
0.31%
. The diff coverage is95.09%
.
@@ Coverage Diff @@
## main #1214 +/- ##
==========================================
+ Coverage 84.52% 84.83% +0.31%
==========================================
Files 35 35
Lines 8369 8561 +192
==========================================
+ Hits 7074 7263 +189
- Misses 1295 1298 +3
Impacted Files | Coverage Δ | |
---|---|---|
datashader/data_libraries/pandas.py | 100.00% <ø> (ø) |
|
datashader/reductions.py | 84.68% <93.15%> (+1.56%) |
:arrow_up: |
datashader/compiler.py | 91.07% <100.00%> (+0.50%) |
:arrow_up: |
datashader/data_libraries/dask.py | 95.23% <100.00%> (+0.07%) |
:arrow_up: |
datashader/data_libraries/dask_xarray.py | 98.95% <100.00%> (ø) |
|
datashader/utils.py | 81.86% <100.00%> (+2.61%) |
:arrow_up: |
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
I may or may not have glazed over when reviewing certain bits. :-)
It hadn't occurred to me to be entertaining. I can try that approach, but it may come at the expense of code accuracy :grin:
I've rebased it on main
to pick up the fast CUDA mutex PR, and added a couple of error messages. There are no functional code changes so I will merge then CI is green.
This fixes the Dask parts of issues #1182 and #1207. CUDA support will follow in a separate PR, this is already too big.
Full list of example reductions this supports on CPU with and without Dask:
first('value')
- returns singlevalue
per pixellast('value')
- dittofirst_n('value', n=3)
- returnsn=3
value
s per pixellast_n('value', n=3)
- dittowhere(first('value'))
- returns single row index per pixelwhere(last('value'))
- dittowhere(first_n('value', n=3))
- returnsn=3
row indexes per pixelwhere(last_n('value', n=3))
- dittowhere(first('value'), 'other')
- returns singleother
per pixelwhere(last('value'), 'other')
- dittowhere(first_n('value', n=3), 'other')
- returnsn=3
other
s per pixelwhere(last_n('value', n=3), 'other')
- dittoSummary of changes:
Some
reduction
behaviour is now dependent on whether supplied data is partitioned (dask or CUDA) or not, so extra argumentscuda
andpartitioned
are passed through the class hierarchy.I have factored out some common
first
andlast
code into a private base class_first_or_last
to simplify, and similar applies to_first_n_or_last_n
.The fundamental addition is that
first('value')
using Dask is implemented aswhere(_min_row_index(), 'value')
. This takes advantage of the existingwhere
machinery and a new class_min_row_index
which is very similar to amin
reduction except that it accepts a virtual row index column, not a real column supplied by the user. I could have extended themin
class to cover this, but I thought it better to keep the new functionality in a new private class. It is possible to instantiate the private class of course, although no user code should do this, and I have done this in the test suite. There are equivalent new private classes forlast
,first_n
andlast_n
.I do need to recheck that trying to use CUDA with these reductions raises informative error messages not crashes.