Closed ianthomas23 closed 1 year ago
Thanks! Can you clarify the current status of types? I.e. can you return an integer aggregate when testing on a float condition?
`where` always returns a `float64` with `nan`s to represent no data, just as the `min`, `max`, `first`, `last`, etc. reductions do.
Ok, I guess we'll need to deal with datatype issues when we support using the Pandas index as the "column" (actually just imputed values that act like a column, hence needing special support).
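As a small illustration of why the datatype question matters, the sketch below uses only standard Python floats (which are float64) to show that small integers survive a round trip through float64 while very large ones do not, and that NaN can act as a "no data" marker. It is a generic floating-point demonstration under that assumption, not a description of how datashader stores its aggregates.

```python
import math

# Small integers (such as typical row indices) are exactly representable in float64.
small = 123456
assert int(float(small)) == small

# Above 2**53 not every integer is representable, so a round trip can lose information.
big = 2**53 + 1
round_trip = int(float(big))
lost_precision = round_trip != big

# NaN serves as a "no data" marker; it never compares equal to anything, itself included.
no_data = math.nan
nan_is_detectable = math.isnan(no_data) and no_data != no_data
```

An integer-valued aggregate stored as float64 is therefore safe only while the values stay within the exactly representable range.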
Rebased on top of main to pick up the CI fixes.
The reduction in coverage is mostly due to changes to the CUDA append
functions; such CUDA code is not run in GitHub Actions.
Merging #1155 (b34ffd6) into main (645ae07) will increase coverage by 0.03%. The diff coverage is 83.68%.
@@ Coverage Diff @@
## main #1155 +/- ##
==========================================
+ Coverage 85.39% 85.43% +0.03%
==========================================
Files 35 35
Lines 7819 7941 +122
==========================================
+ Hits 6677 6784 +107
- Misses 1142 1157 +15
Impacted Files | Coverage Δ | |
---|---|---|
datashader/core.py | 88.05% <ø> (ø) | |
datashader/reductions.py | 86.94% <80.83%> (-0.29%) | :arrow_down: |
datashader/compiler.py | 95.62% <100.00%> (+0.53%) | :arrow_up: |
datashader/glyphs/line.py | 92.95% <0.00%> (+0.09%) | :arrow_up: |
Pinging @jbednar. I'd like to merge this and add the extra functionality (such as use of a virtual integer row index) as separate PRs.
This partially implements issue #1126, adding a new `where` reduction that accepts either a `max` or `min` reduction. Best illustrated via an example:

which outputs

You can think of this as using the `max('value')` reduction as normal, but then returning the corresponding values from the `'other'` column rather than the `value` column.

What it currently supports:
- `where` takes either a `min` or `max` selector reduction.
- Use within a `summary` or categorical `by` reduction.

Note that there is no support for use of `first` and `last` within a `where` because there is no advantage in doing this; you can just use `first` or `last` directly on their own.

Future improvements:
- If `lookup_column` is not specified, use the index of the row in the supplied DataFrame.
- `max_n`, `min_n`, `first_n`, `last_n` reductions.

All of these are possible but fiddly to implement, so I would rather have partial functionality available for users to experiment with, and I can add these improvements over time.
Currently some combinations of lines and dask give different results depending on the number of dask partitions, but this has always been the situation and is no worse here.
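To make the `where` semantics described above concrete, here is a minimal pure-Python sketch. It is not datashader's implementation and does not reproduce the example from this PR: the helper `where_max`, the row layout with precomputed pixel coordinates `x` and `y`, and the sample data are all hypothetical. It only mirrors the idea of selecting by the maximum of `'value'` and returning the matching entry from `'other'`, with NaN where a pixel has no data.

```python
import math

def where_max(rows, shape, value_key="value", other_key="other"):
    """For each pixel, return `other` from the row holding the largest `value`.

    Hypothetical helper: `rows` are dicts that already carry integer pixel
    coordinates `x` and `y`; pixels with no rows stay NaN.
    """
    height, width = shape
    best = [[-math.inf] * width for _ in range(height)]  # running max of `value`
    out = [[math.nan] * width for _ in range(height)]    # looked-up `other`
    for row in rows:
        x, y = row["x"], row["y"]
        if not (0 <= x < width and 0 <= y < height):
            continue  # skip points outside the canvas
        if row[value_key] > best[y][x]:
            best[y][x] = row[value_key]
            out[y][x] = float(row[other_key])
    return out

rows = [
    {"x": 0, "y": 0, "value": 1.0, "other": 10},
    {"x": 0, "y": 0, "value": 3.0, "other": 20},  # largest value in pixel (0, 0)
    {"x": 1, "y": 0, "value": 2.0, "other": 30},
    {"x": 0, "y": 0, "value": 2.5, "other": 40},
]
result = where_max(rows, shape=(1, 3))
# Pixel (0, 0) yields 20.0, pixel (1, 0) yields 30.0, and pixel (2, 0) is NaN.
```

Swapping the comparison to `<` (with `math.inf` as the starting value) would give the `min` variant in the same way.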