rafaqz / DimensionalData.jl

Named dimensions and indexing for julia arrays and other data
https://rafaqz.github.io/DimensionalData.jl/stable/
MIT License
269 stars 37 forks source link

Add a `groupby` function #591

Closed rafaqz closed 6 months ago

rafaqz commented 7 months ago

Add a groupby function (will do aggregate later).

@lazarusA

@bkamins is there a way we can avoid clashes with DataFrames.jl ? could groupby go in DataAPI.jl?

Closes #3

bkamins commented 7 months ago

I am OK to put groupby in DataAPI.jl requiring that the first positional argument must have a type constraint to a type defined in a package that adds a method to groupby. @nalimilan - OK?

codecov-commenter commented 7 months ago

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (27c6d0e) 86.81% compared to head (89f1b5a) 87.24%.

:exclamation: Current head 89f1b5a differs from pull request most recent head 1b5f31c. Consider uploading reports for the commit 1b5f31c to get more accurate results

Files Patch % Lines
src/groupby.jl 88.88% 3 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #591 +/- ## ========================================== + Coverage 86.81% 87.24% +0.42% ========================================== Files 44 44 Lines 3687 3473 -214 ========================================== - Hits 3201 3030 -171 + Misses 486 443 -43 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

rafaqz commented 7 months ago

Thanks, first argument owned by the package is perfect.

lazarusA commented 6 months ago

what's the plan for aggregate? it will be a groupby operation as well right? Latest commit shows a mean over monthly aggregates (groupby month 😄 ).

rafaqz commented 6 months ago

I think we will still have aggregate for doing things like aggregate 3 cells to one. It was just too much to get it all right and blocking merging this one, so I separated it out

(maybe groupby can do the int aggregation too, but I think aggregate can be faster and simpler than having to think about split-apply-combine... we also need combine here too)

rafaqz commented 6 months ago

@meggart do you want to look at the groupby syntax/implementation and give it the ok for later DiskArrays.jl integration? Just the groupby.jl src/test/docs are worth a loook