Closed: monopolynomial closed this PR 3 years ago.
If we allow splitting the operation into two steps, then this should be faster (and produce the correct output structure, since the proposed one has two extra columns):
select!(combine(groupby(x, :id3), :v1 => maximum∘skipmissing => :v1, :v2 => minimum∘skipmissing => :v2),
:id3, [:v1, :v2] => ((v1, v2) -> v1 .- v2) => :range_v1_v2)
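A minimal, self-contained sketch of the two-step approach on a small table. The data is hypothetical and only mimics the benchmark's `id3`/`v1`/`v2` columns; it assumes question 7 computes `max(v1) - min(v2)` per `id3` group while ignoring missing values.

```julia
using DataFrames

# Hypothetical toy data; the real benchmark table is much larger.
x = DataFrame(id3 = ["a", "a", "b", "b"],
              v1 = [1, missing, 5, 7],
              v2 = [2.0, 4.0, missing, 3.0])

# Step 1: reduce each group to max(v1) and min(v2), skipping missings.
agg = combine(groupby(x, :id3),
              :v1 => maximum∘skipmissing => :v1,
              :v2 => minimum∘skipmissing => :v2)

# Step 2: derive the range column and keep only :id3 and the result.
select!(agg, :id3, [:v1, :v2] => ((v1, v2) -> v1 .- v2) => :range_v1_v2)
```

Group order in the result depends on `groupby`'s `sort` setting, so look rows up by `id3` rather than by position.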
However, the question is whether we want to allow this, since other solutions would probably also benefit from a similar change.
Yes, for data.table that would be a big improvement. The goal of question 7 in groupby is to stress a complex expression by group, so decomposing it into simple expressions is not desirable. pandas, dask and polars (fyi @ritchie46) are currently using simple expressions; that should be amended whenever possible. Thanks for bringing that up. I think we can close this PR, and I will file an issue about adjusting the mentioned solutions.
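For contrast, a sketch of the single-step form that question 7 is meant to exercise: the whole `max(v1) - min(v2)` expression is evaluated inside one function call per group. This is an illustration under the same assumptions as before (hypothetical toy data, missing values skipped), not necessarily the exact code in the benchmark's DataFrames.jl script.

```julia
using DataFrames

# Hypothetical toy data with the benchmark's column names.
x = DataFrame(id3 = ["a", "a", "b", "b"],
              v1 = [1, missing, 5, 7],
              v2 = [2.0, 4.0, missing, 3.0])

# Single step: each group's function receives both columns and returns
# one scalar, so the full expression runs per group.
res = combine(groupby(x, :id3),
              [:v1, :v2] => ((v1, v2) -> maximum(skipmissing(v1)) -
                                         minimum(skipmissing(v2))) => :range_v1_v2)
```

Both forms give the same `id3`/`range_v1_v2` result; the difference is whether the benchmark measures one compound per-group expression or two simple aggregations followed by a column-wise subtraction.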
Similar to the pandas case, this code for doing the task is faster than the current implementation.