tidyverse / duckplyr

A drop-in replacement for dplyr, powered by DuckDB for performance.
https://duckplyr.tidyverse.org/

feat: `mutate()` constructs intermediate data frames for each new variable #332

Closed krlmlr closed 1 week ago

krlmlr commented 1 week ago

For #270.

@toppyy: I believe this is the only instance of `rel_translate()` that gets called without `data`. This will simplify your PR. Appreciate your thoughts.

github-actions[bot] commented 1 week ago

This is how benchmark results would change (along with a 95% confidence interval in relative change) if f03e52e7a6ad5d09e03cc3f07029402d783032a7 is merged into main:

Further explanation regarding interpretation and methodology can be found in the documentation.

toppyy commented 1 week ago

Thanks.

So if I'm understanding correctly, now that there are intermediate data frames, we could pass them as arguments, like this: `rel_translate(quo, data = current_data, etc.)`? This would allow us to 1) also make comparison expressions within `mutate()` and 2) skip checking whether `data` is missing, since `rel_translate()` no longer gets called without it.
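A minimal base-R sketch of that idea, not duckplyr's actual code: `translate_with_data()` and `mutate_sketch()` are hypothetical names made up for illustration, and the "translation" here only checks column names. It shows each new variable being handled against the intermediate data frame from the previous step, so the translator can assume `data` is always supplied.

```r
# Hypothetical stand-in for a translator that always receives `data`.
# It is NOT duckplyr's rel_translate(); it only verifies that every symbol
# used in the expression is a column of the current data frame.
translate_with_data <- function(expr, data) {
  unknown <- setdiff(all.vars(expr), names(data))
  if (length(unknown) > 0) {
    stop("Unknown column(s): ", paste(unknown, collapse = ", "))
  }
  expr
}

# Sketch of a mutate-like loop: each new variable is computed against the
# intermediate data frame produced by the previous step.
mutate_sketch <- function(data, exprs) {
  current_data <- data
  for (name in names(exprs)) {
    # `data` is always passed, so no missing(data) check is needed.
    expr <- translate_with_data(exprs[[name]], data = current_data)
    current_data[[name]] <- eval(expr, current_data)
  }
  current_data
}

df <- data.frame(a = 1:3)
# `c` is a comparison that refers to `b`, which only exists in the
# intermediate data frame created by the first step.
out <- mutate_sketch(df, list(b = quote(a + 1), c = quote(b > 2)))
```

Because `c` is resolved against the intermediate data frame that already contains `b`, the column check succeeds without any special case for missing data.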

A side note: I explored the above approach and found that an unnecessary transformation within `to_duckdb_expr` leads to an error if the expr has an alias. I removed the transformation in the PR.