nathaneastwood / poorman

A poor man's dependency free grammar of data manipulation
https://nathaneastwood.github.io/poorman/

Update `pivot_` functions for performance #117

Open etiennebacher opened 1 year ago

etiennebacher commented 1 year ago

Hi @nathaneastwood, I rewrote the `pivot_` functions in {datawizard} to use `stack()` and `unstack()` instead of `reshape()`, as suggested by @grantmcdermott in #48. This brings substantial performance gains, especially on large datasets (a few million rows).
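
For anyone unfamiliar with the idea, here is a minimal sketch of how base R's `stack()` and `unstack()` can reshape a data frame between wide and long form. It is only an illustration of the technique, not the {datawizard} code: the toy data, the column names (`id`, `x`, `y`) and the object names are made up, and a real `pivot_longer()` has to handle much more (mixed column types, name patterns, missing values).

```r
# Toy wide data frame; all names here are hypothetical.
wide <- data.frame(
  id = 1:3,
  x  = c(1.5, 2.5, 3.5),
  y  = c(10, 20, 30)
)

# Columns to pivot (assumed to share a type, since stack() concatenates them).
value_cols <- c("x", "y")

# stack() concatenates the selected columns into `values` and records the
# originating column name in the factor `ind`.
stacked <- utils::stack(wide[value_cols])

# Rebuild a long data frame by repeating the id column once per stacked column.
long <- data.frame(
  id    = rep(wide$id, times = length(value_cols)),
  name  = as.character(stacked$ind),
  value = stacked$values
)

# unstack() reverses the operation: one column per level of `ind`.
wide_again <- utils::unstack(stacked, values ~ ind)
```

The sketch avoids `reshape()` entirely; whether it is faster on a given dataset is something the benchmarks in the linked PR address, not this example.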

All code and benchmarks are in this PR: https://github.com/easystats/datawizard/pull/285

I will probably open a PR here to implement this, but I'm opening this issue first in case I forget and someone else wants to take it on.


Edit: the original implementation in the PR I linked to needed several fixes. It's better to rely on the functions in the main branch of datawizard than on the code in the PR.

nathaneastwood commented 1 year ago

The performance improvements look really great @etiennebacher. I'd definitely appreciate a PR.