Closed: vlahm closed this issue 4 years ago.
Thanks. A decrease in efficiency within dplyr is expected, but the behaviour shown in your first example should definitely not happen. I have no idea why yet, but I suspect it has something to do with environment creation. I'll investigate.
It turns out finalizers are too expensive. So I made some internal performance improvements and now I see much better timings. Could you install the current version from GitHub and confirm this, please?
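As a rough illustration of the finalizer cost (a sketch, not the package's actual internals): registering a finalizer on every environment adds measurable per-object overhead, which a quick benchmark can show:

```r
# Sketch: compare environment creation with and without a finalizer.
# reg.finalizer() is base R; microbenchmark is an assumed helper package.
library(microbenchmark)

microbenchmark(
  plain = new.env(),
  finalized = {
    e <- new.env()
    reg.finalizer(e, function(x) NULL)  # no-op finalizer; the cost is in the registration
    e
  }
)
```

To test the fix, the development version can be installed from GitHub (assuming the usual r-quantities/errors repository):

```r
# install.packages("remotes")  # if not already installed
remotes::install_github("r-quantities/errors")
```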
That did the trick. Thank you!
Great! I'll roll out an update to CRAN.
On CRAN now.
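For reference, the released update can then be installed the usual way:

```r
install.packages("errors")
```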
Observed on Ubuntu 18.04 and Windows 10 with R 3.6.3 and errors 0.3.4.
Using dplyr's group_by-summarize construct, this same issue arises, and processing efficiency decreases dramatically, making e.g. the averaging of duplicate values (and their errors) over a 10e7-row data frame impossible in any reasonable amount of time. Note that the length of `x` has been reduced from 500 to 50 for the following example, but the fastest runtimes are still slower than those in the example above.

Created on 2020-08-21 by the reprex package (v0.3.0)
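The reprex code block itself did not survive in this copy of the thread; the following is a minimal sketch of the benchmark described above (names, sizes, and the timing harness are illustrative, not the original):

```r
library(errors)
library(dplyr)

# 50 distinct measured values (reduced from 500, as noted above),
# duplicated across a large data frame; the row count here is scaled
# down from the 10e7 rows mentioned in the report.
n <- 1e5
x <- set_errors(rnorm(50), 0.1)
df <- data.frame(g = sample(seq_along(x), n, replace = TRUE))
df$v <- x[df$g]

# Average duplicate values (and their errors) per group.
system.time(
  out <- df %>% group_by(g) %>% summarize(v = mean(v))
)
```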