halhen closed this 2 years ago
Thank you.
By the way, fromDataFrame already extracts some combinations from the sets. However, by default they are not the distinct ones. Setting the c_type parameter, e.g.
upsetjs() |> upsetjs:::fromDataFrame(generate_data(10000), c_type = "distinctIntersections")
should be enough and way faster, since it also uses an optimized code path for this combination (data frame + distinct).
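For readers unsure what "distinct" means here: each element is counted in exactly one combination, namely the exact group of sets it belongs to. The base R sketch below illustrates that idea on a small membership data frame. It is not the upsetjs implementation; the data frame `df` and the helper `distinct_intersections()` are hypothetical names made up for this illustration.

```r
# Hypothetical input: one row per element, one logical column per set.
df <- data.frame(
  A = c(TRUE, TRUE, FALSE, TRUE),
  B = c(TRUE, FALSE, TRUE, TRUE),
  C = c(FALSE, FALSE, TRUE, FALSE),
  row.names = c("e1", "e2", "e3", "e4")
)

# Illustrative helper (not part of upsetjs): group elements by their
# exact membership pattern, so every element lands in one combination.
distinct_intersections <- function(df) {
  sets <- colnames(df)
  m <- as.matrix(df) > 0
  # Build one key per element from the sets it belongs to, e.g. "A&B".
  keys <- apply(m, 1, function(row) paste(sets[row], collapse = "&"))
  # split() buckets the element names by key in a single pass.
  split(rownames(df), keys)
}

res <- distinct_intersections(df)
print(res)
```

Here e1 and e4 both belong to exactly A and B, so they share the "A&B" combination, while e2 is counted only under "A" even though A also contains e1 and e4.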
Thanks for the fromDataFrame(c_type) tip -- that solved my immediate performance needs! :tulip:
When generating distinct intersections on data with hundreds of thousands of elements, upsetjs grinds to a halt. The time seems to be roughly O(n^2), meaning that doubling the data makes execution take 2^2 = 4 times as long. With the help of profvis, we find the main source to be a Filter in pushCombination(), which causes a doubly nested loop over the elements.
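To make the complexity argument concrete, here is a minimal base R sketch of the general pattern. It is not the actual pushCombination() code: `slow_group()`, `fast_group()` and the sample `keys` vector are hypothetical, made up for this illustration. The slow variant calls Filter() once per element, and each call scans all elements again, which gives the quadratic behaviour. The fast variant buckets all elements in one pass with split().

```r
# Hypothetical data: one membership key per element.
set.seed(1)
keys <- sample(c("A", "B", "A&B", "A&C", "B&C"), 500, replace = TRUE)

# Quadratic pattern: for each element, Filter() walks over every
# element again to collect the ones with the same key.
slow_group <- function(keys) {
  groups <- list()
  for (i in seq_along(keys)) {
    same <- Filter(function(j) keys[[j]] == keys[[i]], seq_along(keys))
    groups[[keys[[i]]]] <- same
  }
  groups
}

# Linear pattern: split() assigns every index to its bucket in one pass.
fast_group <- function(keys) {
  split(seq_along(keys), keys)
}

slow <- slow_group(keys)
fast <- fast_group(keys)

# Both approaches produce the same groups; only the cost differs.
same_result <- isTRUE(all.equal(slow[names(fast)], fast))
print(same_result)
```

Wrapping each call in system.time() and doubling the length of `keys` should show the slow variant taking roughly four times as long, while the fast variant grows roughly in proportion to the input.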
Minimal benchmark on a fairly beefy computer (5950X, 128 GB RAM) on Fedora Linux, R 4.1.3 and upsetjs 1.11.0, git hash 4b375a8e0
Before this PR:
With this PR:
Also, scaling is now closer to O(n) or slightly better. With 10x the data: