jolars / eulerr

Area-Proportional Euler and Venn Diagrams with Ellipses
https://jolars.github.io/eulerr/
GNU General Public License v3.0
129 stars 18 forks source link

very slow, Doesn't work with large datasets #89

Open adminy opened 2 years ago

adminy commented 2 years ago
library(eulerr)

fit <- euler(c( "A" = 55, "B" = 810, "C" = 102, "D" = 364, "E" = 101, "F" = 24,
            "G" = 34, "H" = 61, "I" = 194, "J" = 107, "K" = 53, "L" = 75, "M" = 11,
            "N" = 65, "O" = 16, "P" = 82, "Q" = 13, "F&O" = 5, "G&O" = 5, "F&G" = 5,
            "D&I" = 47, "D&E" = 33, "K&L" = 9, "A&K" = 7, "K&N" = 7, "A&N" = 7, "A&L" = 7,
            "K&P" = 7, "A&P" = 7, "L&N" = 7, "N&P" = 7, "C&K" = 7, "A&C" = 7, "L&P" = 7,
            "E&G" = 6, "J&K" = 7, "A&J" = 7, "C&O" = 5, "G&H" = 4, "C&N" = 7, "J&N" = 7,
            "E&F" = 5, "C&F" = 5, "C&L" = 7, "J&L" = 7, "C&P" = 7, "J&P" = 7, "C&G" = 5,
            "D&K" = 15, "C&J" = 7, "A&D" = 12, "D&L" = 12, "E&O" = 3, "E&H" = 4, "C&I" = 6,
            "C&D" = 8, "D&H" = 7, "D&N" = 7, "D&P" = 7, "D&J" = 7, "C&E" = 3, "H&O" = 1,
            "B&D" = 15, "B&C" = 11, "F&H" = 1, "E&M" = 1, "B&E" = 8, "B&K" = 7, "I&K" = 2,
            "A&B" = 7, "B&N" = 7, "B&L" = 7, "B&P" = 7, "D&F" = 3, "B&J" = 7, "I&L" = 2,
            "C&H" = 1, "D&O" = 1, "D&G" = 1),
        shape = "ellipse")

# ^^^^^^^ takes forever
fit$stress
fit$diagError

plot(fit)

Meanwhile the browser Venn.js is instant, just less accurate.

Can there be some technique applied to speed it up or at least specify a stopping point for accuracy?

Thanks

jolars commented 2 years ago

Hm, yes I agree that there should be an option to set tolerance and maximum number of iterations.

It looks like an easy fix too. I'm happy to consider pull requests if you or anyone else wants to contribute.

adminy commented 2 years ago

I was going to parametrise it but I noticed my N is 131071. its stuck at getting stats:

stats::nlm(f = optim_final_loss,
                                 p = pars,
                                 areas = areas_disjoint,
                                 circle = circle,
                                 iterlim = 1e6)$estimate

Even with itterlim to 1 its still taking hours.

Either the algorithm is slow or there is a bug. Also I don't see how my N is so large with just 28 nodes.

jolars commented 2 years ago

28 is not a small number, and I'm not actually sure that it makes sense to even try to use an Euler diagram here since the output is bound to be widely misleading. Finding good approximate Euler diagrams is hard even at 4 sets.

The problem is that eulerr tries to compute the areas of the 2^28 - 1 = 268435455 possible intersections in the diagram, which is obviously not going to end well.

It sounds like venn.js is doing something differently since it's as you say much faster, but I'm not really sure how it's possible to avoid considering all the intersections.

jolars commented 2 years ago

Or, hm, actually it should absolutely be possible to alleviate this problem by ruling out some of the possible intersections, i.e. if A is not intersecting with C, then obviously A&C&whatever will also be empty, so yeah, this should be possible to fix (at least partially).

adminy commented 2 years ago

Agreed. Its not the amount of nodes that matter, its the overlaps.

I was thinking to do something like start off with just one overlap and keep increasing the overlaps until I find the outlier overlap that's causing the algorithm to stress over finding a solution.

csijcs commented 2 years ago

I'm having a similar issue, any solution?