julianstanley / gfpopgui

A Shiny-based GUI for GFPOP. Project for Google Summer of Code 2020.
https://julianstanley.shinyapps.io/gfpopgui/

Slow to render hundreds of changepoints #35

Open julianstanley opened 4 years ago

julianstanley commented 4 years ago

@tdhock

(Low priority item--aligns more with Phase 2/3)

I'm pretty happy with the recent enhancements to the analysis page (screenshots below). You can edit the graph now and the data plot is actually useful. Now, I'm going to go back and clean things up and test better.

However, there is one thing that's bothering me. Models with just a few changepoints generate reasonably quickly (variations on the first screenshot below take up to a second or so)--I decoupled the main scatterplot and the overlaid changepoints to make that faster.

However, as expected, trying to render hundreds of changepoints is really, really slow and can potentially crash the application. For example, the default std graph with a penalty of 1 with 1000 datapoints (ended up being 336 changepoints) takes on the order of a minute or so.

Should I put a cap on these? I guess I could just show an error if the user tries to render more than X changepoints (or just show a static plot in that scenario)--does that sound like a good choice, or is it better to just let users generate however many changepoints they're willing to wait for?

Either way, I'm going to work on making it more efficient--right now I add a new plotly trace/layer for every changepoint, but can probably try and get away with one trace for all of them. But, that might just move the reasonable upper limit from 1,000 to 5,000, etc.--I guess that there's always going to be some point where it's damn slow.

[Three screenshots of the analysis page]

tdhock commented 4 years ago

Hi, there is no reason why it should be slow for 100s of items drawn on the screen. I just made an interactive viz yesterday with 100s of things on screen drawn using svg, and it renders fine (about 1sec) on my 10 year old laptop: http://members.cbio.mines-paristech.fr/~thocking/figure-candidates-interactive/ (to see lots of segments, click show selection menus -> up.to.t=100 last.change=99 penalty=0.01). In my experience it may start getting slow if you have 10,000+ items on screen. You should not add a new trace/geom per svg element on screen; that is probably why it is slow. Try doing one geom/trace which draws all of the segments, and another geom/trace which draws all of the changepoints.

julianstanley commented 4 years ago

Ahh, yeah @tdhock that was the big bottleneck--thanks! Improved things ~10X or so.

The previous example (~300 changepoints) went from ~1 minute to maybe ~2 seconds when I compacted everything into two traces (one for segments, one for changepoints).
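For readers following along, here is a minimal sketch of the "one trace for all segments, one trace for all changepoints" idea. It is not the app's code (the app is R/Shiny); it is a plain-Python illustration that only builds the coordinate arrays. It relies on the fact that plotly line traces break the line at null/missing values, so many disjoint segments can share one trace. The `(x_start, x_end, mean)` segment layout and both function names are assumptions made up for this example.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical segment layout for this sketch: (x_start, x_end, mean).
Segment = Tuple[float, float, float]
Coord = Optional[float]


def segments_to_single_trace(segments: Sequence[Segment]) -> Dict[str, object]:
    """Pack every horizontal segment into one line trace.

    A None after each segment leaves a gap, so the segments are drawn
    as separate pieces even though they live in a single trace.
    """
    xs: List[Coord] = []
    ys: List[Coord] = []
    for x_start, x_end, mean in segments:
        xs.extend([x_start, x_end, None])
        ys.extend([mean, mean, None])
    return {"type": "scatter", "mode": "lines", "x": xs, "y": ys}


def changepoints_to_single_trace(
    segments: Sequence[Segment], y_min: float, y_max: float
) -> Dict[str, object]:
    """Pack every changepoint (boundary between consecutive segments)
    into one trace of vertical lines spanning y_min..y_max."""
    xs: List[Coord] = []
    ys: List[Coord] = []
    for (_, x_end, _), _next_segment in zip(segments, segments[1:]):
        xs.extend([x_end, x_end, None])
        ys.extend([y_min, y_max, None])
    return {"type": "scatter", "mode": "lines", "x": xs, "y": ys}


if __name__ == "__main__":
    demo = [(0, 10, 1.0), (10, 25, 3.0), (25, 40, 2.0)]
    print(segments_to_single_trace(demo)["x"])
    print(changepoints_to_single_trace(demo, 0.0, 4.0)["x"])
```

The two returned dictionaries have the shape of plotly trace specifications, so the number of traces stays at two no matter how many changepoints the model returns; only the array lengths grow.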

~3,000 changepoints still takes a long time (a minute, not sure how much divided between drawing traces and constructing the dataframe they're drawn from), but maybe I can work that down as well.

tdhock commented 4 years ago

Some of the slowdown is maybe because of plotly (maybe post an issue with them to see if there are any work-arounds?) and some is due to the inherent limitation of svg with large data. We may think about optimizing rendering for large data later? One thing to keep in mind is that people rarely have displays 3000+ pixels wide (mine is only 1280 pixels wide), so it does not really make sense to display more data/changes than that. For such large data it would make sense to have two linked displays: one with a zoomed out overview, and another zoomed in to the details of a specific region. Something like the range selector from dygraphs: http://dygraphs.com/gallery/#g/range-selector
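As a rough illustration of the pixel-width argument and the overview/detail split, here is a stdlib-only Python sketch. It is not how dygraphs or plotly implement this; the min/max-per-bucket reduction is just one common way to thin a series for an overview, and `decimate_min_max`, `window`, and the 1280 px width are names and values assumed for this example.

```python
from typing import List, Sequence, Tuple


def decimate_min_max(
    xs: Sequence[float], ys: Sequence[float], width_px: int
) -> Tuple[List[float], List[float]]:
    """Overview: keep at most two points (min and max y) per pixel column.

    The series is split into width_px contiguous buckets; within each
    bucket only the lowest and highest points survive, in x order.
    """
    n = len(xs)
    if n <= 2 * width_px:
        return list(xs), list(ys)  # already small enough to draw as-is
    out_x: List[float] = []
    out_y: List[float] = []
    for b in range(width_px):
        lo = b * n // width_px
        hi = (b + 1) * n // width_px
        if lo >= hi:
            continue  # defensive; buckets are non-empty when n > 2 * width_px
        idx = range(lo, hi)
        i_min = min(idx, key=lambda i: ys[i])
        i_max = max(idx, key=lambda i: ys[i])
        for i in sorted({i_min, i_max}):
            out_x.append(xs[i])
            out_y.append(ys[i])
    return out_x, out_y


def window(
    xs: Sequence[float], ys: Sequence[float], x_lo: float, x_hi: float
) -> Tuple[List[float], List[float]]:
    """Detail: keep only the points inside the selected x range."""
    keep = [i for i, x in enumerate(xs) if x_lo <= x <= x_hi]
    return [xs[i] for i in keep], [ys[i] for i in keep]


if __name__ == "__main__":
    xs = list(range(10_000))
    ys = [(i * 37) % 101 for i in xs]
    overview_x, overview_y = decimate_min_max(xs, ys, width_px=1280)
    detail_x, detail_y = window(xs, ys, 100, 199)
    print(len(overview_x), len(detail_x))
```

The overview never holds more than two points per pixel column, while the detail view draws the full data but only for the selected region, so neither display has to render the whole dataset at full resolution.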

(Sent by email in reply to julianstanley's comment of Fri, Jun 12, 2020, quoted above.)

julianstanley commented 4 years ago

@tdhock sounds good--I think the limitation in the way that I'm drawing the changepoints/segments may be more of a slowdown than the svg thing at this point. I was hesitant to open an issue about that, so yesterday I posted a question about what I think the slowdown is in an RStudio Community post. There are a couple of plotly experts active on there (including Carson), so hopefully they or someone else will have some tips/insight.

And two linked displays sounds like a great idea--I'll pull that out into a separate issue.

julianstanley commented 4 years ago

Another big problem here: adding lots of changepoints can basically crash the app for everyone. A few thousand changepoints are fine, but over ~5,000 or so can take too long (~30s+), and more than that can stall out the shinyapps.io server.

So there needs to be some limitation on the number of changepoints you can run on the server.

tdhock commented 4 years ago

Until you implement the zooming functionality, you may as well implement a limit on the number of data points as well...
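A server-side guard along the lines discussed above could look roughly like the following. This is a language-neutral sketch in Python rather than the app's R/Shiny code. The 5,000 changepoint cap is taken from the ~5,000 figure mentioned in this thread; the data-point cap is a pure placeholder, since no number was proposed. `RenderLimits` and `check_render_limits` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RenderLimits:
    # ~5,000 changepoints was reported as "too long" in this thread.
    max_changepoints: int = 5_000
    # Placeholder value: no data-point limit was suggested in the thread.
    max_datapoints: int = 100_000


def check_render_limits(
    n_datapoints: int,
    n_changepoints: int,
    limits: RenderLimits = RenderLimits(),
) -> Optional[str]:
    """Return None if rendering may proceed, else a message for the user."""
    if n_datapoints > limits.max_datapoints:
        return (
            f"Too many data points ({n_datapoints:,}); "
            f"the limit is {limits.max_datapoints:,}."
        )
    if n_changepoints > limits.max_changepoints:
        return (
            f"The model has {n_changepoints:,} changepoints; "
            f"the limit is {limits.max_changepoints:,}. "
            "Try a larger penalty."
        )
    return None


if __name__ == "__main__":
    print(check_render_limits(1_000, 336))      # small model: allowed
    print(check_render_limits(1_000_000, 10))   # too much data
    print(check_render_limits(50_000, 8_000))   # too many changepoints
```

Checking the data size before the model runs, and the changepoint count before anything is drawn, keeps one oversized request from tying up the shared server; the app could show the returned message as an error or fall back to a static plot.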