nextstrain / auspice

Web app for visualizing pathogen evolution
https://docs.nextstrain.org/projects/auspice/
GNU Affero General Public License v3.0
Improving Visualization Performance #968

Open benlesh opened 4 years ago

benlesh commented 4 years ago

I just spoke with @colinmegill about what could be done to improve the performance of the visualizations.

Small wins:

Don't render any visualization that is not on the screen

There are a variety of ways to do this; IntersectionObserver is a solid tool for it, and it should be an easy win. It exists in every browser but IE. (I highly recommend ditching support for IE.. if we're talking about working on something that will help save lives, we shouldn't slow down progress for a browser that is dead in a year) If IE must be supported, then we'd gate the functionality on the existence of IntersectionObserver.
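A minimal sketch of that gating, assuming a hypothetical helper named `observeVisibility` (not part of auspice). Where `IntersectionObserver` is missing, as in IE, it reports the element as always visible, so rendering behaves as it does today. The `env` parameter is only there so the global can be swapped out in tests.

```javascript
// Hypothetical helper; the name and signature are illustrative only.
// Calls onChange(true/false) as `element` enters or leaves the viewport.
// Returns a function that stops observing.
function observeVisibility(element, onChange, env = globalThis) {
  // Feature gate: without IntersectionObserver (e.g. IE), always render.
  if (typeof env.IntersectionObserver !== "function") {
    onChange(true);
    return () => {};
  }
  const observer = new env.IntersectionObserver((entries) => {
    for (const entry of entries) {
      if (entry.target === element) onChange(entry.isIntersecting);
    }
  });
  observer.observe(element);
  return () => observer.disconnect();
}
```

A panel could keep the latest flag and skip its d3 updates while it is off screen, then do one catch-up render when it becomes visible again.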

Bigger Wins

Render all "circles" in the tree view to Canvas overlay

Render all circles in the map view as a Leaflet canvas overlay

There are a number of examples around of how to do this. The first one I found was here. The same general idea as above applies. You might want more than one overlay, one for the circles and one for the bezier arcs, so that the number of updates required while brushing is minimal.
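A minimal sketch of the circle layer, assuming each circle is a plain `{x, y, r, fill}` object (a hypothetical shape, not auspice's data model). It uses only standard Canvas 2D calls and deliberately leaves out any Leaflet-specific wiring.

```javascript
// Hypothetical function; `circles` is an array of {x, y, r, fill}.
// `ctx` is a CanvasRenderingContext2D for the overlay canvas.
function drawCircles(ctx, circles, width, height) {
  // Wipe the whole layer and redraw; no per-element DOM updates.
  ctx.clearRect(0, 0, width, height);
  for (const { x, y, r, fill } of circles) {
    ctx.beginPath();
    ctx.arc(x, y, r, 0, 2 * Math.PI);
    ctx.fillStyle = fill;
    ctx.fill();
  }
}
```

With the circles and the arcs on separate canvases, a brush that only changes circle visibility would call this for the circle layer and leave the arc layer alone.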

Other thoughts

It's going to become important to abstract the location/positioning data away from whatever component it currently resides in, so that newer code can reuse it effectively as you transition from SVG to Canvas. The upside is that the coordinate system of 2D canvas and SVG is basically the same: X,Y coords originating from the top left.
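One way to picture that separation, as a sketch only: a pure function computes pixel positions, and thin adapters turn them into SVG attributes or Canvas arguments. The node shape (`name`, `div`, `order`) and all function names here are hypothetical and do not reflect how auspice stores its tree.

```javascript
// Pure layout step: no DOM, no d3 selections, no canvas context.
// Hypothetical input: nodes with a divergence `div` and a vertical `order`.
function computeTipPositions(nodes, { width, height, maxDiv, count }) {
  return nodes.map((n) => ({
    name: n.name,
    x: (n.div / maxDiv) * width,            // left to right by divergence
    y: ((n.order + 0.5) / count) * height,  // evenly spaced rows, top to bottom
  }));
}

// SVG consumer: attributes for a <circle> element.
function toSvgCircleAttrs(p, r) {
  return { cx: p.x, cy: p.y, r };
}

// Canvas consumer: arguments for ctx.arc(...).
function toCanvasArcArgs(p, r) {
  return [p.x, p.y, r, 0, 2 * Math.PI];
}
```

Because both renderers share a top-left origin, the same `x` and `y` feed either adapter unchanged.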

It's also worth mentioning that the calculations are very unlikely to be the bottleneck to good performance here, so it's probably best not to waste too much effort on those. Computers are pretty fast at dealing with sets of numbers and calculations, even in JavaScript. If that does become an issue, we can address it then: move it to a worker, WASM, etc. But for now, the biggest choke point is the DOM rendering.

Actions to be taken

Possibly related #955

benlesh commented 4 years ago

Looking more deeply into this, given that the lines for the tree also have a hover callback, we might not be able to do this in stages. It might be more beneficial, and faster, to develop a second component that we can swap in place. (Ideally something that could even be separately open sourced at some point).

jameshadfield commented 4 years ago

Thanks @benlesh -- really exciting.

Re: canvas. We've liked the functionality d3 gives us here (events, transitions etc), but I agree that it's time to seriously consider moving away from SVG. I haven't done stuff in canvas since the early days where we used the low-level API & computed intersects ourselves, but I know there are really good libraries now. Could you talk more / point me to a simple example of what things would look like to e.g. have a hover-event attached to a tip, transition that tip etc? I know @colinmegill is a fan of regl -- would you suggest using this?
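For reference, hover on a canvas-drawn tip generally comes down to hit-testing the pointer against the stored tip positions. A minimal sketch, assuming tips are plain `{x, y, r}` objects in canvas coordinates (a hypothetical shape) and a hypothetical function name:

```javascript
// Returns the tip whose circle contains (px, py), preferring the one whose
// center is closest to the pointer, or null if the pointer is over no tip.
function findTipAt(tips, px, py) {
  let best = null;
  let bestDist2 = Infinity;
  for (const tip of tips) {
    const dx = tip.x - px;
    const dy = tip.y - py;
    const dist2 = dx * dx + dy * dy;
    if (dist2 <= tip.r * tip.r && dist2 < bestDist2) {
      best = tip;
      bestDist2 = dist2;
    }
  }
  return best;
}

// In a browser, a mousemove handler would first convert the event to canvas
// coordinates, e.g. with canvas.getBoundingClientRect(), then call
// findTipAt(tips, x, y) and redraw the hovered tip with a larger radius.
```

This is a linear scan per pointer event; if that proves too slow on large trees, a spatial index such as a quadtree could replace the loop.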

Stepping back a bit, we currently have a (large-ish) piece of code -- here and here -- which essentially works out what redux actions are incoming and generates the necessary d3 (imperative) commands. This is largely needed due to the problems that arise if you make a d3 call while a previous d3 call is running / transitioning (not fun to debug). Perhaps this can be completely avoided by just re-rendering the entire canvas (which we couldn't do in SVG, too slow).
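A sketch of what "just re-render the entire canvas" could look like: every state change requests a redraw, and requests arriving within the same frame are coalesced so only the latest state is drawn. `createRenderScheduler` and `draw` are hypothetical names; `schedule` defaults to the browser's `requestAnimationFrame` and is a parameter only so it can be replaced in tests.

```javascript
// Coalesces redraw requests: at most one draw per scheduled frame,
// always with the most recent state.
function createRenderScheduler(draw, schedule = (cb) => requestAnimationFrame(cb)) {
  let pending = false;
  let latestState;
  return function requestRender(state) {
    latestState = state;
    if (pending) return; // a frame is already queued; it will pick up latestState
    pending = true;
    schedule(() => {
      pending = false;
      draw(latestState);
    });
  };
}
```

Since each frame is drawn from scratch, there is no in-flight transition for a later call to collide with. Animated transitions would need to be expressed as interpolated state rather than as d3 transitions.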

It might be more beneficial, and faster, to develop a second component that we can swap in place.

Yes, I agree this is probably simpler. Perhaps doing the map first would be simpler, since there aren't any events yet (although there's a PR in which adds them for hover)? But on the other hand the tree is the performance bottleneck.

I highly recommend ditching support for IE.. if we're talking about working on something that will help save lives, we shouldn't slow down progress for a browser that is dead in a year

The CDC still uses IE (I've put out an internal call to confirm and get the actual version for you, I believe it's 11) and so it's important to keep supporting IE.

benlesh commented 4 years ago

It also seems that the entropy calculation is expensive, and it may be running more often than necessary.

ace-n commented 4 years ago

Chiming in with my own analysis (less as an objective answer and more as a separate viewpoint) of the Rectangular layout (which may apply to other layouts, too):

Branches vs. Tips

It seems like a significant amount of render time (~6x more) is spent on rendering the branches vs. the tips.

I agree with @benlesh's assertion that ideally, we'd make the position calculation logic more abstract - but if we want a quick "lift-and-shift" win (particularly if time is of the essence due to the COVID-19 crisis, or if you plan to replace this viz entirely later on), we can probably get one by optimizing the branch rendering.

This leads me to my next point: WebGL.

WebGL and gradients

I did some tinkering with PixiJS and WebGL over the weekend. My approach here was to layer a transparent Canvas element atop the existing SVG one and (via CSS) disable mouse interaction.

(I kept the existing tips because a) they handle position calculation already, and b) they weren't the lowest-hanging fruit performance-wise.)

Drawing solid-color lines with PixiJS on this canvas is pretty darned fast (~40 ms for all of drawBranches vs. ~300 ms now on the ~2k point COVID-19 dataset). Gradients are another matter: they were slower in my very-much-a-noob implementations. The attempts I've made require either resizing or creating a bunch of different textures (one for each gradient color/size), which I suspect is quite computationally expensive.

We could also possibly implement a rudimentary gradient alternative, e.g.:

1) set the vertical branch to the parent node's branchStroke value, and the horizontal branch to the child node's branchStroke value
2) split each line into multiple solid-color lines with intermediate colors
3) use gradients for thicker and/or longer lines, and solid colors for the rest
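Option 2 could be sketched roughly as follows, assuming colors are given as `[r, g, b]` arrays and interpolated linearly in RGB. The function name, the argument shapes, and the interpolation choice are all assumptions; this is not taken from the experiment described above.

```javascript
// Splits the line from `from` to `to` into `n` equal solid-color segments.
// The first segment gets colorA, the last gets colorB, and the ones in
// between get evenly spaced intermediate colors.
function splitIntoSegments(from, to, colorA, colorB, n) {
  const lerp = (a, b, t) => a + (b - a) * t;
  const segments = [];
  for (let i = 0; i < n; i++) {
    const t0 = i / n;
    const t1 = (i + 1) / n;
    // Color parameter runs from 0 to 1 across segments (midpoint if n === 1).
    const tc = n === 1 ? 0.5 : i / (n - 1);
    const rgb = colorA.map((c, k) => Math.round(lerp(c, colorB[k], tc)));
    segments.push({
      x1: lerp(from.x, to.x, t0),
      y1: lerp(from.y, to.y, t0),
      x2: lerp(from.x, to.x, t1),
      y2: lerp(from.y, to.y, t1),
      color: `rgb(${rgb[0]}, ${rgb[1]}, ${rgb[2]})`,
    });
  }
  return segments;
}
```

Each segment can then be drawn as an ordinary solid-color line. The number of segments trades visual smoothness against draw calls, so short or thin branches could use `n = 1`.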

Conclusions

TL;DR

I think we could get a solid performance win by replacing {SVG, gradient} branches with {WebGL, solid-color} ones - but I'm not sure how important the gradient feature is.

Alternatively...

If gradients are important, then we're probably better off trying to improve the existing SVG code first. I reckon we could get a ~20% speedup on a ~2k point dataset by aggressively optimizing the existing SVG logic.