Improving Visualization Performance

I just spoke with @colinmegill about what could be done to improve the performance of the visualizations.

Small Wins

Don't render any visualization that is not on the screen

There are a variety of ways to do this. IntersectionObserver is a solid tool for it, and it should be an easy win. It's available in every browser except IE. (I highly recommend ditching support for IE: if we're talking about working on something that will help save lives, we shouldn't slow down progress for a browser that is dead in a year.) If IE must be supported, we'd gate the functionality on the existence of IntersectionObserver.
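A minimal sketch of that gate, assuming a hypothetical helper called observeVisibility (not an existing auspice function). The observer constructor is injectable so the logic can be exercised outside a browser. When no IntersectionObserver exists (e.g. IE11), it falls back to "always visible", which is today's behaviour.

```typescript
// Minimal structural types so the sketch compiles without the DOM lib;
// the browser's IntersectionObserver and its entries satisfy them.
interface VisibilityEntry {
  target: unknown;
  intersectionRatio: number;
}
interface VisibilityObserver {
  observe(target: unknown): void;
  disconnect(): void;
}
type ObserverCtor = new (
  callback: (entries: VisibilityEntry[]) => void,
  options: { threshold: number[] }
) => VisibilityObserver;

// Hypothetical helper: reports whether at least `minVisible` (0..1) of
// `target` is on screen. Pass `null` as `Ctor` to simulate a browser without
// IntersectionObserver (e.g. IE11); omit it to use the global one.
function observeVisibility(
  target: unknown,
  minVisible: number,
  onChange: (visible: boolean) => void,
  Ctor?: ObserverCtor | null
): () => void {
  const Impl: ObserverCtor | null | undefined =
    Ctor === undefined ? (globalThis as any).IntersectionObserver : Ctor;
  if (typeof Impl !== "function") {
    // Feature gate: no observer available, so always render.
    onChange(true);
    return () => {};
  }
  const observer = new Impl(
    (entries) => {
      for (const entry of entries) {
        if (entry.target === target) onChange(entry.intersectionRatio >= minVisible);
      }
    },
    // Get callbacks when the element starts/stops intersecting and when it crosses minVisible.
    { threshold: [0, minVisible] }
  );
  observer.observe(target);
  return () => observer.disconnect(); // call this on unmount
}
```

A component like the proposed OffscreenChecker with percent={80} would call this with minVisible = 0.2 and show the grey placeholder while visible is false.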

Bigger Wins

Render all "circles" in the tree view to Canvas overlay

Use a transparent canvas to render just the circles. They move independently between views, there are a LOT of them, and Canvas can render them quickly and efficiently.
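To illustrate "redraw all the circles in one pass", here is a sketch with hypothetical names (CircleDatum, drawCircles). Ctx2D is a minimal subset of the standard CanvasRenderingContext2D (clearRect, beginPath, moveTo, arc, fill, fillStyle), so a real 2D context can be passed in the browser. Grouping by fill color is an assumed optimization that keeps canvas state changes to one per color.

```typescript
// Minimal subset of CanvasRenderingContext2D; a real 2D context satisfies it.
interface Ctx2D {
  fillStyle: unknown; // string | CanvasGradient | CanvasPattern on a real context
  clearRect(x: number, y: number, w: number, h: number): void;
  beginPath(): void;
  moveTo(x: number, y: number): void;
  arc(x: number, y: number, r: number, start: number, end: number): void;
  fill(): void;
}

interface CircleDatum { id: number; x: number; y: number; r: number; fill: string; }

// Hypothetical: clear the canvas and redraw every circle from scratch.
// Circles sharing a color become one path + one fill() call.
// Note: grouping changes paint order between colors; if overlap order
// matters, draw in the original order instead.
function drawCircles(ctx: Ctx2D, width: number, height: number, circles: CircleDatum[]): number {
  ctx.clearRect(0, 0, width, height);
  const byColor: { [color: string]: CircleDatum[] } = {};
  const colors: string[] = []; // first-seen order
  for (const c of circles) {
    if (!byColor[c.fill]) {
      byColor[c.fill] = [];
      colors.push(c.fill);
    }
    byColor[c.fill].push(c);
  }
  for (const color of colors) {
    ctx.fillStyle = color;
    ctx.beginPath();
    for (const c of byColor[color]) {
      ctx.moveTo(c.x + c.r, c.y); // start each sub-path at the arc's start point
      ctx.arc(c.x, c.y, c.r, 0, 2 * Math.PI);
    }
    ctx.fill();
  }
  return colors.length; // number of fill() calls issued
}
```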
For user interactions, a transparent <div> overlay would intercept all pointer events and check whether each mousemove, click, etc. is "over" one of the circles. This can be done with a simple, efficient coordinate lookup using a spatial-index library called RBush.
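The lookup is typically two-phase: a cheap bounding-box query for candidates, then an exact distance check. With RBush, items carry minX/minY/maxX/maxY, you bulk-insert them with tree.load(items), and you query with tree.search(bbox). To keep this sketch dependency-free, it swaps RBush's R-tree for a plain uniform grid; GridIndex and hitTest are hypothetical names.

```typescript
interface Circle { id: number; x: number; y: number; r: number; }

// Dependency-free stand-in for an R-tree such as RBush: each circle's
// bounding box is bucketed into fixed-size grid cells.
class GridIndex {
  private cells: { [key: string]: Circle[] } = {};
  constructor(private cellSize: number) {}

  private key(gx: number, gy: number): string {
    return gx + ":" + gy;
  }

  // Rebuild the index; call when positions change, not on every mouse move.
  load(circles: Circle[]): void {
    this.cells = {};
    for (const c of circles) {
      const x0 = Math.floor((c.x - c.r) / this.cellSize);
      const x1 = Math.floor((c.x + c.r) / this.cellSize);
      const y0 = Math.floor((c.y - c.r) / this.cellSize);
      const y1 = Math.floor((c.y + c.r) / this.cellSize);
      for (let gx = x0; gx <= x1; gx++) {
        for (let gy = y0; gy <= y1; gy++) {
          const k = this.key(gx, gy);
          (this.cells[k] = this.cells[k] || []).push(c);
        }
      }
    }
  }

  // Phase 1: cheap candidate lookup (the role RBush's search() plays).
  candidates(px: number, py: number): Circle[] {
    const k = this.key(Math.floor(px / this.cellSize), Math.floor(py / this.cellSize));
    return this.cells[k] || [];
  }
}

// Phase 2: exact test. Returns the id of the top-most circle under the
// pointer, or null. "Top-most" assumes later circles are drawn over earlier ones.
function hitTest(index: GridIndex, px: number, py: number): number | null {
  let hit: number | null = null;
  for (const c of index.candidates(px, py)) {
    const dx = px - c.x;
    const dy = py - c.y;
    if (dx * dx + dy * dy <= c.r * c.r) hit = c.id;
  }
  return hit;
}
```

In the browser, the overlay's mousemove handler would pass the event's offsetX/offsetY (in canvas coordinates) to hitTest, and the index would be rebuilt after scrolling or a layout change.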
The brush overlay would be a separate layer, so that brushing doesn't force the circles underneath to re-render, which helps performance.

Render all circles in the map view as a Leaflet canvas overlay

There are a number of examples of how to do this; the first one I found was here. The general idea is the same as above. You might want more than one overlay, one for the circles and one for the Bezier arcs, so that brushing requires as few updates as possible.
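A minimal sketch of the "one overlay per concern" idea. Each layer is assumed to own its own canvas and redraw function, and only layers marked dirty are redrawn on the next frame, so a brush change touches only the arcs layer. LayerStack, markDirty and flush are hypothetical names. In the browser, flush would be driven by requestAnimationFrame.

```typescript
type DrawFn = () => void;

class LayerStack {
  private layers: { [name: string]: { draw: DrawFn; dirty: boolean } } = {};
  private order: string[] = [];

  // Layers are added bottom-to-top; each would own a separate canvas overlay.
  add(name: string, draw: DrawFn): void {
    this.layers[name] = { draw: draw, dirty: true };
    this.order.push(name);
  }

  // Call when a layer's inputs change (e.g. the brush selection for "arcs").
  markDirty(name: string): void {
    const layer = this.layers[name];
    if (layer) layer.dirty = true;
  }

  // Redraw only dirty layers, bottom-to-top; returns the names redrawn.
  flush(): string[] {
    const redrawn: string[] = [];
    for (const name of this.order) {
      const layer = this.layers[name];
      if (layer.dirty) {
        layer.draw();
        layer.dirty = false;
        redrawn.push(name);
      }
    }
    return redrawn;
  }
}
```

While brushing, the brush handler would call markDirty("arcs") only, so the circles layer keeps its pixels untouched.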

Other thoughts

It's going to become important to abstract the location/positioning data away from whichever components it currently lives in, so that newer code can reuse it effectively as you transition from SVG to Canvas. The upside is that 2D canvas and SVG use essentially the same coordinate system: x,y coordinates with the origin at the top left.
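A sketch of what "positions as plain data" could look like. A pure function computes x/y once, and both an SVG renderer and a canvas renderer consume the same array, relying on the shared top-left origin. The rectangular layout here (x from divergence, y from tip order) is a simplified, hypothetical stand-in, not auspice's actual layout code, and all names are illustrative.

```typescript
interface TreeNode { name: string; divergence: number; children: TreeNode[]; }
interface Position { name: string; x: number; y: number; }

// Pure layout: no DOM, no d3 selections. x grows with divergence, tips get
// consecutive y slots, internal nodes sit at the mean y of their children.
function rectangularLayout(root: TreeNode, xScale: number, yStep: number): Position[] {
  const out: Position[] = [];
  let nextTip = 0;
  function visit(node: TreeNode): number {
    let y: number;
    if (node.children.length === 0) {
      y = nextTip * yStep;
      nextTip += 1;
    } else {
      let sum = 0;
      for (const child of node.children) sum += visit(child);
      y = sum / node.children.length;
    }
    out.push({ name: node.name, x: node.divergence * xScale, y: y }); // post-order
    return y;
  }
  visit(root);
  return out;
}

// Renderer 1: SVG markup from the shared positions.
function toSvgCircles(positions: Position[], r: number): string {
  return positions
    .map((p) => '<circle cx="' + p.x + '" cy="' + p.y + '" r="' + r + '"/>')
    .join("");
}

// Renderer 2: canvas, same coordinates (minimal CanvasRenderingContext2D subset).
interface ArcCtx {
  beginPath(): void;
  moveTo(x: number, y: number): void;
  arc(x: number, y: number, r: number, a0: number, a1: number): void;
  fill(): void;
}
function drawToCanvas(ctx: ArcCtx, positions: Position[], r: number): void {
  ctx.beginPath();
  for (const p of positions) {
    ctx.moveTo(p.x + r, p.y);
    ctx.arc(p.x, p.y, r, 0, 2 * Math.PI);
  }
  ctx.fill();
}
```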

It's also worth mentioning that the calculations are very unlikely to be the performance bottleneck here, so it's probably best not to spend much effort on them. Computers are pretty fast at crunching sets of numbers, even in JavaScript. If and when that does become an issue, we can address it (move the work to a worker, WASM, etc.). For now, the biggest choke point is DOM rendering.

Actions to be taken

[ ] Analyze the positioning algorithms used, and try to abstract them away from any rendering logic
[ ] Attempt to implement a canvas layer for the tree circles
- [ ] Render the circles to canvas
- [ ] Create position indexing with RBush
- [ ] Add transparent div to handle user events like hovers and clicks
- [ ] Have the transparent div handle scrolling and update both the canvas rendering and the position index (RBush)
[ ] Attempt to implement the map overlays
- [ ] Create a map overlay for pie charts
- [ ] Create a drawing function for drawing a pie chart with Canvas context
- [ ] Add the Leaflet canvas layer and render the pie charts
- [ ] Create a map overlay for the "circles" for the connected circles visualization
- [ ] Create a map overlay for the brush-animated arcs, and wire it up to update on brush selection
[ ] Look into the easy win of not rendering visualizations that are off-screen.
- [ ] Create a component that renders a different set of children when it is some specified percentage offscreen, something like: <OffscreenChecker percent={80}>{(isOffscreen) => isOffscreen ? <div className="greyplaceholder" /> : <ActualComponent />}</OffscreenChecker>. There may even be existing components that can do this.
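For the "drawing function for a pie chart with Canvas context" item, here is a minimal sketch. Slice angles are proportional to the values, and each slice is moveTo(center), arc, closePath and fill. drawPie, sliceAngles and PieCtx are hypothetical names. The methods called are standard CanvasRenderingContext2D members, so the context of a Leaflet canvas layer could presumably be passed in. Starting at -pi/2 puts the first slice at 12 o'clock. On a y-down canvas, increasing angles run clockwise.

```typescript
interface PieCtx {
  fillStyle: unknown; // string | CanvasGradient | CanvasPattern on a real context
  beginPath(): void;
  moveTo(x: number, y: number): void;
  arc(x: number, y: number, r: number, start: number, end: number): void;
  closePath(): void;
  fill(): void;
}

interface Slice { value: number; color: string; }

// [start, end] angles in radians for each slice, clockwise from 12 o'clock.
function sliceAngles(slices: Slice[]): Array<[number, number]> {
  let total = 0;
  for (const s of slices) total += s.value;
  const angles: Array<[number, number]> = [];
  let start = -Math.PI / 2;
  for (const s of slices) {
    const end = start + (total > 0 ? (s.value / total) * 2 * Math.PI : 0);
    angles.push([start, end]);
    start = end;
  }
  return angles;
}

// Draw one pie chart centred at (cx, cy) with radius r.
function drawPie(ctx: PieCtx, cx: number, cy: number, r: number, slices: Slice[]): void {
  const angles = sliceAngles(slices);
  for (let i = 0; i < slices.length; i++) {
    if (angles[i][1] === angles[i][0]) continue; // skip zero-size slices
    ctx.fillStyle = slices[i].color;
    ctx.beginPath();
    ctx.moveTo(cx, cy);
    ctx.arc(cx, cy, r, angles[i][0], angles[i][1]);
    ctx.closePath();
    ctx.fill();
  }
}
```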

Possibly related #955

Looking more deeply into this, given that the lines for the tree also have a hover callback, we might not be able to do this in stages. It might be more beneficial, and faster, to develop a second component that we can swap in place. (Ideally, something that could even be open-sourced separately at some point.)

Thanks @benlesh -- really exciting.

Re: canvas. We've liked the functionality d3 gives us here (events, transitions, etc.), but I agree that it's time to seriously consider moving away from SVG. I haven't worked with canvas since the early days, when we used the low-level API and computed intersections ourselves, but I know there are really good libraries now. Could you say more about, or point me to a simple example of, what it would look like to e.g. attach a hover event to a tip, transition that tip, etc.? I know @colinmegill is a fan of regl; would you suggest using it?

Stepping back a bit, we currently have a (large-ish) piece of code -- here and here -- which essentially works out which Redux actions are incoming and generates the necessary (imperative) d3 commands. This is largely needed because of the problems that arise if you make a d3 call while a previous d3 call is still running / transitioning (not fun to debug). Perhaps this can be avoided entirely by re-rendering the entire canvas (which was too slow to do in SVG).
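To make the "re-render the entire canvas" idea concrete, here is a sketch of transitions done in immediate mode. Current and target positions are kept as plain data, every frame interpolates and redraws everything, and a new action that arrives mid-transition simply retargets from wherever the nodes are now. There is no running d3 transition to collide with. TransitionState, positionsAt, retarget and step are hypothetical names, and linear easing is assumed for brevity.

```typescript
interface Pt { x: number; y: number; }

// Assumes `from` and `to` list the same nodes in the same order.
interface TransitionState {
  from: Pt[];
  to: Pt[];
  startMs: number;
  durationMs: number;
}

function lerp(a: number, b: number, t: number): number {
  return a + (b - a) * t;
}

// Positions at time `nowMs` (linear easing, progress clamped to [0, 1]).
function positionsAt(s: TransitionState, nowMs: number): Pt[] {
  const t = s.durationMs > 0 ? Math.min(1, Math.max(0, (nowMs - s.startMs) / s.durationMs)) : 1;
  return s.from.map((p, i) => ({ x: lerp(p.x, s.to[i].x, t), y: lerp(p.y, s.to[i].y, t) }));
}

// Called when a new action changes the layout, even mid-transition:
// start from the current interpolated positions so nothing jumps.
function retarget(s: TransitionState, newTo: Pt[], nowMs: number): TransitionState {
  return { from: positionsAt(s, nowMs), to: newTo, startMs: nowMs, durationMs: s.durationMs };
}

// One animation frame: compute positions and redraw everything from scratch.
// Returns true while the transition is still in progress.
function step(s: TransitionState, nowMs: number, draw: (pts: Pt[]) => void): boolean {
  draw(positionsAt(s, nowMs));
  return nowMs - s.startMs < s.durationMs;
}
```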

It might be more beneficial, and faster, to develop a second component that we can swap in place.

Yes, I agree this is probably simpler. Perhaps doing the map first would be easier, since it doesn't have any events yet (although there's a PR which adds them for hover)? On the other hand, the tree is the performance bottleneck.

I highly recommend ditching support for IE: if we're talking about working on something that will help save lives, we shouldn't slow down progress for a browser that is dead in a year

The CDC still uses IE (I've put out an internal call to confirm the actual version for you; I believe it's 11), so it's important to keep supporting IE.

It also seems that the entropy calculation is expensive and may be running more often than necessary.
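If the entropy calculation is re-run with unchanged inputs, one common fix is to memoize it on those inputs. The sketch below assumes the inputs are immutable (as Redux state is meant to be), so reference equality means "unchanged". memoizeLast and shannonEntropy are illustrative names. This is not auspice's entropy code, and the Shannon entropy shown is only a stand-in for the real calculation.

```typescript
// Reference-equality comparison of two argument lists.
function sameRefs(a: readonly unknown[], b: readonly unknown[]): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Hypothetical helper: cache the most recent result and recompute only when
// an argument changes by reference (the same idea as reselect / memoize-one).
function memoizeLast<A extends unknown[], R>(fn: (...args: A) => R): (...args: A) => R {
  let cache: { args: A; result: R } | null = null;
  return (...args: A): R => {
    if (cache !== null && sameRefs(cache.args, args)) return cache.result;
    const result = fn(...args);
    cache = { args: args, result: result };
    return result;
  };
}

// Illustrative stand-in: Shannon entropy (bits) of the states at one position.
function shannonEntropy(states: string[]): number {
  const counts: { [state: string]: number } = {};
  for (const s of states) counts[s] = (counts[s] || 0) + 1;
  let h = 0;
  for (const k of Object.keys(counts)) {
    const p = counts[k] / states.length;
    h -= p * (Math.log(p) / Math.LN2); // log base 2
  }
  return h;
}
```

This only helps if callers pass the same object references when nothing changed. Rebuilding arrays on every render defeats the cache.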

Chiming in with my own analysis of the Rectangular layout (less as an objective answer and more as a separate viewpoint), which may also apply to other layouts:

Branches vs. Tips

It seems that a significant share of render time is spent rendering the branches rather than the tips (~6x more).

I agree with @benlesh's assertion that ideally we'd make the position calculation logic more abstract. But if we want a quick "lift-and-shift" win (particularly if time is of the essence due to the COVID-19 crisis, or if you plan to replace this viz entirely later on), we can probably get one by optimizing the branch rendering.

This leads me to my next point: WebGL.

WebGL and gradients

I did some tinkering with PixiJS and WebGL over the weekend. My approach was to layer a transparent Canvas element atop the existing SVG element and disable mouse interaction on it via CSS.

(I kept the existing tips because a) they handle position calculation already, and b) they weren't the lowest-hanging fruit performance-wise.)

Drawing solid-color lines with PixiJS on this canvas is pretty darned fast (~40 ms for all of drawBranches vs. ~300 ms currently, on the ~2k-point COVID-19 dataset). Gradients are another matter: they are slower in my very-much-a-noob implementations, because my attempts required either resizing or creating a bunch of different textures (one per gradient color/size), which I suspect is quite computationally expensive.

We could also implement a rudimentary alternative to gradients, e.g.:
1) set the vertical branch to the parent node's branchStroke value and the horizontal branch to the child node's branchStroke value;
2) split each line into multiple solid-color lines with intermediate colors;
3) use gradients for thicker and/or longer lines, and solid colors for the rest.
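Option 2 can be sketched as follows. The code interpolates between the parent's and child's branchStroke colors and emits n solid-color segments, each of which could then be drawn as an ordinary solid line. splitBranch and the "#rrggbb" color format are assumptions for illustration. A real version would probably interpolate in a perceptual space (e.g. d3's interpolateLab) rather than plain RGB.

```typescript
interface Segment { x0: number; y0: number; x1: number; y1: number; color: string; }

// "#rrggbb" -> [r, g, b]
function parseHex(hex: string): number[] {
  const n = parseInt(hex.slice(1), 16);
  return [(n >> 16) & 255, (n >> 8) & 255, n & 255];
}

// [r, g, b] -> "#rrggbb" (channels rounded to integers)
function toHex(rgb: number[]): string {
  let s = "#";
  for (const c of rgb) {
    const h = Math.round(c).toString(16);
    s += h.length === 1 ? "0" + h : h;
  }
  return s;
}

// Approximate a linear gradient from colorA to colorB along (x0,y0)->(x1,y1)
// with `n` solid-color pieces; each piece takes the color at its midpoint.
function splitBranch(
  x0: number, y0: number, x1: number, y1: number,
  colorA: string, colorB: string, n: number
): Segment[] {
  const a = parseHex(colorA);
  const b = parseHex(colorB);
  const segments: Segment[] = [];
  for (let i = 0; i < n; i++) {
    const t0 = i / n;
    const t1 = (i + 1) / n;
    const tm = (t0 + t1) / 2;
    segments.push({
      x0: x0 + (x1 - x0) * t0,
      y0: y0 + (y1 - y0) * t0,
      x1: x0 + (x1 - x0) * t1,
      y1: y0 + (y1 - y0) * t1,
      color: toHex([0, 1, 2].map((k) => a[k] + (b[k] - a[k]) * tm)),
    });
  }
  return segments;
}
```

n is a speed/quality knob. n = 1 gives a plain solid line in the midpoint color. Larger n looks smoother but multiplies the number of lines to draw, which suggests combining it with option 3 (only splitting long or thick branches).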

Conclusions

TL;DR

I think we could get a solid performance win by replacing {SVG, gradient} branches with {WebGL, solid-color} ones - but I'm not sure how important the gradient feature is.

Alternatively...

If gradients are important, then we're probably better off trying to improve the existing SVG code first. I reckon we could get a ~20% speedup on a ~2k point dataset by aggressively optimizing the existing SVG logic.

nextstrain / auspice