nteract / semiotic

A data visualization framework combining React & D3
https://semioticv1.nteract.io/

Better implementation for batch rendering on canvas #608

Closed alexeyraspopov closed 2 years ago

alexeyraspopov commented 2 years ago

Back when I was working on #547, I started thinking about an alternative solution for batching that could eliminate the root cause of the bugs in renderQueue and simplify batching overall. In this PR I'm applying a solution I've used before for some SVG shenanigans. The way it is implemented, we could also use it for other tasks if necessary, not just canvas rendering.

One of the reasons to do this refactoring now is that VisualizationLayer is a class component, and switching to this new implementation will make it easier to convert it to a function component.

Context

The context of this PR is <VisualizationLayer />'s approach to rendering data on canvas. Canvas allows rendering a considerably larger number of data points, since there is no need to add new DOM nodes. But even though the cost of rendering a single data point is smaller on canvas, we need to take into account the amount of data that may need to be processed. Canvas rendering is a synchronous process, which means we can easily get into a state where rendering all data points takes more time than a single frame. In the worst case, the webpage can freeze because the main thread keeps working without yielding back to the browser.

This is where batching comes into play. To make sure the page does not freeze during render, we slice the dataset into chunks and render them one by one, letting the main thread yield back to the browser in between chunks. Eventually all data gets rendered without hurting UX. In the worst case rendering may take a couple of frames, which most likely won't be noticeable to the user, or at least won't be a deal breaker. Either way, this is still cheaper than rendering the same amount of data in SVG.
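
As a rough sketch of that idea (drawPoint() and data here are hypothetical stand-ins, not part of Semiotic's API):

// hand-rolled batching: draw one chunk per animation frame
function renderInChunks(context, data, drawPoint, chunkSize = 1000) {
  let cursor = 0
  function drawNextChunk() {
    // take the next slice of the dataset and render it
    const chunk = data.slice(cursor, cursor + chunkSize)
    for (const datum of chunk) {
      drawPoint(context, datum)
    }
    cursor += chunkSize
    if (cursor < data.length) {
      // yield back to the browser before rendering the next chunk
      requestAnimationFrame(drawNextChunk)
    }
  }
  drawNextChunk()
}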

Problem

The current solution for batching in VisualizationLayer is the renderQueue class. The class is quite flexible and dictates a particular workflow, yet the component only ever uses a single code path. The batching itself is straightforward, but the class has several significant flaws. One of them was already fixed in #547. Another is that the class slices the original dataset, allocating a new chunk of memory for every single batch of work. Even if this doesn't allocate much memory overall, the inevitable garbage collection cycles can slow down the rendering process, causing skipped frames and extending the total time to completion. One more issue I found along the way is that VisualizationLayer does not attempt to cancel an in-flight renderQueue when unmounting. Even though that case is rarely hit, the fact that it is possible is something I'd like to fix.

batchWork()

This PR introduces a new internal utility, batchWork(). The function receives a routine that needs to be batched and expects the routine to return a boolean indicating whether batching should continue.

// let's imagine a list of "tasks" that need to be done asap without blocking main thread
let tasks = [/* ... */]

let promise = batchWork(() => {
  // let's take tasks from the list one by one
  performTask(tasks.shift())
  // if there are any tasks left, return `true` so `batchWork()` continues batching
  return tasks.length > 0
})

promise
  .then(() => console.log("batching complete"))
  .catch(error => console.log("something went wrong"))

batchWork() controls how often the work is done. If the routine is fast, it can be invoked several times during a single frame; otherwise the utility uses requestAnimationFrame() to schedule the following batch of work.

Here's the algorithm in pseudo-code:

function batchWork(performWork) {
  startTime = currentTimestamp()
  elapsed = 0
  shouldContinue = false
  do {
    shouldContinue = performWork()
    elapsed = currentTimestamp() - startTime
  } while (shouldContinue && elapsed < MAX_FRAME_MS)

  if (shouldContinue) {
    schedule(() => batchWork(performWork))
  }
}

The utility does not hold any additional state, so the work routine must keep whatever state it needs in a closure (as the tasks array does in the example above).

The utility returns a promise that resolves when the work is done and rejects when any invocation of the routine throws an exception.

The utility also accepts an option, timeFrameMs, that configures how often the work should yield. The default time frame is 30 ms, which is roughly 2 frames under the best conditions. VisualizationLayer uses the default value, but we can tweak it after additional testing.

Since there is just no way to predict how long the work may take, and something else can run in between the batches, we need to be able to cancel the work in progress in certain cases (e.g. the data viz component being unmounted). batchWork() accepts an additional option, signal (an AbortSignal, see https://developer.mozilla.org/en-US/docs/Web/API/AbortSignal), that is checked before running the next batch of work to verify that the work wasn't aborted.

let ctrl = new AbortController()

batchWork(() => {
  renderSomeStuff()
  return anyStuffLeftToRender()
}, { timeFrameMs: 16, signal: ctrl.signal })

cancelButton.addEventListener("click", () => {
  ctrl.abort()
}, { once: true })
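
For completeness, here's a minimal sketch of how the promise, timeFrameMs, and signal pieces could fit together, following the pseudo-code above. It is a simplified illustration rather than the exact implementation in this PR; in particular, whether an aborted run resolves or rejects is a design detail not covered here.

function batchWork(performWork, { timeFrameMs = 30, signal } = {}) {
  return new Promise((resolve, reject) => {
    function runBatch() {
      // skip the next batch if the caller aborted the work
      // (the actual utility may reject here instead)
      if (signal && signal.aborted) {
        resolve()
        return
      }
      const startTime = performance.now()
      let shouldContinue = false
      try {
        do {
          shouldContinue = performWork()
        } while (shouldContinue && performance.now() - startTime < timeFrameMs)
      } catch (error) {
        // any exception thrown by the routine rejects the promise
        reject(error)
        return
      }
      if (shouldContinue) {
        // yield back to the browser, then continue with the next batch
        requestAnimationFrame(runBatch)
      } else {
        resolve()
      }
    }
    runBatch()
  })
}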

AbortController has decent browser support, even broader than ResizeObserver.

VisualizationLayer changes

The new utility is used in VisualizationLayer for rendering data points on canvas. There is one more trick required to make it work, though. By itself, batchWork() is not aware of datasets, queues, or anything else related to the amount of work; it only exists to control time. The visualization component, which knows how much data needs to be rendered, has to break the work into smaller chunks for the batching to pay off. Otherwise we get no benefit from the batching utility if we do something like allData.forEach(datum => render(datum)) inside a single routine call. renderQueue handles this by cutting small slices from the dataset, 1k items each, and rendering them separately. In VisualizationLayer.ts I've implemented batchCollectionWork(), which uses batchWork() to run the rendering function while iterating over the target dataset. It doesn't allocate memory for slices; it simply moves a pointer.

https://github.com/nteract/semiotic/blob/f56dbde5382cfd7dd1c381d3b96dce5beb182a53/src/components/VisualizationLayer.tsx#L567-L579
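
The permalink above points at the actual code; a simplified sketch of the pointer-based iteration it performs could look like this (the names mirror the description above, but the details are illustrative):

// iterate over the dataset with a moving pointer instead of allocating slices
function batchCollectionWork(renderPipeline, data, options) {
  let cursor = 0
  const chunkSize = 1000
  return batchWork(() => {
    const end = Math.min(cursor + chunkSize, data.length)
    for (; cursor < end; cursor++) {
      renderPipeline(data[cursor])
    }
    // keep batching while there is data left to render
    return cursor < data.length
  }, options)
}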

Besides switching from renderQueue to batchWork, I also fixed the use of disableProgressiveRendering (https://github.com/nteract/semiotic/commit/36e61f294bfc38b1ab8cb7788c9720ef93ccf724): previously the synchronous rendering path didn't actually render anything because the render function was never called.
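
Roughly, the branch looks like this (a simplified sketch; the identifiers are illustrative, the actual code lives in VisualizationLayer.tsx):

if (disableProgressiveRendering) {
  // synchronous path: render everything in one pass, no batching
  for (const datum of data) {
    renderPipeline(datum)
  }
} else {
  // progressive path: batch the same rendering work across frames
  batchCollectionWork(renderPipeline, data, { signal })
}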

VisualizationLayer now also includes an AbortController that is used along with batchWork(). Its use in a class component may seem awkward, but it will get better once VisualizationLayer is converted to a function component:

function Component({ data }) {
  // ...
  useLayoutEffect(() => {
    let ctrl = new AbortController()
    batchWork(() => {
      // ...
    }, { signal: ctrl.signal })
    return () => {
      ctrl.abort()
    }
  }, [data]);
  // ...
}