linnarsson-lab / loom-viewer

Tool for sharing, browsing and visualizing single-cell data stored in the Loom file format
BSD 2-Clause "Simplified" License

Update more lazily and other performance possibilities #86

Closed JobLeonard closed 7 years ago

JobLeonard commented 7 years ago

Ok, since my laptop still hasn't been fixed I'm starting to get desperate for what kind of stuff I can do without clashing with the three+ weeks of uncommitted work I have stashed on there...

So this would normally be a low-priority issue, but here goes: a working document for places where the client can do less work than it's doing now. Note that almost none of this is high-priority, except for a few things that scale really badly at the moment, like the sparkline rendering. It's just a way of keeping track of places where we can still win some performance, for when the datasets get really huge.

Update filtered data more lazily

Currently, when we change a filtered value or re-sort, we update all attributes and all fetched genes. This is wasteful, and can especially lead to slowdowns if one has had a tab open for a while and has fetched a lot of different genes.

One option would be to not update filteredData until it's accessed. We could do this by instead setting it to null when sort order and/or filters are updated. Then, once a component tries to access it, it should check if it's null and if so, generate the actual filtered data (a simple helper function in util.js should do the trick). At this point we could dispatch the generated data and insert it into the redux store. This, however, would create another problem: imagine we are watching twenty sparklines. Each one would dispatch, resulting in twenty roundtrips through the redux store. Ouch.
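A minimal sketch of the null-until-accessed idea, to make the trade-off concrete. The attribute shape `{ data, filteredData }`, the `indices` array and both function names are assumptions made for illustration, not taken from the loom-viewer code.

```javascript
// Hypothetical helper, of the kind that could live in util.js.
// `attr` is assumed to look like { data: [...], filteredData: [...] | null }.
// `indices` is assumed to hold the indices that pass the current filters,
// already in sort order.
function getFilteredData(attr, indices) {
  if (attr.filteredData === null) {
    // Only now do the actual work, and cache the result on the attribute.
    const filtered = new Array(indices.length);
    for (let i = 0; i < indices.length; i++) {
      filtered[i] = attr.data[indices[i]];
    }
    attr.filteredData = filtered;
  }
  return attr.filteredData;
}

// When filters or sort order change: invalidate instead of recomputing.
function invalidateFilteredData(attrs) {
  for (const key of Object.keys(attrs)) {
    attrs[key].filteredData = null;
  }
}

const attrs = {
  Age: { data: [30, 10, 20], filteredData: null },
  Sex: { data: ['f', 'm', 'f'], filteredData: null },
};
const indices = [1, 2];

// Only Age is computed here; Sex stays null until something asks for it.
console.log(getFilteredData(attrs.Age, indices));
```

Caching in place like this avoids the dispatch-per-sparkline problem, but it mutates data behind the store's back, so it is only a sketch of the access pattern and not a recommendation for how to wire it into redux.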

The other option would be to keep track of which attributes are visible. This is actually doable: each view state keeps track of which attributes are used, after all. There are two parts to this: updates to the view state, and updates to the filter/order state.

Both would end with updating the filteredData for the visible attributes. Note that changes to filter/order state are triggered by UI interaction, meaning the view state also changes. So both can be triggered at once, in which case view state should be updated first.

updates to the view state

The logical place within the redux state tree to keep track of this would be:

{
    [dataset_name]: {
        data: {
            col: {
                visibleKeys // array of strings
            },
            row: {
                visibleKeys // array of strings
            }
        }
    }
};

... since col and row already keep track of keys (and in the case of col, geneKeys)

updates to filter/order state

Small helper functions should be created for each view that can generate an updated state tree for visibleKeys based on the current view state. This would make it easier to extend the number of views later. The returned state tree would look like:

{
    col: {
        visibleKeys // array of strings
    },
    row: {
        visibleKeys // array of strings
    }
}

For example, the metadata pages always show all attributes, so its helper function would just return all attribute keys.
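As a rough sketch of what two such helpers could look like: the function names, the `dataset.col.keys` / `dataset.row.keys` fields and the scatterplot view state shape `{ x, y, color }` are all assumptions for illustration.

```javascript
// Hypothetical helper for the metadata view: every attribute is visible.
// `dataset` is assumed to expose the key lists as dataset.col.keys and
// dataset.row.keys.
function metadataVisibleKeys(dataset) {
  return {
    col: { visibleKeys: dataset.col.keys.slice() },
    row: { visibleKeys: dataset.row.keys.slice() },
  };
}

// Hypothetical helper for a scatterplot view: only the attributes that are
// actually plotted are visible. The view state shape is an assumption.
function scatterplotVisibleKeys(viewState) {
  const { x, y, color } = viewState;
  return {
    // A Set removes duplicates, e.g. when the same attribute is used
    // for both position and color.
    col: { visibleKeys: Array.from(new Set([x, y, color])) },
    row: { visibleKeys: [] },
  };
}
```

Because every helper returns the same `{ col, row }` shape, adding a new view would only mean adding one more function of this kind.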

These helper functions would be called in updateViewState and setViewStateURL, and the result would be merged into the state tree.

All the reducers related to updating filter/order settings (basically everything from updateFilterIndices onwards) should be updated to only update visibleKeys, instead of all keys.
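A sketch of what "only update visibleKeys" could mean inside such a reducer. The axis shape `{ attrs, visibleKeys }` and the function name are hypothetical; the actual reducers will look different.

```javascript
// Hypothetical reducer fragment: recompute filteredData only for the
// attributes listed in visibleKeys, and leave all other attributes alone.
// `axis` is assumed to look like { attrs: { [key]: { data, filteredData } },
// visibleKeys: [...] }; `indices` holds the filtered, sorted indices.
function updateVisibleFilteredData(axis, indices) {
  const attrs = Object.assign({}, axis.attrs);
  for (const key of axis.visibleKeys) {
    const attr = axis.attrs[key];
    if (attr) {
      // New object for the changed attribute, so the old state is not mutated.
      attrs[key] = Object.assign({}, attr, {
        filteredData: indices.map((i) => attr.data[i]),
      });
    }
  }
  return Object.assign({}, axis, { attrs });
}
```

Attributes that are not visible keep their old object reference, so components that depend on them can cheaply see that nothing changed.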

Don't redraw all scatterplots when using scatterplot matrix when changing one row/column

Similar to sparklines: if we compare, say, 5 genes in a matrix, which results in 15 tiny plots, and then replace one gene with another, we only need to update 5 plots. Currently we update all of them. We can be smarter about this.

Note that if the number of rows/columns changes, the dimensions change, so all plots have to be updated.
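The rule above can be sketched as a small diffing function. It assumes the matrix is a lower triangle including the diagonal (which is what gives 15 plots for 5 genes); the function name and the `[row, col]` cell representation are made up for illustration.

```javascript
// Hypothetical helper: given the previous and next list of genes in the
// scatterplot matrix, return the [row, col] cells that need a redraw.
// Assumes a lower-triangular layout including the diagonal.
function plotsToRedraw(prevGenes, nextGenes) {
  const n = nextGenes.length;
  // A different number of genes changes the plot dimensions,
  // so every plot has to be redrawn.
  const sizeChanged = prevGenes.length !== n;
  const cells = [];
  for (let row = 0; row < n; row++) {
    for (let col = 0; col <= row; col++) {
      const changed = sizeChanged ||
        prevGenes[row] !== nextGenes[row] ||
        prevGenes[col] !== nextGenes[col];
      if (changed) {
        cells.push([row, col]);
      }
    }
  }
  return cells;
}

const before = ['Actb', 'Gapdh', 'Sox2', 'Pax6', 'Nes'];
const after = ['Actb', 'Gapdh', 'Olig2', 'Pax6', 'Nes'];
console.log(plotsToRedraw(before, after).length); // 5
```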

Turn scatterplot into a stateful object that contains a number of composed functions to render

Scatterplots now go through these steps:

  1. convert x, y and color attributes to float32 arrays
  2. (optionally) jitter x and/or y arrays
  3. sort x, y and color arrays by x and y positions for ideal tiling
  4. (optionally) log convert x and/or y arrays
  5. scale to context dimensions
  6. generate sprites based on colorMode and context dimensions
  7. render dots, mapping color attribute to color ramps

If all I change is the color attribute, steps one to six are superfluous. If I change the log settings, I don't need to regenerate the sprites, etc.

Currently scatterplot generates a function that just takes a context and draws. Instead, this could be changed to returning an object that keeps track of which parameters are being updated, and only updating those parts before rendering to a context (note that steps one to four do not require any knowledge of a context anyway).
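A very reduced sketch of such a stateful object, assuming only four stages instead of the seven above and a made-up settings shape `{ x, logX, width, color }`. The stage names, the mapping from setting to stage and `createScatterplot` itself are hypothetical, and the "render" stage only records what it would draw instead of touching a canvas context.

```javascript
// Simplified stage order; a real version would cover all seven steps.
const STAGES = ['convert', 'log', 'scale', 'render'];

// Assumed mapping: the earliest stage that a changed setting invalidates.
const FIRST_DIRTY_STAGE = {
  x: 'convert',
  logX: 'log',
  width: 'scale',
  color: 'render',
};

function createScatterplot(settings) {
  const current = Object.assign({}, settings);
  const cache = {};   // intermediate results of each stage
  let dirtyFrom = 0;  // index into STAGES; this stage and later ones are stale

  const run = {
    convert: () => { cache.x = Float32Array.from(current.x); },
    log: () => {
      cache.logged = current.logX ? cache.x.map((v) => Math.log2(v + 1)) : cache.x;
    },
    scale: () => {
      let max = 0;
      for (const v of cache.logged) max = Math.max(max, v);
      cache.scaled = cache.logged.map((v) => (max > 0 ? (v / max) * current.width : 0));
    },
    render: () => {
      // Stand-in for drawing to a canvas context.
      cache.rendered = { positions: cache.scaled, color: current.color };
    },
  };

  return {
    // Record changed settings and remember the earliest stage they affect.
    update(changes) {
      for (const key of Object.keys(changes)) {
        if (current[key] !== changes[key]) {
          current[key] = changes[key];
          const stage = FIRST_DIRTY_STAGE[key];
          // Unknown settings conservatively invalidate everything.
          const index = stage === undefined ? 0 : STAGES.indexOf(stage);
          dirtyFrom = Math.min(dirtyFrom, index);
        }
      }
    },
    // Re-run only the stale stages; returns their names for inspection.
    draw() {
      const executed = [];
      for (let i = dirtyFrom; i < STAGES.length; i++) {
        run[STAGES[i]]();
        executed.push(STAGES[i]);
      }
      dirtyFrom = STAGES.length;
      return executed;
    },
  };
}
```

With this shape, the first `draw()` runs every stage, a following `update({ color: ... })` plus `draw()` runs only `render`, and toggling `logX` re-runs `log`, `scale` and `render` while skipping the conversion.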

Render scatterplot edges separately

Edges will be implemented at a later date, but this is just a reminder to immediately do it right the first time:

The trickiest bit of this would be the jittering. Everything else just implies working in two canvas layers, and drawing these on top of each other.

Don't redraw all sparklines when one sparkline is added

When adding one sparkline, we shouldn't redraw all the other ones if their attribute settings haven't changed. In fact, given how redux works it's possible that the component gets updated many times with identical props, so ideally we avoid re-rendering altogether when nothing has changed. This requires rewriting SparklineViewComponent from a stateless function to a component that stores its rendered sparklines and only updates them when the relevant props have changed.
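The caching part of this can be sketched independently of React. `createSparklineCache`, the `{ key, props }` input shape and the `render` callback are assumptions for illustration; in the real component the cache would live in component state.

```javascript
// Shallow comparison of two props objects: same keys, identical values.
function shallowEqual(a, b) {
  const aKeys = Object.keys(a);
  const bKeys = Object.keys(b);
  if (aKeys.length !== bKeys.length) return false;
  return aKeys.every(
    (k) => Object.prototype.hasOwnProperty.call(b, k) && a[k] === b[k]
  );
}

// Hypothetical cache around an expensive `render(props)` function.
// Returns a function that renders a list of sparklines, reusing the
// previous output for every sparkline whose props have not changed.
function createSparklineCache(render) {
  const cache = new Map(); // key -> { props, output }
  return function renderAll(sparklines) {
    const seen = new Set();
    const outputs = sparklines.map(({ key, props }) => {
      seen.add(key);
      const hit = cache.get(key);
      if (hit && shallowEqual(hit.props, props)) {
        return hit.output; // unchanged: skip the expensive render
      }
      const output = render(props);
      cache.set(key, { props, output });
      return output;
    });
    // Forget sparklines that are no longer shown.
    for (const key of Array.from(cache.keys())) {
      if (!seen.has(key)) cache.delete(key);
    }
    return outputs;
  };
}
```

Calling the returned function twice with identical props renders nothing the second time, and adding one sparkline to the list renders only the new one.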

Make the sparkline painter more straightforward

Similar to scatterplot above, the sparkline painter should be reorganised into a stateful object with a straightforward composition of attribute conversions. Right now it's an interleaved mess of closures that return closures... Will get back to this tomorrow (or preferably, will get my laptop back tomorrow so I can focus on more pressing matters for now), I'm heading home for tonight :P

JobLeonard commented 7 years ago

Most of these have been implemented in one way or another. All that's missing is the scatterplot update, and that will need a lot of rewriting anyway when we move to interactive plots.