linnarsson-lab / loom-viewer

Tool for sharing, browsing and visualizing single-cell data stored in the Loom file format
BSD 2-Clause "Simplified" License

Better averaging strategy for heatmap tiles #135

Open JobLeonard opened 6 years ago

JobLeonard commented 6 years ago

Because of issues #114 and #134 I'm looking at this code again, so I've done a quick investigation into strategies for generating zoomed-out tiles from our data.

Currently, we pick the top-left value out of every 2x2 block of four data points. This allows for enormous systematic biases: half of the genes are removed at each zoom level! I think we can do better.
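A minimal NumPy sketch of the top-left pick, assuming a tile is a 2D array with genes as rows and cells as columns. `top_left_pick` is a hypothetical name, not the actual loom-viewer function. The toy tile shows the worst case of the bias: if only the odd-numbered rows carry expression, the zoomed-out tile is empty.

```python
import numpy as np

def top_left_pick(tile: np.ndarray) -> np.ndarray:
    """Halve a 2D tile by keeping only the top-left value of every 2x2 block."""
    # Only even-indexed rows and even-indexed columns survive; the rest is dropped.
    return tile[::2, ::2]

# Toy tile in which only the odd-indexed rows (genes) are expressed.
tile = np.zeros((4, 4))
tile[1::2, :] = 5.0
print(top_left_pick(tile).max())  # 0.0: every expressed row was discarded
```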

For comparison, here is a pixel-perfect zoomed-in view of the cortex.loom dataset:

[Screenshot: cortex.loom, pixel-perfect zoomed-in view]

Ideally, we want to maintain similar brightness, some sense of the noise profile, and visible structures. In practice, we will need to settle on a compromise that does well, though not perfectly, on all three.

Top-left pick (current strategy)

[Screenshots (3): cortex.loom, top-left pick, zoomed out]

This happens to work decently enough on this dataset, presumably because the data is distributed randomly enough to counter the systematic bias. On other datasets, the zoomed-out view is almost completely blue despite having non-blue rows, hiding interesting spots.

Also, structures present in zoomed-in views (rows and columns that have expression levels from top to bottom) are almost completely gone when zooming out.

Average

[Screenshots (3): cortex.loom, average, zoomed out]

Too smooth, and because the value distribution is not uniform, it introduces a bias of its own by dragging the high values down. It does preserve structure better, though.
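A sketch of the plain 2x2 block average, under the same assumptions as before (hypothetical helper name). Padding odd-sized tiles by repeating the last row or column is my own choice; the issue doesn't say how edges are handled. The example shows how a single high value gets dragged down to a quarter of its size.

```python
import numpy as np

def block_mean(tile: np.ndarray) -> np.ndarray:
    """Halve a 2D tile by averaging every 2x2 block."""
    # Pad odd dimensions by repeating the last row/column so every block is full.
    tile = np.pad(tile, ((0, tile.shape[0] % 2), (0, tile.shape[1] % 2)), mode="edge")
    rows, cols = tile.shape
    # View as (row_blocks, 2, col_blocks, 2) and average over the two size-2 axes.
    return tile.reshape(rows // 2, 2, cols // 2, 2).mean(axis=(1, 3))

# One bright cell in a 2x2 block is diluted to a quarter of its value.
print(block_mean(np.array([[8.0, 0.0], [0.0, 0.0]])))  # [[2.]]
```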

Max value

[Screenshot: cortex.loom, max value, zoomed out]

Yeah... moving on...

Max value per column, average per row

[Screenshots (5): cortex.loom, max value per column and average per row, zoomed out]

Now we're getting somewhere! It is still biased too much towards the maximum values, so values get higher every time we zoom out, but it maintains the structure visible in the zoomed-in tiles.
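A sketch of "max per column, average per row" for 2x2 blocks. It assumes "per column" means the two vertically stacked values in each column of a block are reduced first (keeping the larger), and that the two resulting column values are then averaged. The helper name is hypothetical. On the same kind of block, the plain average would give 2.5, which shows the pull towards the maximum.

```python
import numpy as np

def max_col_mean_row(tile: np.ndarray) -> np.ndarray:
    """Halve a 2D tile: max within each column of a 2x2 block, then mean across the row."""
    # Pad odd dimensions by repeating the last row/column (an assumption, see above).
    tile = np.pad(tile, ((0, tile.shape[0] % 2), (0, tile.shape[1] % 2)), mode="edge")
    rows, cols = tile.shape
    blocks = tile.reshape(rows // 2, 2, cols // 2, 2)
    # Axis 1 holds the two vertically stacked values of each column: keep the larger.
    col_max = blocks.max(axis=1)  # shape (rows // 2, cols // 2, 2)
    # The last axis holds the block's two columns: average them.
    return col_max.mean(axis=2)

block = np.array([[8.0, 0.0],
                  [0.0, 2.0]])
print(max_col_mean_row(block))  # [[5.]]
```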

Max-biased weighted average per column, average per row

[Screenshots (4): cortex.loom, max-biased weighted average per column and average per row, zoomed out]

We take the weighted average per column, weighting the max and min values 3:1. Then we take the plain average between rows.
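A sketch of the max-biased variant under the same axis interpretation as the previous strategy. Each column of a 2x2 block becomes (3·max + 1·min) / 4, and the two column values are then averaged. The weights are exposed as hypothetical parameters `max_weight` and `min_weight`, which makes the other strategies special cases: 1:1 reproduces the plain block average, and 1:0 reproduces the pure max-per-column version.

```python
import numpy as np

def weighted_col_mean_row(tile: np.ndarray,
                          max_weight: float = 3.0,
                          min_weight: float = 1.0) -> np.ndarray:
    """Halve a 2D tile: max-biased weighted mean per block column, then plain mean."""
    # Pad odd dimensions by repeating the last row/column (an assumption).
    tile = np.pad(tile, ((0, tile.shape[0] % 2), (0, tile.shape[1] % 2)), mode="edge")
    rows, cols = tile.shape
    blocks = tile.reshape(rows // 2, 2, cols // 2, 2)
    hi = blocks.max(axis=1)  # larger of the two stacked values in each column
    lo = blocks.min(axis=1)  # smaller of the two
    col_vals = (max_weight * hi + min_weight * lo) / (max_weight + min_weight)
    # Average the two weighted column values of each block.
    return col_vals.mean(axis=2)

block = np.array([[8.0, 0.0],
                  [0.0, 2.0]])
print(weighted_col_mean_row(block))  # [[3.75]]
```

On this block the result (3.75) falls between the plain average (2.5) and the max-per-column result (5.0).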

Brightness still slowly increases as we zoom out (this might be tweakable with a different weight, but it also depends on the underlying values, so I don't think there is a "generic" way of doing this). However, the increase is not very pronounced, and this strategy keeps the benefits mentioned above.
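The slow brightness creep follows from the weights. Per block column, (3·max + min)/4 exceeds the plain mean (max + min)/2 by (max − min)/4, which is never negative. So for tiles with even dimensions, the mean of each zoom level can only stay the same or go up. The sketch below applies the reduction repeatedly to a synthetic, skewed tile (made-up Poisson-based data, not cortex.loom) and prints the mean per level. `zoom_pyramid` is a hypothetical helper, and the tile size is a power of two so no padding is needed.

```python
import numpy as np

def weighted_col_mean_row(tile, max_weight=3.0, min_weight=1.0):
    """Max-biased 2x2 reduction; assumes both dimensions are even."""
    rows, cols = tile.shape
    blocks = tile.reshape(rows // 2, 2, cols // 2, 2)
    col_vals = (max_weight * blocks.max(axis=1) + min_weight * blocks.min(axis=1)) / (
        max_weight + min_weight
    )
    return col_vals.mean(axis=2)

def zoom_pyramid(tile, levels):
    """Return [tile, level 1, level 2, ...], each half the size of the previous one."""
    pyramid = [tile]
    for _ in range(levels):
        pyramid.append(weighted_col_mean_row(pyramid[-1]))
    return pyramid

rng = np.random.default_rng(0)
# Synthetic sparse, skewed "expression" values: mostly zeros, a few large values.
tile = rng.poisson(0.3, size=(64, 64)).astype(float) ** 2
for level, t in enumerate(zoom_pyramid(tile, 4)):
    print(level, t.shape, round(float(t.mean()), 3))
```

Everything is whole-array NumPy reshapes and reductions with no Python loop over pixels.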

I think the last strategy is a good replacement for our current one. Also, since we're using NumPy methods, it does not cause a significant slowdown.

JobLeonard commented 6 years ago

Old vs New:

https://www.youtube.com/watch?v=IjYZybeB4N4

https://www.youtube.com/watch?v=AB86fNJuzOU

(Also, the private server's loom-viewer is a few versions behind. @pl-ki, can you show me tomorrow how it was set up and how I can update it?)