linnarsson-lab / loom-viewer

Tool for sharing, browsing and visualizing single-cell data stored in the Loom file format
BSD 2-Clause "Simplified" License

prepare_heatmap() wasn't fully updated #137

Closed JobLeonard closed 6 years ago

JobLeonard commented 6 years ago
2017-11-19 10:37:05,597 - INFO - Found 2 projects
2017-11-19 10:37:05,597 - INFO - Entering project C:\Users\eyald\loom-datasets\cortex
2017-11-19 10:37:05,597 - INFO -   Connecting to cortex.loom locally
2017-11-19 10:37:05,629 - INFO -     Precomputing heatmap tiles (stored in cortex.loom.tiles subfolder)
2017-11-19 10:37:05,629 - ERROR - prepare_heatmap() missing 1 required positional argument: 'truncate'
2017-11-19 10:37:05,629 - INFO - Entering project C:\Users\eyald\loom-datasets\first

Dumb mistake on my part, will fix immediately.

AmitLab commented 6 years ago

While running "loom tile cortex.loom" I got this error: ERROR - module 'scipy.misc' has no attribute 'toimage'

Running pip install pillow solved it.

AmitLab commented 6 years ago

Can the heatmap be sorted, or a cluster be removed, like in Sparklines?

JobLeonard commented 6 years ago

ERROR - module 'scipy.misc' has no attribute 'toimage'

Yes, that happens when you use Miniconda instead of Anaconda (Miniconda does not install Pillow by default, and scipy.misc.toimage needs it). I should probably add this to the instructions, because you're the second person to bring this up (and the previous person did not know what to do).

Sorting heat map

TL;DR: if you want to sort the heatmap, your only option is to sort the data matrix (and the associated metadata attributes, of course!) in the original loom file, save it, and re-expand everything.
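To illustrate what "sort the data matrix and associated metadata attributes" means, here is a minimal sketch in plain Python. The tiny matrix, the attribute names (CellID, Cluster) and the helper sort_columns are all made up for illustration; reading from and writing to the actual loom file is left out, so treat this as the idea rather than the real workflow.

```python
# Hypothetical stand-in for a loom file: a genes-by-cells matrix plus
# per-cell (column) attributes. Real data would come from the loom file.
matrix = [
    [5, 0, 3, 1],  # gene A
    [2, 7, 0, 4],  # gene B
]
col_attrs = {
    "CellID": ["c1", "c2", "c3", "c4"],
    "Cluster": [2, 0, 1, 0],
}


def sort_columns(matrix, col_attrs, key):
    """Reorder cells (columns) by attribute `key`, keeping matrix and attributes in sync."""
    values = col_attrs[key]
    # sorted() is stable, so cells with equal keys keep their relative order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    sorted_matrix = [[row[i] for i in order] for row in matrix]
    sorted_attrs = {name: [vals[i] for i in order] for name, vals in col_attrs.items()}
    return sorted_matrix, sorted_attrs


sorted_matrix, sorted_attrs = sort_columns(matrix, col_attrs, "Cluster")
print(sorted_attrs["CellID"])
```

The important part is that one ordering is applied to the matrix and to every attribute on that axis. After saving the sorted data as a loom file, the heatmap tiles have to be generated again, since the old tiles still show the old order.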

The reason you cannot sort the heatmap in the viewer is that it is a direct rendering of the whole matrix, in the order of that matrix.

We pre-render the heatmap because generating such a large overview dynamically would involve downloading and then processing the entire loom file.

This would defeat the purpose of the loom-viewer, which is to let people download and explore only the data that they want to look at, without having to fetch and analyse the whole loom file.

For example: the biggest test dataset that we currently have contains 192k cells and is 4.5 GiB in size. Suppose I only want to explore ten genes. The metadata attributes add up to 20 MiB. If I somehow managed to select the ten biggest genes in the dataset (in terms of size), that adds 2.3 MiB.

So instead of 4.5 GiB, you only need to fetch 22.3 MiB! And only once, because the viewer uses offline caching for all metadata and genes that it has fetched before.
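The arithmetic behind those numbers, as a quick sketch (the variable names are just for illustration; the sizes are the ones quoted above):

```python
MIB_PER_GIB = 1024

full_file_mib = 4.5 * MIB_PER_GIB  # the whole loom file
metadata_mib = 20                  # all metadata attributes
ten_genes_mib = 2.3                # the ten biggest genes

fetched_mib = metadata_mib + ten_genes_mib
fraction = fetched_mib / full_file_mib

print(f"{fetched_mib:.1f} MiB of {full_file_mib:.0f} MiB ({fraction:.2%})")
```

That works out to roughly half a percent of the file.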

Sparklines and scatterplots then generate their plots on the fly from that data, but for the heatmap this is not feasible: generating a zoomed-out picture requires all the data, and for a 4.5 GiB loom file that is impractical in terms of both time and size.

So we use leaflet.js, a library for making interactive maps, to generate an overview once: we pre-generate tiles for each zoom level, at 256x256 pixels. These are usually less than 50 KiB each. Say you have a full HD screen, 1920x1080. That means that even if you somehow manage to make the viewer full-screen, without the borders it has now, you would need about 8 by 5 tiles. 40 * 50 = 2000 KiB, so at any zoom level the visible part needs at most about 2 MiB of image data, and it also doesn't cost any time to process them any more, because they have already been turned into images.
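The tile estimate can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: tiles_needed is a made-up helper, and 50 KiB is the rough upper bound per tile mentioned above, not a measured value.

```python
import math

TILE_SIZE = 256    # tile width and height in pixels
MAX_TILE_KIB = 50  # rough upper bound on the size of one tile


def tiles_needed(width_px, height_px, tile_size=TILE_SIZE):
    """Tiles needed to cover a viewport whose top-left corner sits on a tile boundary."""
    return math.ceil(width_px / tile_size), math.ceil(height_px / tile_size)


cols, rows = tiles_needed(1920, 1080)
total_kib = cols * rows * MAX_TILE_KIB
print(cols, rows, total_kib)  # 8 5 2000
```

If the view is panned so that it does not line up with the tile grid, one extra partially visible row and column of tiles may be needed, which raises the estimate a little but does not change the order of magnitude.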

So you see, everything is set up to minimise data transfer. It keeps things fast and responsive on the user side, and when the viewer is used to host a website for sharing loom files, it saves on server costs for the owner too.

Now, you can programmatically move the leaflet map view to a coordinate, and we can also calculate the vertical coordinate for each row number. So with the Gene name metadata it is theoretically possible to code something that lets you type in a gene name and zoom to that gene. However, that would require quite a bit of new code.
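A rough sketch of the coordinate part, to show why it is possible in principle. Everything here is an assumption for illustration: the function names are made up, and the mapping assumes that one matrix row is one pixel high at the deepest zoom level and that each level above it halves the resolution. The viewer's actual tiling may differ, and the map itself would be moved on the JavaScript side (for instance with Leaflet's map.setView).

```python
def find_gene_row(gene_names, query):
    """Return the row number of `query` in the Gene name metadata (raises ValueError if absent)."""
    return gene_names.index(query)


def row_to_tile_position(row, zoom, max_zoom, tile_size=256):
    """Return (tile_y, offset_px): the tile row holding matrix row `row`, and the pixel offset inside it.

    Assumes one pixel per matrix row at `max_zoom`, halving at each level below.
    """
    scale = 2 ** (max_zoom - zoom)
    y_px = row // scale
    return divmod(y_px, tile_size)


# Hypothetical Gene name metadata.
gene_names = ["Actb", "Gapdh", "Sox2"]
row = find_gene_row(gene_names, "Sox2")
tile_y, offset_px = row_to_tile_position(row, zoom=8, max_zoom=8)
print(row, tile_y, offset_px)
```

The lookup and the arithmetic are the easy part; the new code would mostly be the search box, handling of missing or duplicate gene names, and wiring the result into the map view.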