linnarsson-lab / loom-viewer

Tool for sharing, browsing and visualizing single-cell data stored in the Loom file format
BSD 2-Clause "Simplified" License

Change the way heatmap tiles are generated/served #88

Closed JobLeonard closed 7 years ago

JobLeonard commented 7 years ago

In the ever growing list of "things to keep Job occupied while his laptop is taking longer to be fixed because they delivered the wrong computer part so I have to wait another week AAARGH!", I discovered this issue while testing out the website on Lars' desktop today...

Since Sten and I both use high-end machines, this problem has gone unnoticed until now. On a slightly older desktop with a non-SSD drive and less RAM, the slowness of serving heatmap tiles becomes obvious. We're talking waiting times of a second per tile, for tiles that are at most 8 kB in size:

https://www.youtube.com/watch?v=1hvhNLxnUn0

The second point leads me to conclude that the slowdown must be happening somewhere between the lookup within the loom file and its conversion to a PNG image; the third, that the main bottleneck must be the data lookup within the loom file itself.

One simple way to avoid this is to not look in the loom file at all: we could save the images as they are generated as tiles, and then serve those. These would function as a cache.

The idea would be as follows:

1. Given a loom file `{some name here}.loom`, create a folder called `{some name here}.loom.tiles`.
2. Within this folder, tiles are saved following the schema `{z}_{x}_{y}.png`.
3. When a tile is requested, check whether the file exists. If not, generate it.
4. Serve the tile from the PNG.

This is a fairly minor change to the existing code.
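A minimal sketch of that flow (helper names here are illustrative, not the actual code; `dz_get_zoom_image` is the existing loompy call that returns a PIL image):

```python
import os

def tile_path(loom_path, z, x, y):
    # tiles for {some name here}.loom live in {some name here}.loom.tiles
    return os.path.join(loom_path + '.tiles', '%d_%d_%d.png' % (z, x, y))

def get_tile(ds, loom_path, z, x, y):
    path = tile_path(loom_path, z, x, y)
    if not os.path.isfile(path):
        # cache miss: render the tile once and keep it for all later requests
        os.makedirs(os.path.dirname(path), exist_ok=True)  # Python 3.2+
        img = ds.dz_get_zoom_image(x, y, z)
        img.save(path, format='PNG')
    return path  # the server can serve this file statically from here on
```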

Note that this is mostly for offline use of loom, for people using less-than-ultrafast-computers (I think Lars' desktop is actually quite a realistic target for users). Still, it should make the server run faster and cheaper too: serving static files is a lot faster than generating them every time a tile is served, and disk storage is probably cheaper than CPU time.

Also, the PNG files would not add much storage space: they're 8 bits per pixel, compared to a 32-bit float per matrix entry, and the image tree does not add much overhead relative to the original dimensions of the matrix, since the image is quartered in size for every level we zoom out. That means the number of points in the image tree grows as the sum of 1/k^2, which is always less than a factor of 1.645, because:

$$\sum_{k=1}^{\infty} \frac{1}{k^2} = \frac{\pi^2}{6} \approx 1.645$$

So for a 1 GiB loom file I would expect at most around 375 MiB of PNG files if all tiles were pre-generated, and that's ignoring that the PNGs probably compress slightly better than the loom format. Plus, as noted above, if we generate these images lazily we save computation and storage space.

EDIT: I made a silly mistake: it's actually the sum of 1/4^k, which converges to 4/3.
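With the corrected factor, the back-of-the-envelope estimate for a fully pre-generated 1 GiB loom file becomes:

$$1\,\mathrm{GiB} \times \frac{8\ \text{bits per pixel}}{32\ \text{bits per entry}} \times \frac{4}{3} \approx 341\,\mathrm{MiB}$$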

JobLeonard commented 7 years ago

Safely creating a new folder in Python:

http://stackoverflow.com/a/5032238
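That is, the pattern below (on Python 3.2+, `os.makedirs(path, exist_ok=True)` does the same thing):

```python
import errno
import os

def mkdir_p(path):
    # create the directory, ignoring the error if it already exists;
    # this avoids the race between checking for the folder and creating it
    try:
        os.makedirs(path)
    except OSError as exc:
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise
```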

JobLeonard commented 7 years ago

Implemented. On this machine it's quite a speed boost on the server side!

JobLeonard commented 7 years ago

Ok, so I've done some more testing, using Forebrain_E9-E18.5.loom as the test file (it's the biggest loom file online at the moment, making it a nice "if it works for this, it works for all of our loom files" kinda test-case). I downloaded the file, kept a copy of the original, then manually made it render all heatmap tiles through the client by systematically scrolling through it while the browser was at 25% zoom (this is in the order of 20 * 30 = 600 requested tiles per pan):

[peek 2017-02-03 14-21small](https://cloud.githubusercontent.com/assets/259840/22646450/c49f7b84-ec6c-11e6-89b4-676644ac5933.gif)

The benefit of doing it this way is that I immediately could see how the client handles it, and also how the server deals with many tile requests at once.

First, the resulting numbers are kind of surprising to me:

Another thing is serving speed. The server seems quite capable of keeping up until some critical number of requests is reached, and the bottleneck is tiles that still need to be generated. Basically, once the queue of yet-to-be-generated tiles hits some unknown threshold, serving a tile goes from milliseconds to 20+ seconds.

I suspect that the bottleneck is writing the tiles back into the loom file, since HDF5 does not allow parallel writes.

These points together suggest to me that we're probably better off not storing the image pyramid in the loom files at all, and relying completely on saving external PNGs. We save literally ten times the storage space, we avoid hanging the server, and the loom files become smaller to download.

We might have to change some default setting so that all tiles are pre-generated when a loom file is uploaded to the server, since on-demand generation would otherwise severely screw over the first person to view a heatmap.

slinnarsson commented 7 years ago

Just to mention that all the Loom files should come with a pre-rendered image pyramid. However, I sometimes skip that step to save time, in which case the server will need to generate them on demand.

It might make sense to have a flag on startup to force the server to render all image pyramids once and for all (by calling self.dz_get_zoom_image(0,0,8), which is the top of the pyramid, on each file).
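Roughly like this (a sketch only; how the server enumerates its datasets is an assumption here):

```python
import os
import loompy

def render_all_pyramids(dataset_dir):
    # Walk the dataset folder and render the top tile of every loom file's
    # image pyramid; dz_get_zoom_image recursively generates and stores all
    # the levels beneath it as a side effect.
    for root, dirs, files in os.walk(dataset_dir):
        for f in files:
            if f.endswith('.loom'):
                ds = loompy.connect(os.path.join(root, f))
                ds.dz_get_zoom_image(0, 0, 8)  # top of the pyramid
                ds.close()
```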


JobLeonard commented 7 years ago

Just using this comment as a public sketch-pad so I can keep track of how I'm reorganising the code; feel free to ignore.

Given the very significant overhead in space, time and CPU that the current scheme for serving tiles has, to the point where it can cause the server to hang, we need to reorganise it.

Here is the current call chain when a tile is requested:

1. `send_tile(project, filename, z, x, y)` (loomserver.py)
   - currently checks if the tile already exists as a PNG file:
     - if so, serves it
     - if not, calls `dz_get_zoom_image()`, saves the returned PIL object as a PNG, and serves it
2. `LoomConnection.dz_get_zoom_image(self, x, y, z)` (loompy.py)
   - calls `dz_get_zoom_tile`
   - crops the returned array if required
   - converts the array into a PIL object and returns it
3. `LoomConnection.dz_get_zoom_tile(self, x, y, z)` (loompy.py)
   - if `z < mid`:
     - recursively calls itself four times at a deeper zoom level
     - merges the returned arrays
     - stores the resulting array in the loom file (to be replaced by PNG saving)
     - returns the array
   - if `z == mid`:
     - looks up the region within the data matrix
     - pads and rescales it
     - returns the final array
   - if `z > mid`: no longer necessary. The zoom-to-individual-pixels issue was addressed in #63, and since then we never request a zoom level deeper than a 1:1 pixel-to-data ratio, so we can strip out the part of the Python code dealing with this case.

Note that essentially we have three different forms of data: the final PNG image, the PIL object, and the raw numpy array. We can convert between the PNG image and the PIL object, but in theory the conversion of the numpy array to a PIL object is lossy. To recursively create these tiles, we therefore need to stay in nparray format. It's also ridiculous that saving the PNG happens in the server code instead of the recursive tile-generation code.

Here's my reorganised version that takes this into account:
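In rough outline it looks like this (a sketch; `dz_mid`, `dz_tile_from_matrix` and `tile_dir` are illustrative names, and the merging/downsampling details are simplified):

```python
import os
import numpy as np
from PIL import Image

def save_tile(tile_dir, z, x, y, tile):
    # PNG saving now lives with the tile generation, not in the server code;
    # the nparray (assumed uint8 grayscale) is only converted to a PIL image
    # at the very last moment.
    path = os.path.join(tile_dir, '%d_%d_%d.png' % (z, x, y))
    Image.fromarray(tile).save(path, format='PNG')

def dz_get_zoom_tile(self, x, y, z):
    # Stay in nparray form throughout the recursion.
    if z == self.dz_mid:
        # base case: look up the region in the data matrix, pad and rescale it
        tile = self.dz_tile_from_matrix(x, y)
    else:  # z < mid: merge the four tiles one zoom level deeper
        quads = [self.dz_get_zoom_tile(2 * x + i, 2 * y + j, z + 1)
                 for j in (0, 1) for i in (0, 1)]
        merged = np.vstack((np.hstack(quads[0:2]), np.hstack(quads[2:4])))
        tile = merged[::2, ::2]  # quarter the pixel count for this level
    save_tile(self.tile_dir, z, x, y, tile)
    return tile
```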

JobLeonard commented 7 years ago

Implemented. In theory we could make this multiprocess too, since it's an embarrassingly parallel problem, but that would interfere with the threading of the server. We should really split the Python code into separate modules.