linnarsson-lab / loom-viewer

Tool for sharing, browsing and visualizing single-cell data stored in the Loom file format
BSD 2-Clause "Simplified" License

Offline caching #54

Closed JobLeonard closed 6 years ago

JobLeonard commented 8 years ago

Ok, this is low, low priority, but I'm writing this down before I forget all of it again.

This sounds like a pretty perfect fit for our caching needs:

IndexedDB is a low-level API for client-side storage of significant amounts of structured data, including files/blobs. This API uses indexes to enable high performance searches of this data. While DOM Storage is useful for storing smaller amounts of data, it is less useful for storing larger amounts of structured data. IndexedDB provides a solution. This is the main landing page for MDN's IndexedDB coverage — here we provide links to the full API reference and usage guides, browser support details, and some explanation of key concepts.

Note: This feature is available in Web Workers.

Note: IndexedDB API is powerful, but may seem too complicated for simple cases. If you'd prefer a simple API, try libraries such as localForage and dexie.js that make IndexedDB more user-friendly.

Note: Some older browsers don't support IndexedDB but do support WebSQL. One way around this problem is to use an IndexedDB Polyfill or Shim that falls back to WebSQL or even localStorage for non-supporting browsers. The best available polyfill at present is localForage.

https://developer.mozilla.org/en/docs/Web/API/IndexedDB_API

Generally, the size limit is 1 GB per website, which is plenty for regular usage. Given the simplicity of our scheme, the aforementioned localForage is probably perfectly suited to our needs.
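localForage's core API is just promise-based `getItem`/`setItem`, so the fetch-through pattern we need is tiny. A minimal sketch (with a plain in-memory `Map` standing in for localForage so it runs anywhere; key names are illustrative):

```javascript
// In-memory stand-in for localForage's promise-based API.
// In the browser this would be `import localforage from 'localforage'`.
const store = new Map();
const cache = {
  getItem: async (key) => (store.has(key) ? store.get(key) : null),
  setItem: async (key, value) => { store.set(key, value); return value; },
};

// Fetch-through cache: return the cached value if present,
// otherwise fetch it, store it, and return it.
async function cachedFetch(key, fetcher) {
  const hit = await cache.getItem(key);
  if (hit !== null) return hit;
  const value = await fetcher(key);
  await cache.setItem(key, value);
  return value;
}
```

The second request for the same key never touches the network, which is the whole point.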

Alternatives: Web Storage, while having a much simpler API, is limited to 10 MB, which we'll easily exceed after fetching just a few tiles/genes for one dataset. PouchDB is overkill, since it's an offline DB that can synchronise with an online DB, but we only download from the server, never upload.

Datasets, row/column data and genes

Right now everything is "cached" in JS objects, which last as long as the tab is open. Caching it in IndexedDB would make this persistent, which is great for, say, long trips with unreliable internet.

Here is an explanation of how someone combined this with redux: http://stackoverflow.com/questions/33992812/how-to-integrate-redux-with-very-large-data-sets-and-indexeddb

Client-side it looks like we can "just" migrate the JS-based caching to localForage (treating that as "cold" cache, while keeping some of it in JS as "hot" cache). Wrap access to it in a bunch of reducer thunks (localForage is async and uses promises) and we're set! Much easier said than done of course, but the principle seems straightforward enough.
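To make the "wrap access in thunks" idea concrete, here's a hedged sketch of what such a thunk might look like. Everything here is illustrative: the action types, the key scheme, and the `cache`/`fetchFromServer` stand-ins (in the app those would be localForage and a real server request):

```javascript
// Stand-ins so the sketch is self-contained; illustrative names only.
const coldStore = new Map();
const cache = {
  getItem: async (k) => (coldStore.has(k) ? coldStore.get(k) : null),
  setItem: async (k, v) => { coldStore.set(k, v); },
};
const fetchFromServer = async (key) => ({ key, values: [0, 1, 2] });

// Hypothetical redux thunk: dispatch a "request" action, resolve the
// data from the cold cache or the server, then dispatch a "receive"
// action so the reducer can put it in the (hot) redux store.
function requestGene(dataset, gene) {
  return async (dispatch) => {
    dispatch({ type: 'REQUEST_GENE', gene });
    const key = `${dataset}/${gene}`;
    let data = await cache.getItem(key);
    if (data === null) {
      data = await fetchFromServer(key);
      await cache.setItem(key, data); // persist for the next session
    }
    dispatch({ type: 'RECEIVE_GENE', gene, data });
  };
}
```

The redux store acts as the hot cache, localForage as the cold one; the thunk is the only place that knows about the difference.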

Exposing which data has been downloaded and cached, and allowing the user to manually clear it, would probably be good too.
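If cache keys are namespaced per dataset (a hypothetical scheme like `"oligos/..."`), listing and clearing becomes simple. A sketch, again with a `Map` standing in for localForage (whose real API has `keys()` and `removeItem()` for exactly this):

```javascript
// In-memory stand-in; with localForage this would be
// localforage.keys() and localforage.removeItem().
const coldStore = new Map();
async function keys() { return [...coldStore.keys()]; }
async function removeItem(k) { coldStore.delete(k); }

// Evict every cached entry belonging to one dataset,
// assuming keys are prefixed with "<dataset>/".
async function clearDataset(dataset) {
  const prefix = `${dataset}/`;
  const cached = (await keys()).filter((k) => k.startsWith(prefix));
  await Promise.all(cached.map(removeItem));
  return cached.length; // how many entries were evicted
}
```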

Server-side we'd need a way to signal that a loom file has been updated (for example, if we fix a bug in the pipeline or implement a better version of backSPIN). Just adding a simple cache-busting hash should do the trick, right?
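The cache-busting idea could work like this: store the server's content hash alongside each cached entry, and treat the entry as stale whenever the hashes disagree. A minimal sketch (all names are illustrative; `cache` and `fetcher` are passed in so it runs anywhere):

```javascript
// Hash-based invalidation: the server sends a content hash with each
// dataset's listing; if it differs from the hash stored alongside the
// cached entry, the cached copy is stale and must be refetched.
async function getWithBusting(key, serverHash, cache, fetcher) {
  const entry = await cache.getItem(key); // { hash, data } or null
  if (entry && entry.hash === serverHash) {
    return entry.data; // still fresh
  }
  const data = await fetcher(key); // stale or missing: refetch
  await cache.setItem(key, { hash: serverHash, data });
  return data;
}
```

So fixing a bug in the pipeline only requires the server to emit a new hash; clients refetch automatically on the next visit.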

Heatmap

Enhancing the heatmap with IndexedDB is also possible, and it's a completely separate piece of logic, since the heatmap tiles are not (and should not be) stored in the redux store: https://github.com/tbicr/OfflineMap

JobLeonard commented 8 years ago

A good example of an existing web-app that does offline storage really well is devdocs.io:

http://devdocs.io/offline


JobLeonard commented 7 years ago

It turns out that some caching can be done through AppCache.

https://www.html5rocks.com/en/tutorials/appcache/beginner/

https://alistapart.com/article/application-cache-is-a-douchebag/

Specifically, the static assets: the script and the CSS. This avoids having to download 900 kB again every time the website is refreshed.
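For reference, a minimal manifest along the lines those articles describe (file names are illustrative):

```
CACHE MANIFEST
# v1 - change this comment to force clients to re-download

CACHE:
/static/bundle.js
/static/bundle.css

NETWORK:
*
```

It gets hooked up via `<html manifest="manifest.appcache">`; the `NETWORK: *` section lets everything not listed under `CACHE:` go to the network as usual.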

JobLeonard commented 7 years ago

AppCache implemented, pretty neat! Saves us about a megabyte each time we refresh.

Now I'll just need to add localForage support for downloaded metadata, so that we can really work offline.

(also, the reason I picked this up is because golden-layout needs to use off-line storage anyway, so I might as well implement it).

JobLeonard commented 7 years ago

Got the metadata saving implemented, and because we're using localForage it is stored as proper JS objects, which saves us the hoops of converting most of the JSON data again (the only thing we can't store is functions).

As a result, the Oligos All dataset (200k cells, 30+ MiB of metadata) only takes four to six seconds to load once cached (the variation depends on the view - metadata is particularly slow for this dataset). Uncached, that is closer to fifteen to twenty-five seconds. And remember that this is when serving locally - on a slower connection it will be much worse!

Not having to download this data again every time someone opens the same dataset will also save bandwidth costs, although I don't know if that will be a significant amount.

JobLeonard commented 7 years ago

Implemented gene caching over the weekend, so the basic infrastructure behind this is done.

Refreshing the Oligos Sparkline page with the twenty default genes before caching: 15 seconds. After caching: 3 seconds. It's also 52 megabytes in IndexedDB that we don't download again.

Those are numbers that make me happy ;)

This does suggest that heavy Loom usage can fill up IndexedDB pretty quickly, so we might want to allow manually clearing a dataset's cached metadata and/or genes. OTOH, browsers are free to evict IndexedDB data if it fills up too much.