plantinformatics / pretzel

Javascript full-stack framework for Big Data visualisation and analysis
GNU General Public License v3.0

Progressive enhancement / loading #127

Open Don-Isdale opened 5 years ago

Don-Isdale commented 5 years ago

Pretzel has asynchronous loading and display of requested blocks, which provides a more responsive user experience. Extending this to load fractional parts of blocks will be necessary for the larger data sets, e.g. multi-genome.

The amount of data which can be transferred from the server and processed by the browser is more than users can absorb; progressive data display :

- show the data at the relevant level of detail for the zoom level;
- show data pertinent to the current scope selected by the user.

There are a number of progressive features we can add. These can be developed independently, at different times or in parallel, and their benefits are additive :

Paths : The paths between 2 blocks displayed on adjacent axes will be less numerous than the features on those blocks; often by orders of magnitude. So the paths can be requested before the features. The paths result contains these attributes of the connected features : name and location; this is sufficient to be used when displaying axis ticks and brush results. i.e. a brush can first display the features which are known from paths, and an asynchronous request for the features in the brushed region can add to this result.

Paths themselves can be progressive, using interlacing (which is also proposed for features of a block) : return a % of paths, and paginate through the remainder in an interlaced way. A % of paths will indicate the general characteristics of the alignment. (Using a % of features produces a less representative set of paths, because not all markers have paths and the ratio varies, so the result is uneven.)
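A minimal sketch of interlaced pagination over a cached paths result; pathsResult, nPages and page are illustrative names, not existing Pretzel code :

```js
/** Sketch : select one interlaced page from a cached paths result.
 * pathsResult is the full array of paths (e.g. cached in the BE),
 * nPages is the interlace factor, page is the offset 0 .. nPages-1.
 * Page 0 alone gives an evenly-spread ~1/nPages sample of the paths;
 * further pages fill in the gaps without repeating earlier results.
 */
function interlacedPage(pathsResult, nPages, page) {
  const selected = [];
  for (let i = page; i < pathsResult.length; i += nPages) {
    selected.push(pathsResult[i]);
  }
  return selected;
}

// e.g. 1% of the paths, spread evenly across the whole result :
// const firstSample = interlacedPage(cachedPaths, 100, 0);
```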

Prior to requesting paths, the synteny blocks can be requested. This summarises the general shape of the alignment in large block structures. It doesn't need to be complete or exhaustively calculated - just a general indication of the larger blocks.

Ranges and interlacing : The block request currently returns all features of a block. Parameters for the sub-range within the block, the display resolution, and the page # can be added to this API. Pagination can work by interlacing : if the block is split into 100 pages then every 100th feature will be included in a single result, and the page number is the offset (0 - 99). Display resolution and number of pages are related; we can calculate a suitable interlacing fraction from the pixels used to display the block; it is not yet clear whether to do that in the frontend or backend. The dataset & block list which the frontend requests when it starts contains the block length (interval / range), and could also include the # of features; given those two values the frontend can do the calculation, as sketched below.
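A rough sketch of that calculation, assuming the FE knows the feature count and the pixel length used to display the block; the names and the features/pixel parameter are illustrative only :

```js
/** Sketch : derive an interlacing page count from display resolution.
 * featureCount would come from the dataset & block list, axisPixels is the
 * pixel length used to display the block (or sub-range), and
 * targetFeaturesPerPixel is a detail threshold (e.g. 0.01 - 0.1, cf. the
 * slider suggested in a later comment).  All names are illustrative.
 */
function pageCount(featureCount, axisPixels, targetFeaturesPerPixel) {
  const wanted = Math.max(1, axisPixels * targetFeaturesPerPixel);
  // interlace factor such that one page yields roughly `wanted` features
  const nPages = Math.max(1, Math.ceil(featureCount / wanted));
  // constrain to a power of 2, to simplify translation between zoom levels
  // (see the next note on interlacing / pagination)
  return Math.pow(2, Math.ceil(Math.log2(nPages)));
}
```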

Interlacing / pagination : While the display is constant, additional pages can be requested to fill out the display. To simplify the calculation of which features are yet to be requested, the # of pages can be constrained to be a power of 2, so that when the zoom level changes, the pages already received can be simply translated into pages at the new zoom level; e.g. on zooming in, 2 pages become 1, so if one of them is already in hand then just the other need be requested.
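A sketch of that translation, assuming the power-of-2 constraint; the function and parameter names are illustrative :

```js
/** Sketch : with #pages a power of 2, zooming in halves the page count, so
 * each new page is the union of exactly 2 old pages (old page p contributes
 * to new page p % (oldNPages / 2)).  Given the old pages already received,
 * return the old pages still needed to complete one new page.
 */
function oldPagesNeeded(newPage, oldNPages, pagesInHand) {
  const newNPages = oldNPages / 2;
  const contributors = [ newPage, newPage + newNPages ];
  return contributors.filter((p) => !pagesInHand.includes(p));
}

// e.g. with 4 old pages, new page 0 is made of old pages 0 and 2; if old
// page 0 is already in hand, only old page 2 need be requested :
// oldPagesNeeded(0, 4, [0])  ->  [2]
```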

Sub-Features : The sub-feature mechanism is implemented in the database; the current backend functionality is to traverse and include all sub-features, but for progressive loading it will be useful to display the features in one step and their sub-features in the next step. We have discussed passing a level indicator in the API request, to get the features from level 1 / 2 / ... / all (notes from the discussion are below).

AGP structure : Showing the levels of AGP structure : super-scaffolds / scaffolds / contigs, as the user zooms in, will produce a progressive view. So the question of how to represent the layers of AGP structure (either distinct & parallel blocks or sub-features) is pertinent.

Zooming in : The blocks can define the threshold (px/length) at which their content should be displayed; before this point the FE will not request the block contents. If the user is displaying multiple tracks in parallel on the axis which is being zoomed, then we could set a higher threshold for displaying each level of detail, both because the display would otherwise be overloaded with information, and because the total request time would increase.
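A small sketch of that check; block.contentThreshold and the other names are assumptions for illustration :

```js
/** Sketch : only request a block's contents once the display density
 * (pixels per unit of block length) exceeds the block's declared threshold.
 * block.contentThreshold is a hypothetical per-block value; trackCount allows
 * the threshold to be raised when multiple tracks share the zoomed axis.
 */
function shouldRequestContent(block, axisPixels, visibleLength, trackCount) {
  const pxPerUnit = axisPixels / visibleLength;
  const threshold = block.contentThreshold * Math.max(1, trackCount);
  return pxPerUnit >= threshold;
}
```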

At some point the user may want to request all further detail, without having to zoom in to prompt it.

Initial Summary : A block can be summarised by a histogram showing a count of features in evenly-spaced intervals. (a related alternative is to construct a balanced tree; displaying the intervals of progressive layers of the tree would indicate density).

Thumbnails : The initial level of detail in displaying a block can be a coverage graph; this can be calculated, cached, updated re. new data, via the make rule (see #126). Possibly a relationship between blocks and a 'thumbnail' representation can be recorded (e.g. in dataset meta), so that when a block is added, its thumbnail can be initially displayed.

design : The frontend will probably need to include in API requests to the backend the resolution at which the result will be displayed; e.g. the length in pixels at which a requested block or section of a block will be displayed. This can be approximate. Pretzel (either FE or BE) will need to keep track of the progressive loading of blocks : which pages / ranges / levels of detail have been requested & received, so that further zooming / panning does not cause repeated requests for the same data. This is possibly one of the more complex parts of this functionality - the solution is not obvious.

Transitions : As the data arrives asynchronously, it should be displayed in a fluid / smooth way using transitions, which can make it easier to comprehend and produce a more satisfying user experience.

The paths can each have a distinct transition time, producing a smooth flow.
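e.g. with d3, a per-element delay gives each path its own transition time; a sketch only, where the selector and attribute are placeholders for the actual Pretzel selection :

```js
// Sketch (d3 v4+) : stagger path transitions so newly received paths flow in
// rather than appearing all at once.  'path.progressive' is a placeholder
// for the actual Pretzel path selection.
d3.selectAll('path.progressive')
  .transition()
  .duration(500)
  .delay((d, i) => i * 10)   // a distinct transition start time per path
  .style('opacity', 1);
```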

related : path drag transition : just show a % of paths


Discussion re. API requests for levels of sub-features.

dev (~ 2018Sep18)

... how does the FE request multiple levels?

alex [3:44 PM] : /api/Features/depthSearch?blockId=&depth= ; the current request that the FE does requests the block and says include its features; that one still returns all of the features no matter the depth. Also, the datasets have to be reuploaded for the depthSearch to work.

... have a unit/pixel threshold in the API request. It's OK (and probably good) to have both level and threshold; either param can be optional.

rad [4:18 PM] : if visible size of feature > x pixels, fetch sub-features.

alex [4:23 PM] : I think we talked about requesting features by precision : if feature.value[1] - feature.value[0] > precision then get child features, and the FE can calculate precision based on zoom level.

don [4:26 PM] : the zoom can complete and display the known features, and in the background the sub-features are being requested, and are displayed when they arrive. The FE should know when it has requested all levels of a feature.
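A sketch of the precision rule discussed above; the depthSearch route is as quoted, while the depth value, fetch usage and parameter names are assumptions :

```js
/** Sketch : only descend into a feature's children when the feature spans
 * more than `precision` (block units per pixel, derived from the zoom).
 * The depthSearch route is the one quoted above; depth=1 and the parameter
 * names are assumptions for illustration.
 */
async function loadSubFeatures(feature, blockId, axisPixels, visibleLength) {
  const precision = visibleLength / axisPixels;   // block units per pixel
  const span = Math.abs(feature.value[1] - feature.value[0]);
  if (span <= precision) {
    return [];   // feature too small on screen to show its children yet
  }
  // Requested in the background; the zoomed display of already-known
  // features need not wait for this response.
  const response = await fetch(
    `/api/Features/depthSearch?blockId=${blockId}&depth=1`);
  return response.json();
}
```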

Don-Isdale commented 5 years ago

Re. the resolution / detail threshold factor above : provide a GUI slider for the user to adjust this, to get greater or less information density. There should be an option (maximum value of the slider, or a checkbox) to get all information. The factor is measured in features/pixel, e.g. mostly in the range 0.01 - 0.1 (probably a logarithmic scale on the slider).
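A sketch of mapping a linear slider position onto that logarithmic features/pixel range; the names and the "all information" handling are illustrative :

```js
/** Sketch : map a linear slider position (0..1) onto a logarithmic
 * features/pixel threshold spanning the 0.01 - 0.1 range noted above.
 * Position 1 (or a separate checkbox) is treated as "all information".
 */
function sliderToThreshold(position, min = 0.01, max = 0.1) {
  if (position >= 1) { return Infinity; }   // request all information
  const logMin = Math.log10(min), logMax = Math.log10(max);
  return Math.pow(10, logMin + position * (logMax - logMin));
}

// sliderToThreshold(0)   -> 0.01 features/pixel
// sliderToThreshold(0.5) -> ~0.032
// sliderToThreshold(1)   -> Infinity (all features)
```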

Don-Isdale commented 5 years ago

The most significant boost to progressive loading is, instead of loading all features of a block, to show in this order :

  1. show a histogram summary of feature counts
  2. display a % of paths
  3. load features when the resolution threshold is met, and then use pagination / interleaving

Changes : FE : delay the API call which loads blocks including all their features, and instead progressively side-load them in. BE : API for paths : cache the result and provide it in pages.

An interval tree to record which ranges of features have been loaded; a structure for block : features, which is like a sub-feature : it has a range, and request params (resolution, range). (Initially thought of representing these as pseudo-features in the FE, but settled on a block-view.)
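A simplified sketch of that record, using a sorted list of merged intervals rather than a full interval tree; a real block-view would also keep the request params (resolution, page) with each result :

```js
/** Simplified sketch of the "which ranges have been loaded" record, using a
 * sorted list of merged, disjoint intervals rather than a full interval tree.
 */
class LoadedRanges {
  constructor() { this.ranges = []; }   // [[start, end], ...] sorted, disjoint

  /** Record that [start, end] has been received, merging overlaps. */
  add(start, end) {
    const merged = [];
    let [s, e] = [start, end];
    for (const [rs, re] of this.ranges) {
      if (re < s || rs > e) { merged.push([rs, re]); }   // disjoint : keep
      else { s = Math.min(s, rs); e = Math.max(e, re); } // overlap : absorb
    }
    merged.push([s, e]);
    this.ranges = merged.sort((a, b) => a[0] - b[0]);
  }

  /** Return the sub-ranges of [start, end] not yet loaded, i.e. still to request. */
  missing(start, end) {
    const gaps = [];
    let cursor = start;
    for (const [rs, re] of this.ranges) {
      if (re < cursor || rs > end) { continue; }
      if (rs > cursor) { gaps.push([cursor, rs]); }
      cursor = Math.max(cursor, re);
    }
    if (cursor < end) { gaps.push([cursor, end]); }
    return gaps;
  }
}

// e.g. after add(0, 50) and add(80, 120), missing(0, 100) -> [[50, 80]]
```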

The progressive loading provides a number of parallel views of a block, with different levels of detail :

- summary (e.g. feature count per bin - gene count or coverage)
- % of paths
- % of features (paginated / interleaved)

Each of these can contain a number of results; e.g. from lazy-loading additional pages of features, or zooming in resulting in a changed resolution threshold, or the user adjusting their zoom threshold factor slider. It is likely that a user will revisit sections of a block, so retaining this information will improve the speed of re-viewing it. Information from multiple results will be combined as appropriate; the results will be displayed at the zoom levels appropriate for that result, possibly displaying just a % of a result when zooming out.

The front-end may hold these results in a component related to the block; call it a block-view as an interim name. These addons may be useful in managing this :

- https://github.com/ebryn/ember-model
- https://github.com/amiel/ember-data-url-templates

These addons are related to incremental rendering; they are intended for html rendering and may not be practical for the d3 rendering :

- https://github.com/runspired/smoke-and-mirrors
- https://github.com/jasonmit/virtual-each
- https://github.com/emberjs/ember-collection
- https://github.com/adopted-ember-addons/ember-impagination

Don-Isdale commented 5 years ago

Summary histogram of feature counts :

- is probably not required for genetic maps
- whether to display it depends on the number of features in the block : a genetic map will generally have 10s - 100s of features, so there is not much value in a summary; send all the features instead.

The BE knows the feature count, and can make this decision. This can mean a variant result type for the request for the initial view of a block.
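A sketch of the kind of decision the BE could make for the initial view of a block; the 500-feature cut-off, the response shape and the summarise helper are assumptions :

```js
/** Sketch : choose the initial result type for a block.  A genetic map with
 * 10s - 100s of features gets all its features; larger blocks get a binned
 * summary.  The 500-feature cut-off, the response shape and the `summarise`
 * helper (e.g. the bucket pipeline sketched in a later comment) are
 * assumptions for illustration.
 */
async function initialBlockView(blockId, featureCollection, summarise) {
  const featureCount = await featureCollection.countDocuments({ blockId });
  if (featureCount <= 500) {
    const features = await featureCollection.find({ blockId }).toArray();
    return { type : 'features', features };
  }
  // variant result type : a histogram summary instead of the features
  return { type : 'histogram', bins : await summarise(blockId) };
}
```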

The initial display of summary histogram can be switched off automatically when sufficient feature information has been received and displayed (not clear yet what heuristic to gauge this by). i.e. for each block, show either histogram or features.

Display of the histogram is related to the split axes, and there may be work required before that can be used for primary display. So we'll start work on step 2 (request a % of paths) first. This involves :

- use a mongo query for direct paths and aliases (not paths by reference at this stage)
- cache the result in the BE and extract % pages out of it, for progressive responses
- display groups of paths as received in the FE

The paths result contains just feature name and id; we can add location (value). It would be useful to be able to brush the feature information extracted from the progressive paths result.

Incidentally, this is a fairly different visualisation, but the progressive enhancement principles thought through here are quite related to this application : http://datashader.org/topics/nyc_taxi.html

Don-Isdale commented 5 years ago

Some examples of using mongoDb aggregation pipelines from node.js: 1, 2, 3

An example of saving the result of an aggregation to use in multiple later pipelines. Note also $facet, but that doesn't apply when the following aggregations occur at a later time.

Pagination methods in MongoDb.

The commit feature/progressive 0b0041b contains 2 added mongo scripts :

- Match features by name between the 2 given blocks. The result is the alignment, for drawing paths between blocks.
- Count Features within evenly sized bins (buckets) on the given block.
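Hedged sketches of what those two pipelines might look like (the actual scripts are in the commit); the Feature collection name and the blockId / name / value fields are assumptions based on the discussion above :

```js
// 1. Match features by name between the 2 given blocks; the result is the
//    alignment, for drawing paths between the blocks.
function pathsPipeline(blockId0, blockId1) {
  return [
    { $match : { blockId : { $in : [ blockId0, blockId1 ] } } },
    { $group : { _id : '$name',
                 features : { $push : { blockId : '$blockId', value : '$value' } },
                 blocks : { $addToSet : '$blockId' } } },
    // keep only feature names present in both blocks
    { $match : { blocks : { $size : 2 } } }
  ];
}

// 2. Count features within evenly sized bins on the given block, for the
//    summary histogram; binSize is the bin width in block units.
function histogramPipeline(blockId, binSize) {
  return [
    { $match : { blockId : blockId } },
    { $group : {
        _id : { $floor : { $divide : [ { $arrayElemAt : [ '$value', 0 ] },
                                       binSize ] } },
        count : { $sum : 1 } } },
    { $sort : { _id : 1 } }
  ];
}

// e.g. from the mongo shell or the node.js driver :
// db.Feature.aggregate(histogramPipeline(someBlockId, 1e6))
```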

kieranongh commented 5 years ago

Just some comments and references on caching.

3 methods for caching were considered:

Opted for an in-memory cache due to speed and simplicity; can revise if needed.

The memory cache is implemented using a middleware approach, which required finding out how to work with, rather than against, Loopback. The examples of using memory-cache and plugging it in as middleware were all for Express; getting it to work with Loopback requires learning Loopback's middleware strategy here

Comparing API call times with and without a cache also involved adding middleware, for which Loopback has an example : https://loopback.io/doc/en/lb3/Defining-middleware.html#pre-processing-middleware

A quick summary of the approach I took : define the middleware function, add it to backend/server/middleware/.js, register the middleware in backend/server/middleware.json, and edit the 'paths' attribute so it applies only to the paths required.
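A sketch of how that could look, following the memory-cache pattern in the example referenced in the next comment; the filename, phase, params and cached paths are assumptions :

```js
// backend/server/middleware/cache.js (illustrative filename).  Loopback
// middleware is registered via a factory : middleware.json passes any
// configured params to this function, which returns the handler.
const cache = require('memory-cache');

module.exports = function (options) {
  const seconds = (options && options.seconds) || 60;
  return function cacheResponse(req, res, next) {
    // The request url is used as the cache key, as in the referenced example;
    // as noted in the next comment, this may need revisiting.
    const key = req.originalUrl || req.url;
    const cached = cache.get(key);
    if (cached) {
      res.send(cached);
      return;
    }
    // Wrap res.send so the outgoing body is cached before being sent.
    const send = res.send.bind(res);
    res.send = (body) => {
      cache.put(key, body, seconds * 1000);
      return send(body);
    };
    next();
  };
};

/* Registered in backend/server/middleware.json, restricted to the paths that
 * need caching; the phase and route are assumptions :
 * "routes:before" : {
 *   "./middleware/cache" : { "params" : { "seconds" : 60 },
 *                            "paths" : [ "/api/..." ] }
 * }
 */
```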

kieranongh commented 5 years ago

Also worth noting that the example I found using memory-cache used the url as the key, which I've implemented, but it may not be the best approach for our use case.

Example: https://scotch.io/tutorials/how-to-optimize-node-requests-with-simple-caching-strategies

Don-Isdale commented 5 years ago

Remaining sub-tasks moved to https://github.com/plantinformatics/pretzel/pull/150.

gabrielkg commented 4 years ago

During discussion some future features were sketched out :