TODO
[x] Since the layer is tied to a particular Portal use case and data format, break intervals layer out into an extension (and tell people to require it on usage)
[x] Add a new color_rgb scale function, specific to LZ Intervals extension
[x] Improve generic coloring mechanisms
[x] Also move other features that might not be actively used in the portal (like the covariates model dashboard) into extensions
Purpose
Our interval annotation track was designed to work with a limited set of BED files generated using chromHMM. The default layout includes a hard-coded legend and color scheme that pair fixed lists of state IDs, colors, and text labels together. This is based on the portaldev API, whose starting point is a file with lines such as:
chr1 10000 10600 15_Repetitive/CNV 0 . 10000 10600 245,245,245
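The example line follows the standard nine-column BED layout (chrom, start, end, name, score, strand, thickStart, thickEnd, itemRgb). As a rough illustration of the pieces the track needs from each line, here is a minimal Python sketch. The `Interval` class and `parse_bed9_line` helper are hypothetical names, not part of LocusZoom or the portaldev API, and the sketch assumes the name column always has the form `<state_id>_<label>`.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    chrom: str
    start: int
    end: int
    state_id: int
    state_label: str
    color: str  # CSS color string built from the itemRgb column


def parse_bed9_line(line: str) -> Interval:
    """Parse one BED9 line (illustrative helper, not a LocusZoom API)."""
    fields = line.split()  # columns are whitespace-delimited
    if len(fields) < 9:
        raise ValueError(f"expected 9 BED columns, got {len(fields)}")
    chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
    # Assumption: the name column looks like "<state_id>_<label>"
    state_id, _, state_label = name.partition("_")
    # itemRgb is "R,G,B"; validate the channels before building a CSS string
    channels = [int(c) for c in fields[8].split(",")]
    if len(channels) != 3 or not all(0 <= c <= 255 for c in channels):
        raise ValueError(f"invalid itemRgb value: {fields[8]!r}")
    color = "rgb({}, {}, {})".format(*channels)
    return Interval(chrom, start, end, int(state_id), state_label, color)


example = parse_bed9_line(
    "chr1 10000 10600 15_Repetitive/CNV 0 . 10000 10600 245,245,245"
)
print(example)
```

Carrying the color through from the file, rather than looking it up in a fixed table, is one possible way to avoid hard-coding a palette per state model.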
Design questions
Auto-detect states, or presets?
The current plot definition uses 13 known states. Other chromHMM papers report using 15-, 18-, or 25-state models whose states overlap only partially. An open question is whether the portal harmonizes these (in which case a preset, standardized list of names is desirable) or not.
The order of tracks seems important. By default, they are plotted in reverse order of state ID. Every state ID has a unique label within a given dataset.
Are state IDs harmonized across datasets (e.g., is "state 1" always unique to the 15-state model)? Or are they keyed to whatever state_id value was used in a particular file (e.g., "state 1" only has meaning in the context of the source data)?
Should users expect coloring and ordering to remain stable as they scroll to a new region (a fixed palette vs. whatever colors happen to be available)?
For common tools, are the color schemes highly standardized?
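To make the auto-detect option concrete: the legend could be derived from whatever (state ID, label, color) triples appear in the data, then ordered by numeric state ID. The Python sketch below is a hypothetical illustration only. The `build_legend` function and the tuple layout are assumptions, not existing LocusZoom code, and every sample value except state 15 is made up.

```python
from typing import Dict, Iterable, List, Tuple

# Each record is (state_id, label, color); this layout is an assumption
Record = Tuple[int, str, str]


def build_legend(records: Iterable[Record]) -> List[Record]:
    """Collect the unique states seen in the data, in reverse state ID order."""
    seen: Dict[int, Record] = {}
    for state_id, label, color in records:
        previous = seen.setdefault(state_id, (state_id, label, color))
        # Within one dataset, every state ID should map to exactly one label
        if previous[1] != label:
            raise ValueError(f"state {state_id} has conflicting labels")
    # Reverse numeric order mirrors the current default track ordering
    return sorted(seen.values(), key=lambda rec: rec[0], reverse=True)


# Made-up records for one region (only state 15 comes from the example line)
region = [
    (1, "Example state A", "rgb(255, 0, 0)"),
    (15, "Repetitive/CNV", "rgb(245, 245, 245)"),
    (1, "Example state A", "rgb(255, 0, 0)"),
]
legend = build_legend(region)
print(legend)
```

If the legend is built from a single region, states absent from that region drop out of it, which is the stability concern raised above. A preset list, or a legend built from the whole dataset, would avoid that.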
Other notes
Evaluated density of the tracks to consider how LZ could handle larger region sizes: chromHMM divides regions into 200 bp windows (which would be 5k points in a 1 Mb region). But since many states span multiple windows, and a BED track reports only the final merged interval, the actual number of elements to draw is potentially several orders of magnitude lower. (E.g., in one case, 308 rectangles per 600 kb, versus the 3,000 windows that cover the same span.)
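The reduction comes from run-length merging: consecutive 200 bp windows that share a state collapse into a single interval. A toy Python sketch of that collapse follows. `merge_windows` is a hypothetical helper written for illustration, and the state sequence is made up rather than taken from real data.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

WINDOW_BP = 200  # window size noted above


def merge_windows(
    states: Sequence[int], region_start: int = 0
) -> List[Tuple[int, int, int]]:
    """Collapse per-window state calls into (start, end, state) intervals."""
    intervals = []
    position = region_start
    for state, run in groupby(states):
        run_length = sum(1 for _ in run)  # number of consecutive windows
        end = position + run_length * WINDOW_BP
        intervals.append((position, end, state))
        position = end
    return intervals


# Made-up sequence: 5,000 windows (1 Mb) arranged in a few long runs
windows = [15] * 3000 + [1] * 50 + [7] * 1900 + [1] * 50
merged = merge_windows(windows)
print(len(windows), "windows ->", len(merged), "rectangles")
```

Real data will have many more state changes than this toy sequence, so the actual saving depends on how fragmented the region is.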
Current visualizations attempt to ensure that tracks are displayed in a mostly stable order, based on numerical state IDs.