Open biork opened 1 year ago
The comment on #1594 would apply here as well. Overall there are too many changes to igv.js here to accommodate a track and file convention without a user community. Again, perhaps this illustrates the need for a "contrib" plugin capability. In this case you would need to supply the track and a parser, as you are in effect creating a new file format. So it's perhaps more difficult than #1594.
One meta comment: igv.js already has a heatmap track and format, "seg", for segmented copy number. It's possible this track and format ("seg" is a widely used standard format for copy number) would make a better basis than a bed track, with fewer special cases.
A proposal for a multi-scale Heatmap Track
Motivation
This track type was motivated by multiscale genomic analyses such as https://pubmed.ncbi.nlm.nih.gov/24727652/
A heatmap typically maps continuous data to one of two types of color palettes, depending on the distribution of the data to be visualized:
Implementation Overview
My current implementation tries to balance:
Implementation rationale
Because a heatmap can be thought of as nothing more than multiple rows of densely-packed annotation features (typically without any additional decoration like labels), the FeatureTrack that supports general annotations already has all facilities required to support heatmaps. In particular,
These changes have almost no impact on performance, and what little performance impact there is could be mitigated with slightly more invasive changes.
To keep the IGV implementation as simple (and fast) as possible, the data is expected to be fully preprocessed for display; all IGV does at runtime is map numeric values to colors using a colormap function and a palette.
Concretely, data should be in [0,1]. Values below and above this range are by default clamped to 0 and 1 respectively and thus mapped to the palette's edge colors. Also, two discrete "outlier" colors can optionally be provided in the track config to highlight outlying data instead of just using the palette's "edges" (a very good idea I first saw in matplotlib).
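As a minimal sketch of the mapping described above (not the code from the proposed multiscalehm.js), a linear colormap with clamping and optional outlier colors might look like the following. The names `linearColormap`, `lowOutlierColor` and `highOutlierColor` are hypothetical, and the palette is assumed to be a plain array of CSS color strings.

```javascript
// Hypothetical sketch: map a value expected in [0,1] to a palette color.
// Out-of-range values use the outlier colors when given, otherwise they
// are clamped to the palette's edge colors.
function linearColormap(value, palette, options = {}) {
    const { lowOutlierColor, highOutlierColor } = options;
    if (value < 0) {
        return lowOutlierColor !== undefined ? lowOutlierColor : palette[0];
    }
    if (value > 1) {
        return highOutlierColor !== undefined
            ? highOutlierColor
            : palette[palette.length - 1];
    }
    // Linear map onto palette indices; value === 1 lands on the last entry.
    const index = Math.min(palette.length - 1, Math.floor(value * palette.length));
    return palette[index];
}

// Assumed example palette: four gray levels from black to white.
const grays = ["#000000", "#555555", "#aaaaaa", "#ffffff"];
const inRange = linearColormap(0.5, grays);
const flagged = linearColormap(1.5, grays, { highOutlierColor: "#ff0000" });
```

Without outlier colors configured, a value such as 1.5 would simply receive the last palette entry.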
Data is delivered in BED files
Given the preceding characterization of heatmaps, it is natural to deliver heatmap data as BED files with a very minor abuse of the format: the 4th (name) column contains a 0-based row assignment. The name field in BED files can be thought of as naming the scale of the data (corresponding to a row). Since genome coordinate ranges in heatmap data would not typically be associated with other names, this is not much of an abuse of the BED format. The 5th (score) column is used for its intended purpose: a score.
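To illustrate the convention, here is a hedged sketch of how one such BED line could be read. It is not igv.js's actual parser; the function name `parseHeatmapBedLine` and the returned field names are hypothetical, and tab-delimited input with a floating-point score is assumed.

```javascript
// Hypothetical sketch: read one tab-delimited BED line where column 4 holds
// a 0-based row index and column 5 holds the score to be color-mapped.
function parseHeatmapBedLine(line) {
    const fields = line.trim().split("\t");
    if (fields.length < 5) {
        return null; // not enough columns for this convention
    }
    const start = Number.parseInt(fields[1], 10);
    const end = Number.parseInt(fields[2], 10);
    const row = Number.parseInt(fields[3], 10);
    const score = Number.parseFloat(fields[4]);
    if ([start, end, row, score].some(Number.isNaN) || row < 0) {
        return null; // malformed numeric field or negative row index
    }
    return { chr: fields[0], start, end, row, score };
}

// Assumed example line: row 0, score 0.25.
const cell = parseHeatmapBedLine("chr1\t1000\t2000\t0\t0.25");
```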
This arrangement also allows additional runtime optimizations:
These data preparation optimizations are, of course, optional but advisable in the interest of performance.
New files
Only one new JavaScript file, multiscalehm.js, is added providing:
Only the renderCell function is necessary. The colormap function and palettes could be left to the user to define in the config, but since a suitable palette and colormap function are always required and a linear map is the most common choice, providing these as defaults reduces work for the user. Importantly, both can still be defined entirely in the config, maximizing generality.
IGV code changes
With the above considerations only a few edits to IGV were necessary:
User requirements
The following should be set in the track config:
Defaults are provided for everything to ensure that something is displayed, though the result will certainly not be ideal without user configuration, and it won't even be correct if maxRows is unset.
As is, the implementation simply makes full use of the configured space, so each heatmap row is config.height / config.maxRows pixels tall. In particular, squishedRowHeight and related config variables are ignored, and no runtime adjustment of track height should occur.
The maxRows config element could be made optional, since the largest row index can be inferred at runtime, but requiring maxRows simplifies the implementation (its value is known before data is parsed). It may also be worth offering scaleCount as a more meaningful alias.
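The row geometry implied here can be sketched as follows. This is only an illustration: `cellGeometry` is a hypothetical helper, and placing row 0 at the top of the track is an assumption the proposal does not state.

```javascript
// Hypothetical sketch: vertical placement of a heatmap cell when the track
// height is divided evenly among config.maxRows rows.
function cellGeometry(row, config) {
    const rowHeight = config.height / config.maxRows;
    // Assumption: row 0 is drawn at the top of the track.
    return { y: row * rowHeight, height: rowHeight };
}

// With a 100-pixel track and 4 rows, each row is 25 pixels tall.
const geometry = cellGeometry(2, { height: 100, maxRows: 4 });
```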
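For comparison, inferring the row count at runtime would take only a pass over the parsed features, at the cost of not knowing the value until parsing finishes. The sketch below assumes each feature carries a numeric `row` property; `inferRowCount` is a hypothetical name.

```javascript
// Hypothetical sketch: derive the number of rows from 0-based row indices.
function inferRowCount(features) {
    let maxRow = -1;
    for (const feature of features) {
        if (feature.row > maxRow) {
            maxRow = feature.row;
        }
    }
    return maxRow + 1; // 0 when there are no features
}

const rowCount = inferRowCount([{ row: 0 }, { row: 2 }, { row: 1 }]);
```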
Input must come from BED files with:
As with my previous stacked bar graph, I'll submit a pull request if this is of interest to the group. Thanks, roger kramer, bioinformatician University of Eastern Finland