vector-engineering / covidcg

A COVID-19 CoV Genetics (CG) browser to inform therapeutics development
https://covidcg.org
MIT License

NT mutations across multiple segments #562

Open atc3 opened 2 years ago


The custom coordinates mode is currently being adapted in the Flu version to accommodate segment information (this is not relevant to the SARS2 and RSV sites, where only one segment is present). Querying for and parsing mutations from multiple segments is not an issue (and has been tested), but there are a couple of problems, listed below. To avoid them, we currently restrict mutation queries to at most one segment; supporting more than one segment would require the following fixes:

  1. The entropy plot (mutation frequency plot) only supports one linear stretch of mutations. When querying for mutations along multiple segments, e.g., positions 1–100 on segment 1 and 1–100 on segment 2, mutations from both segments are overlaid onto the same plot, resulting in a misleading graphic. Possible solutions:

    • Make multiple entropy plots and arrange them horizontally, one plot for each segment. This is possible, but would require some Vega trickery plus smart horizontal scaling.
    • Make multiple entropy plots, but arrange them vertically. Pass the segment each plot covers as a prop to the Entropy plot component, and inside each component, filter mutations to that segment (preferred solution).
  2. Coverage data is currently designed for one continuous linear segment, so querying over multiple segments produces incorrect coverage data. Segment-aware coverage would require rewriting the coverage logic and adding a segment field to the returned coverage data: either a separate coverage array for each segment, or a segment identifier within each coverage entry. The frontend would then need to parse this new format, and each visualization would need to be updated to consume it.
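The preferred entropy-plot fix (vertically stacked plots, each filtered to one segment) could be expressed as a small data-shaping step. Below is a minimal sketch in Python, assuming a hypothetical `Mutation` record with `segment`, `pos`, and `frequency` fields; these names are illustrative and not covidcg's actual schema, and the real Entropy plot component would do the equivalent filtering in the frontend before handing data to Vega.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    # Hypothetical record shape for illustration; not covidcg's schema.
    segment: str      # segment identifier (illustrative)
    pos: int          # 1-based position *within* the segment
    frequency: float  # fraction of sequences carrying the mutation


def segments_in_order(mutations):
    """Distinct segments in first-seen order: one stacked plot per entry."""
    return list(dict.fromkeys(m.segment for m in mutations))


def mutations_for_segment(mutations, segment):
    """What a single per-segment entropy plot would receive as its data.

    Filtering here keeps position 50 on segment 1 and position 50 on
    segment 2 on separate plots instead of overlaying them on one axis.
    """
    return [m for m in mutations if m.segment == segment]


if __name__ == "__main__":
    muts = [
        Mutation("1", 50, 0.2),
        Mutation("2", 50, 0.7),
        Mutation("1", 90, 0.1),
    ]
    # One plot per segment; each plot gets only its own mutations.
    for seg in segments_in_order(muts):
        points = [(m.pos, m.frequency) for m in mutations_for_segment(muts, seg)]
        print(f"segment {seg}: {points}")
```

Run as a script, this prints `segment 1: [(50, 0.2), (90, 0.1)]` and `segment 2: [(50, 0.7)]`. Stacking the plots vertically lets each one keep its own x-axis in segment coordinates, which sidesteps the horizontal-scaling work the first option would need.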
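For the coverage rewrite in item 2, the essential change is to key coverage on (segment, position) rather than on position alone. The sketch below, in Python, assumes a hypothetical input of `(segment, start, end)` ranges (1-based, inclusive), one per sequence per covered segment; it is not covidcg's actual coverage code. It produces the flat form with a segment identifier in each entry, then shows how that could be regrouped into one array per segment, i.e., the two data shapes mentioned in item 2.

```python
from collections import Counter, defaultdict


def segment_coverage(ranges):
    """Count covering sequences per position, keyed by (segment, pos).

    `ranges` holds hypothetical (segment, start, end) tuples with 1-based,
    inclusive coordinates. Keying on pos alone would merge segment 1 pos 2
    with segment 2 pos 2, producing incorrect multi-segment coverage.
    """
    counts = Counter()
    for segment, start, end in ranges:
        for pos in range(start, end + 1):
            counts[(segment, pos)] += 1
    return counts


def to_tagged_entries(counts):
    """Flat list where every entry carries its own segment identifier."""
    return [
        {"segment": seg, "pos": pos, "count": n}
        for (seg, pos), n in sorted(counts.items())
    ]


def to_per_segment_arrays(entries):
    """Regroup tagged entries into one coverage array per segment."""
    by_segment = defaultdict(list)
    for e in entries:
        by_segment[e["segment"]].append({"pos": e["pos"], "count": e["count"]})
    return dict(by_segment)


if __name__ == "__main__":
    ranges = [("1", 1, 3), ("1", 2, 3), ("2", 2, 2)]
    entries = to_tagged_entries(segment_coverage(ranges))
    for seg, arr in to_per_segment_arrays(entries).items():
        print(seg, arr)
```

Sorting the tagged entries by (segment, pos) keeps each regrouped array in positional order, so a visualization could consume either shape without re-sorting.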