Allow eliding node sequences when reads are present - Githubissues

vgteam / sequenceTubeMap

displays multiple genomic sequences in the form of a tube map

MIT License

Allow eliding node sequences when reads are present #422

Closed adamnovak closed 3 months ago

adamnovak commented 4 months ago

We would like to be able to use node-sequence-removal simplification on Lancet graphs to see how tumor and normal reads flow through the graph structure.

adamnovak commented 4 months ago

This would be especially useful for the third Lancet example that's in the repo right now, with the mostly-specific deletion.

adamnovak commented 3 months ago

We aren't going to add support for displaying reads when snarls have been simplified out with vg simplify, because vg simplify neither adjusts reads nor records any sort of translation information about what it did to the graph.

Since we'll have one simplification mode allowed with reads and another not, we'll have to make the simplification menu always available and grey out the snarl simplification switch inside it when reads are present.
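As a rough sketch of that menu logic (all names here are hypothetical, not taken from the existing UI code), the control state could be derived from whether any read tracks are loaded:

```javascript
// Hypothetical sketch: decides which simplification controls are usable.
// None of these names come from the sequenceTubeMap codebase.
function simplificationControls({ hasReads }) {
  return {
    // The menu itself is always shown.
    menuVisible: true,
    // Node sequence removal is meant to work with or without reads.
    removeSequences: { enabled: true },
    // Snarl simplification (vg simplify) is greyed out when reads are present.
    simplifySnarls: { enabled: !hasReads },
  };
}

const controls = simplificationControls({ hasReads: true });
console.log(controls.simplifySnarls.enabled); // false
```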

adamnovak commented 3 months ago

We want the indels and mismatches to not be drawn in this mode.
We want the display to act as if each empty node has one base, so that we get a column that reads can occupy. Then we can draw internal reads and the internal parts of entering and exiting reads, and the existing code can work out how to keep them from overlapping.

adamnovak commented 3 months ago

We can probably make a new nodeWidthOption value for a constant node width, and always use that when sequences are removed, and the display will magically look right, because we already implemented e.g. logarithmic node widths.

adamnovak commented 3 months ago

The existing compressed view checkbox should just do nothing if sequences are removed, and maybe should vanish or be greyed out in that case. The node width calculation mode would be locked to fixed width.
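A minimal sketch of how the width mode could be locked, assuming string-valued modes named "fixed" and "compressed" and a one-base constant width. The helper names, the fallback "normal" mode, and the log formula are illustrative assumptions, not the project's actual implementation:

```javascript
// Hypothetical sketch; only the idea of a "fixed" nodeWidthOption comes from this thread.
const FIXED_NODE_WIDTH = 1; // every node is treated as one base wide in fixed mode

// When sequences are removed, ignore the user's choice and lock to fixed width.
function effectiveNodeWidthOption(requestedOption, sequencesRemoved) {
  return sequencesRemoved ? "fixed" : requestedOption;
}

// Width of a node in base units under the given mode.
function nodeWidthInBases(node, option) {
  const length = node.sequenceLength ?? node.sequence?.length ?? 0;
  switch (option) {
    case "fixed":
      return FIXED_NODE_WIDTH;
    case "compressed":
      // Assumed logarithmic scaling; the real formula may differ.
      return 1 + Math.log2(Math.max(length, 1));
    default:
      return length;
  }
}

const option = effectiveNodeWidthOption("compressed", true);
console.log(option, nodeWidthInBases({ sequence: "" }, option));
```

With this shape, the compressed view checkbox has no effect once sequences are removed, because its value never reaches the width calculation.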

adamnovak commented 3 months ago

So today at pair programming @shreyasun and I got this to the point where we have a "fixed" nodeWidthOption that gets used when viewing data where the node sequences were not sent from the server. But when you use this to draw the reads, they don't actually draw on the nodes.

We also looked at read drawing in "compressed" nodeWidthOption mode, where the read start and end positions get logged and then the read gets drawn. We can't just use "compressed" mode when the node sequences aren't sent, because the server doesn't set a sequenceLength field on the nodes.

So I think the two possible approaches are:

1. Make the server set sequenceLength on the nodes when it removes their sequences, and then use compressed (or maybe small?) mode to display them and their reads when the node sequences are not available.
2. Adjust how we draw reads so that, in fixed mode, we draw them just covering the whole of any node they visit. We might need to touch getReadXStart() and getReadXEnd() so that, in that case, they don't use getXCoordinateOfBaseWithinNode() and instead directly use nodePixelCoordinatesInX() to figure out where to draw the read within the node. The read Y positioning and node height calculation are based on the assumption that reads that don't overlap in base coordinates won't overlap visually, so we might need to change placeReadSet() to handle fixed mode differently and, in that case, not lay out any reads in the same vertical position of the same node.

I think setting and sending sequenceLength might actually be the simplest approach here, because it requires the fewest changes to the fiddly read layout code.
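For comparison, the second approach could look roughly like the sketch below. It is a standalone illustration with made-up data shapes (`{ x, pixelWidth }` nodes and `{ readId, nodeId }` visits) and hypothetical function names; it does not reproduce the real signatures of getReadXStart(), getReadXEnd(), or placeReadSet().

```javascript
// Hypothetical sketch of fixed-mode read placement; not the project's actual code.

// In fixed mode a read simply spans the whole node it visits,
// so no per-base coordinate lookup is needed.
function readXRangeFixed(node) {
  return { xStart: node.x, xEnd: node.x + node.pixelWidth };
}

// Give every read visiting a node its own row, since base coordinates
// can no longer show that two reads do not overlap.
function assignRowsFixed(visits) {
  const nextRowByNode = new Map();
  return visits.map((visit) => {
    const row = nextRowByNode.get(visit.nodeId) ?? 0;
    nextRowByNode.set(visit.nodeId, row + 1);
    return { ...visit, row };
  });
}

const placed = assignRowsFixed([
  { readId: "a", nodeId: "1" },
  { readId: "b", nodeId: "1" },
  { readId: "a", nodeId: "2" },
]);
console.log(placed.map((v) => v.row));
```

This simplification assigns rows per node independently, so a read crossing several nodes may land on different rows in each one. Handling that is part of why this route touches more of the layout code.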

adamnovak commented 3 months ago

To have the server send lengths instead of sequences, we could touch here: https://github.com/vgteam/sequenceTubeMap/blob/5b1a3f5c461a056972076c578a885d7cd64db8e9/src/server.mjs#L423C5-L423C24

And we could add a .sequenceLength to each node with the length of its sequence, before removing the sequence itself. I think this will propagate all the way through to where we consult sequenceLength, automatically.
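A minimal sketch of that step, assuming the graph object carries a `node` array whose entries have a string `sequence` field. The function name is hypothetical and this is not the code at the linked line:

```javascript
// Hypothetical helper; the real change would go near the linked line in src/server.mjs.
// Assumes graph.node is an array of objects with a string `sequence` field.
function removeNodeSequences(graph) {
  for (const node of graph.node ?? []) {
    // Record the length first, so the client can still size the node.
    node.sequenceLength = (node.sequence ?? "").length;
    // Then drop the bases themselves; an empty string is assumed as the placeholder.
    node.sequence = "";
  }
  return graph;
}

const exampleGraph = {
  node: [
    { id: "1", sequence: "GATTACA" },
    { id: "2", sequence: "CG" },
  ],
};
removeNodeSequences(exampleGraph);
console.log(exampleGraph.node);
```

The order matters: the length has to be captured before the sequence is cleared, otherwise every node would report a length of zero.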

adamnovak commented 3 months ago

Using an undefined .sequence would make more sense than an empty-string .sequence and a nonzero .sequenceLength, but @shreyasun says she tried an undefined .sequence and this threw off other existing code.