pangenome / odgi

Optimized Dynamic Genome/Graph Implementation: understanding pangenome graphs
https://doi.org/10.1093/bioinformatics/btac308
MIT License

Overeager link-adding in `odgi flip` #496

Open anshumanmohan opened 1 year ago

anshumanmohan commented 1 year ago

I think flip is adding new links too eagerly.

Minimal

Here's a minimal example:

H   VN:Z:1.0
S   1   A
S   2   T
S   3   G
P   x   1+,2+,3+    *
L   1   +   2   +   0M

Clearly the path does not require a flip. More subtly, though, the graph is not well-formed: path x steps from 2+ to 3+, but no link connects those two segments, so it will fail `odgi validate`. Running `odgi flip` and then `odgi view` has an interesting effect:

H   VN:Z:1.0
S   1   A
S   2   T
S   3   G
P   x   1+,2+,3+    *
L   1   +   2   +   0M
L   2   +   3   +   0M

Now this graph is valid, because flip has added the missing link from 2+ to 3+. Interesting, but IMO not flip's job!
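To make the "not well-formed" point concrete, here is a rough Python sketch of the check I have in mind. It assumes that validation boils down to "every consecutive pair of path steps is backed by a link, up to reverse complement"; the names `canonical` and `unsupported_steps` are mine and are not part of odgi.

```python
FLIP = {"+": "-", "-": "+"}

def canonical(a, b):
    """Pick one representative for a link and its reverse-complement twin."""
    fwd = (a, b)
    rev = ((b[0], FLIP[b[1]]), (a[0], FLIP[a[1]]))
    return min(fwd, rev)

def unsupported_steps(path, links):
    """Return the consecutive path steps that no link supports."""
    have = {canonical(a, b) for a, b in links}
    return [(a, b) for a, b in zip(path, path[1:]) if canonical(a, b) not in have]

# The minimal example: P x 1+,2+,3+ with the single link L 1 + 2 +.
path_x = [("1", "+"), ("2", "+"), ("3", "+")]
links = [(("1", "+"), ("2", "+"))]

print(unsupported_steps(path_x, links))
```

On the minimal example this reports the single unsupported step from 2+ to 3+, which is exactly the link that flip ends up adding.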

Why this exists

I'll create a valid graph that is in need of a flip:

H   VN:Z:1.0
S   1   A
S   2   TTT
S   3   G
P   x   1+,2-,3+    *
L   1   +   2   +   0M
L   2   +   3   +   0M

Running `odgi flip` and then `odgi view` now shows a reasonable output: path x was flipped into x_inv, and two links (2+ to 1- and 3- to 2+) were added in support of the new path.

H   VN:Z:1.0
S   1   A
S   2   TTT
S   3   G
P   x_inv   3-,2+,1-    *
L   1   +   2   +   0M
L   2   +   1   -   0M
L   2   +   3   +   0M
L   3   -   2   +   0M
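For reference, the flipped path above is just the reverse complement of the original one. A minimal sketch, assuming a path is a list of (segment, orientation) pairs; `flip_path` is an illustrative name, not odgi's API:

```python
FLIP = {"+": "-", "-": "+"}

def flip_path(steps):
    """Reverse the step order and invert each step's orientation."""
    return [(seg, FLIP[orient]) for seg, orient in reversed(steps)]

x = [("1", "+"), ("2", "-"), ("3", "+")]
x_inv = flip_path(x)

# Consecutive steps of the flipped path are the links it needs.
needed = list(zip(x_inv, x_inv[1:]))
print(x_inv)
print(needed)
```

The two step pairs in `needed` correspond to the two new L lines in the output.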

Fix

flip should treat the paths it has just generated separately from the pre-existing ones, and add only the links that those new paths need.
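Roughly what I mean, as a Python sketch rather than a patch against the actual C++ code. It assumes flip can hand over the list of paths it generated in this run; `links_to_add` and `canonical` are hypothetical helpers, and links are compared up to reverse complement.

```python
FLIP = {"+": "-", "-": "+"}

def canonical(a, b):
    """Pick one representative for a link and its reverse-complement twin."""
    fwd = (a, b)
    rev = ((b[0], FLIP[b[1]]), (a[0], FLIP[a[1]]))
    return min(fwd, rev)

def links_to_add(new_paths, existing_links):
    """Collect links needed by newly generated paths only.

    Pre-existing paths are never inspected, so their gaps are left alone.
    """
    have = {canonical(a, b) for a, b in existing_links}
    added = []
    for path in new_paths:
        for a, b in zip(path, path[1:]):
            key = canonical(a, b)
            if key not in have:
                have.add(key)
                added.append((a, b))
    return added

# First example: nothing was flipped, so there are no new paths.
print(links_to_add([], [(("1", "+"), ("2", "+"))]))

# Second example: x_inv is the only new path.
existing = [(("1", "+"), ("2", "+")), (("2", "+"), ("3", "+"))]
x_inv = [("3", "-"), ("2", "+"), ("1", "-")]
print(links_to_add([x_inv], existing))
```

With this shape, the first example would come back unchanged (and still invalid), while the second would still gain the two links that x_inv needs.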

Something else is also going on

I do still think that flip is doing something else that's a bit fishy. See note5.gfa from your test suite. I don't think that fixing the above would fix flip's behavior when run against note5. See my comment here: https://github.com/pangenome/odgi/pull/485#issuecomment-1496693803

sampsyo commented 1 year ago

Interesting! Can I ask some follow-up questions to ensure that I'm understanding what's interesting about these examples?

anshumanmohan commented 1 year ago

Thanks for this, Adrian!

tl;dr: I think leaving it as-is would be fine; I just worry about downstream reliance on this "fixing" behavior.