maickrau / GraphAligner

MIT License

overlap longer than nodes error #43

Closed: ptrebert closed this issue 3 years ago

ptrebert commented 3 years ago

Hi Mikko, I just stumbled upon this again (v1.0.13 from conda) and wanted to investigate the size of the problem in the input graph when I encountered the following:

Why does GA think the overlap is too long?

+Peter

maickrau commented 3 years ago

Hi, could you upload the graph?

ptrebert commented 3 years ago

Yes, of course - would sharing the graph via Globus work for you?

One more observation: I extracted just the two nodes and the edge and used only that as input for GA (same reads). This time the error was not triggered (I get lots of failed assertions instead, but I assume that is a result of the tiny graph size).

maickrau commented 3 years ago

Yes, Globus works

maickrau commented 3 years ago

The graph has two edges with different overlaps between the two nodes:

L       utg010491l      -       utg042014l      +       18478M  L1:i:316344
L       utg042014l      -       utg010491l      +       18472M  L1:i:3

and the overlap of one of them is longer than utg042014l.
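Conflicts like this can be found mechanically. Under GFA1 semantics, the reverse complement of the link `utg010491l - → utg042014l +` is `utg042014l - → utg010491l +`, so the two lines above describe the same adjacency twice, once with an 18478 bp overlap and once with 18472 bp. A minimal Python sketch, assuming tab-separated GFA1 and match-only CIGAR overlaps such as `18478M`; the function names are hypothetical and not part of GraphAligner:

```python
import re

FLIP = {"+": "-", "-": "+"}

def overlap_len(cigar):
    """Overlap length of a match-only CIGAR such as '18478M' (assumption)."""
    m = re.fullmatch(r"(\d+)M", cigar)
    if m is None:
        raise ValueError(f"unsupported CIGAR: {cigar!r}")
    return int(m.group(1))

def canonical(frm, from_orient, to, to_orient):
    """A link and its reverse complement describe the same adjacency;
    pick one representative so both map to the same key."""
    fwd = (frm, from_orient, to, to_orient)
    rev = (to, FLIP[to_orient], frm, FLIP[from_orient])
    return min(fwd, rev)

def conflicting_links(gfa_lines):
    """Group L lines by canonical link; keep groups whose overlaps disagree."""
    groups = {}
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "L":
            continue
        key = canonical(*fields[1:5])
        groups.setdefault(key, []).append((overlap_len(fields[5]), line.rstrip("\n")))
    return {k: v for k, v in groups.items() if len({ov for ov, _ in v}) > 1}

# The two L lines quoted above, with tabs as GFA requires.
links = [
    "L\tutg010491l\t-\tutg042014l\t+\t18478M\tL1:i:316344",
    "L\tutg042014l\t-\tutg010491l\t+\t18472M\tL1:i:3",
]
for key, entries in conflicting_links(links).items():
    print(key, [ov for ov, _ in entries])
```

Because the raw line is kept next to each overlap, a report built on this can quote the offending entries verbatim.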

ptrebert commented 3 years ago

ah, I missed the second one - any chance GraphAligner could print the actual edge entry in the error message?

ptrebert commented 3 years ago

@maickrau Can you make the intended behavior explicit? Apparently GraphAligner checks whether the difference between node length and edge overlap is "> 0", i.e. edges whose overlap is exactly as long as the shorter node (overlap difference = 0) are considered invalid. Is that as intended?

maickrau commented 3 years ago

Yes, edges where the overlap is as long or longer than the shorter node are considered invalid. Only edges where the overlap is shorter than both nodes are valid.
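The rule as stated fits in a one-line predicate, and the boundary case from the question (overlap exactly as long as the shorter node) comes out invalid. This is a sketch of the stated rule, not GraphAligner's source code; the function name is hypothetical and the segment lengths are made-up illustration values:

```python
def link_is_valid(overlap, from_len, to_len):
    """True iff the overlap is strictly shorter than both segments.

    Equivalently, the 'overlap difference' min(from_len, to_len) - overlap
    must be > 0; a difference of 0 is rejected.
    """
    return min(from_len, to_len) - overlap > 0

# Hypothetical lengths, for illustration only.
print(link_is_valid(99, 100, 500))   # True: overlap shorter than both nodes
print(link_is_valid(100, 100, 500))  # False: overlap equals the shorter node
print(link_is_valid(120, 100, 500))  # False: overlap longer than the shorter node
```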

ptrebert commented 3 years ago

Ok, thanks for confirming, I'll adapt my GFA cleanup script accordingly.
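A cleanup pass that enforces this rule might look like the Python sketch below. It is not the script mentioned above. It assumes tab-separated GFA1, takes segment lengths from the sequence or from an `LN:i:` tag when the sequence is `*`, and only handles match-only CIGAR overlaps. Dropped links are reported verbatim, the kind of message requested earlier in the thread. All names and the example segments are hypothetical:

```python
import re
import sys

def segment_lengths(lines):
    """Map segment name -> length from S lines (sequence, or LN:i: if sequence is '*')."""
    lengths = {}
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if f[0] != "S":
            continue
        if f[2] != "*":
            lengths[f[1]] = len(f[2])
        else:
            for tag in f[3:]:
                if tag.startswith("LN:i:"):
                    lengths[f[1]] = int(tag[5:])
    return lengths

def drop_invalid_links(lines, report=lambda raw: print("dropping link:", raw, file=sys.stderr)):
    """Keep every line except L lines whose overlap is not strictly shorter
    than both segments; pass each dropped line to `report`."""
    lengths = segment_lengths(lines)
    kept = []
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if f[0] == "L":
            m = re.fullmatch(r"(\d+)M", f[5])
            if m is None:
                raise ValueError(f"unsupported CIGAR: {f[5]!r}")
            overlap = int(m.group(1))
            # Assumes both segments have a known length (KeyError otherwise).
            if overlap >= min(lengths[f[1]], lengths[f[3]]):
                report(line.rstrip("\n"))
                continue
        kept.append(line)
    return kept

example = [
    "S\ta\tACGTACGT",
    "S\tb\t*\tLN:i:5",
    "L\ta\t+\tb\t+\t4M",  # 4 < min(8, 5): kept
    "L\tb\t+\ta\t+\t5M",  # 5 >= min(8, 5): dropped
]
for line in drop_invalid_links(example):
    print(line)
```

Passing `report` as a callback keeps the filter itself free of I/O, so the dropped lines can just as easily be collected into a list or written to a separate file.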