pangenome / odgi

Optimized Dynamic Genome/Graph Implementation: understanding pangenome graphs
https://doi.org/10.1093/bioinformatics/btac308
MIT License
196 stars 40 forks source link

validate segfaults if path contains a non-existant node #587

Closed ASLeonard closed 2 months ago

ASLeonard commented 3 months ago

This is likely an obscure corner case that is hard to protect against, but may be useful. I'm not sure if it is the responsibility of odgi to gracefully tell people their graphs are malformed rather than crashing, but there is at least a partial issue here.

Running odgi validate on a gfa where the path includes a non-existant node causes a segfault.

S       1       A
S       2       C
S       3       T
S       4       G
L       1       +       2       +       0M
L       1       +       3       +       0M
L       2       +       4       +       0M
P       foo     1+,2+,4+        0M
P       bar     1+,5+   0M

Arguably, the description of the subcommand should tell the user the paths are not consistent with the graph (since no path can contain node 5 since node 5 doesn't exist). This currently only checks for consistency with L-lines assuming L-lines are already consistent with S-lines.

Interestingly, a gfa with a link line containing a non-existant node raises a "Floating point exception"

S       1       A
S       2       C
S       3       T
S       4       G
L       1       +       2       +       0M
L       1       +       3       +       0M
L       2       +       4       +       0M
L       2       +       5       +       0M
P       foo     1+,2+,4+        0M
P       bar     1+,3+   0M

This can be guarded against with some checks in src/gfa_to_handle.cpp, namely for edges here

https://github.com/pangenome/odgi/blob/2151dd8d73d382e8136509a487e0628a0ca0b62f/src/gfa_to_handle.cpp#L89-L91

and presumably for paths here

https://github.com/pangenome/odgi/blob/2151dd8d73d382e8136509a487e0628a0ca0b62f/src/gfa_to_handle.cpp#L127-L129

Unsurprisingly since the issue originates in the gfa loading, almost all subcommands segfault if the path contains a non-existant node.

Best, Alex

AndreaGuarracino commented 2 months ago

@ASLeonard, thank you for breaking ODGI xD I've put a few polite checks in #591. Could you try to break it again?

ASLeonard commented 2 months ago

The changes handle all the bad files I had. And they print a very useful message for the error as well!