jts / sga

de novo sequence assembler using string graphs
http://genome.cshlp.org/content/22/3/549

Weird bug with "sga assemble": sga: Bigraph.cpp:190: void Bigraph::merge(Vertex, Edge): Assertion `pTransEdge->getDir() == pEdge->getDir()' failed #101

ghost closed this issue 8 years ago

ghost commented 8 years ago

Hello,

I'm trying to use the "sga assemble" tool as a component of a new tool I'm developing to assemble conserved markers in metagenomic data. I generate an overlap graph on my own and write it out in ASQG format in order to assemble it with SGA.

I have already tested "sga assemble" with test ASQG files and got it working. However, I'm now encountering a weird bug that I have not managed to debug.

I built a minimal input file, test.asqg.txt, that reproduces the problem with 2 overlaps and 1639 reads, but I can't find anything wrong with it. Did I miss something?

$ /home/localspace/pericard/programmes/sga/src/SGA/sga assemble -v -o test -m 30 -d 0.03 -l 50 --max-edges 1000 test.asqg
Deleted edges for 0 super repetitive vertices
Warning: removed 0 duplicate edges
Vertices: 1639 Edges: 4 Islands: 1636 Tips: 2 Monobranch: 0 Dibranch: 0 Simple: 3
num verts: 1639 using 191740 bytes (116.99 per vert)
num edges: 4 using 128 bytes (32.00 per edge)
total: 191868
[Stats] Input graph:
Vertices: 1639 Edges: 4 Islands: 1636 Tips: 2 Monobranch: 0 Dibranch: 0 Simple: 3
Removing contained vertices from graph
[Stats] After removing contained vertices:
Vertices: 1639 Edges: 4 Islands: 1636 Tips: 2 Monobranch: 0 Dibranch: 0 Simple: 3
sga: Bigraph.cpp:190: void Bigraph::merge(Vertex, Edge): Assertion `pTransEdge->getDir() == pEdge->getDir()' failed.
Abandon

Furthermore, when I remove a few vertices from this file, whichever they are, the bug seems to disappear and the program runs to the end without any problem.

Can you help me understand this bug? Is it an SGA bug, or is there something wrong with my ASQG file? Could it be related to this issue (https://github.com/jts/sga/issues/100), which I found while trying to debug?

Thanks a lot in advance

jts commented 8 years ago

This is indeed very strange, and I can reproduce your problem (including the bug going away when vertices are removed). It is not related to the valgrind warning, though. I'm going to look into it today, but I am leaving on holiday very soon, so I'm not sure I'll be able to fix it immediately.

jts commented 8 years ago

A bit of progress: removing vertices changes the order in which the vertices are processed (due to how the hash table is laid out in memory). When vertex 71 is merged first, the assertion always fails, regardless of how many vertices are in the file. Can you double-check that your ASQG file is formatted correctly (e.g., that the overlaps are correct)?
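One way to double-check an overlap is to extract the two overlapping substrings and compare them directly. The sketch below is only an illustration, not part of SGA: the function names are hypothetical, and it assumes that overlap coordinates are 0-based, inclusive, and given on each read as stored in the file, with a flag saying whether the second read must be reverse-complemented.

```python
# Hypothetical helper, not part of SGA. Assumes 0-based, inclusive
# coordinates expressed on each read exactly as it is stored.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def overlap_is_consistent(seq1, s1, e1, seq2, s2, e2, is_rc, max_diff=0):
    """Check that the two overlapping substrings actually match.

    Both intervals are closed ([start, end]) and refer to the stored
    (sense) sequence of each read. If is_rc is true, the substring taken
    from the second read is reverse-complemented before comparison.
    """
    sub1 = seq1[s1:e1 + 1]
    sub2 = seq2[s2:e2 + 1]
    if is_rc:
        sub2 = reverse_complement(sub2)
    if len(sub1) != len(sub2):
        return False
    mismatches = sum(a != b for a, b in zip(sub1, sub2))
    return mismatches <= max_diff


# Read 1 ends with CCGT; read 2 ends with ACGG, whose reverse complement is CCGT.
read1 = "AAACCGT"
read2 = "TTTACGG"
print(overlap_is_consistent(read1, 3, 6, read2, 3, 6, is_rc=True))  # coordinates on the stored read
print(overlap_is_consistent(read1, 3, 6, read2, 0, 3, is_rc=True))  # coordinates taken on the reverse complement
```

Running a check of this kind over every edge before calling "sga assemble" would flag an overlap whose coordinates point at the wrong end of a read.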

ghost commented 8 years ago

OK, I found the problem, and the mistake is all mine. I was assuming that, in an overlap where the second read is reverse-complemented, the positions given in the edge description had to be on the reverse-complement sequence. I was wrong: the overlap positions on both reads have to be given on the sense sequence, with the reverse tag set to 1. I suspected a bug because of the weird behaviour when vertices were added or removed, but the problem arose from my misunderstanding of the ASQG format. Thanks a lot for helping me, and I hope to be citing your great work soon enough ;-)
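The coordinate fix described here can be sketched in a few lines. This is an illustration under stated assumptions, not SGA code: the function names are hypothetical, coordinates are assumed to be 0-based and inclusive, and the edge record layout (ED, a tab, then the two read IDs, start/end/length for each read, the reverse flag, and the number of differences, separated by spaces) is assumed here and should be checked against the ASQG documentation.

```python
# Hypothetical helpers, not part of SGA. Coordinates are assumed to be
# 0-based and inclusive; the ED field order below is an assumption.


def rc_to_sense(start, end, length):
    """Map a closed interval on the reverse complement to the sense strand.

    Position p on the reverse complement of a read of the given length
    corresponds to position length - 1 - p on the stored (sense) read.
    """
    if not 0 <= start <= end < length:
        raise ValueError("interval must satisfy 0 <= start <= end < length")
    return length - 1 - end, length - 1 - start


def format_rc_edge(id1, s1, e1, len1, id2, s2_rc, e2_rc, len2, num_diff=0):
    """Build an edge line for an overlap with the reverse complement of read 2.

    s2_rc and e2_rc are positions on the reverse complement of read 2;
    they are converted to sense-strand positions and the reverse flag is 1.
    """
    s2, e2 = rc_to_sense(s2_rc, e2_rc, len2)
    fields = [id1, id2, s1, e1, len1, s2, e2, len2, 1, num_diff]
    return "ED\t" + " ".join(str(f) for f in fields)


# The last 30 bases of r1 overlap the first 30 bases of the reverse
# complement of r2, i.e. the last 30 bases of r2 as stored.
print(rc_to_sense(0, 29, 100))
print(format_rc_edge("r1", 70, 99, 100, "r2", 0, 29, 100))
```

With this convention, an overlap between the end of one read and the start of the reverse complement of another is reported at the end of both stored reads.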

test_small.asqg.txt