mikolmogorov / Ragout

Chromosome-level scaffolding using multiple references

Synteny breaks over the origin of replication/end of sequence #22

Closed bastian-wur closed 7 years ago

bastian-wur commented 7 years ago

Hi everyone,

I just tried Ragout, and it seems to work nicely; I'm pretty satisfied with it :). While experimenting with it, though, I noticed one thing. It's not really a bug, since it's pretty obvious how and why it happens.

Say you have 2 reference genomes. One is 4 Mb and circular. The other is also circular and 4 Mb, but its sequence has been split in the middle, because the assembler started assembling it at a different point. So bases 0-2 million of genome 1 map to bases 2-4 million of genome 2, and bases 2-4 million of genome 1 map to bases 0-2 million of genome 2. Obviously, 2 synteny blocks will be inferred. As a result, the genome you want to scaffold ends up in 2 scaffolds, even if it matches perfectly. While this makes sense, I guess it could be improved to yield only 1 scaffold at the end.

Not a big issue, you just need to pay attention to how you pick your reference genomes, but this could be improved.
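
To make the coordinate mapping above concrete, here is a minimal sketch in Python. It is not Ragout's code, and the names `to_genome2`, `linear_blocks` and `joined_on_circle` are hypothetical. It assumes the two references are the same circle, linearized at start points that differ by `offset` bases, and uses 0-based, half-open coordinates.

```python
def to_genome2(pos, length, offset):
    """Map a 0-based position on genome 1 to genome 2.

    Assumes genome 2 is the same circle, linearized so that its first
    base is position `offset` of genome 1.
    """
    return (pos - offset) % length


def linear_blocks(length, offset):
    """Return the collinear blocks seen in linear coordinates.

    Each block is (g1_start, g1_end, g2_start, g2_end), half-open.
    """
    offset %= length
    if offset == 0:
        # Same start point: the whole genome is one block.
        return [(0, length, 0, length)]
    return [
        (offset, length, 0, length - offset),  # tail of genome 1
        (0, offset, length - offset, length),  # head of genome 1
    ]


def joined_on_circle(blocks, length):
    """Check that two blocks are adjacent once coordinates wrap around."""
    if len(blocks) != 2:
        return False
    first, second = blocks
    g1_adjacent = first[1] % length == second[0] % length
    g2_adjacent = first[3] % length == second[2] % length
    return g1_adjacent and g2_adjacent


blocks = linear_blocks(4_000_000, 2_000_000)
print(blocks)
print(joined_on_circle(blocks, 4_000_000))
```

In linear coordinates this gives two blocks, but modulo the genome length the end of one block is the start of the other, which is why a single scaffold is the expected result for a circular genome.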

bastian-wur commented 7 years ago

...oooh...sorry... I changed my references a bit, and it seems that at least one of them wasn't that good. Now it scaffolds over the end/beginning of the circle :).