malonge / RagTag

Tools for fast and flexible genome assembly scaffolding and improvement
MIT License

RagTag scaffold to detect mobile genetic element movement #76

Closed SamuelGreenrod closed 3 years ago

SamuelGreenrod commented 3 years ago

This isn't really an issue but more of a question about the uses of the RagTag scaffold function.

I want to monitor prophage movement in clonal isolates, but only have access to short read data and one well assembled reference. So far I have assembled my isolates into draft assemblies. I have then scaffolded the assembly contigs to the reference generating aligned clonal assemblies.

If I detected prophages in each of the assemblies, would I be able to determine whether the prophages have moved positions based on the genome coordinates? Or will the contigs with prophages always align to the same region in the reference, effectively showing no prophage movement? The contigs the prophages are in are 100kb and 30kb long, and the two prophages in them are 35kb and 13kb respectively.

What determines where a contig aligns to the reference when part of it doesn't align? I was hoping that because the prophages are less than half the size of the contig, the contigs will align normally to the reference. Therefore, if a prophage has moved, I should be able to detect it by the prophage having different genome coordinates between scaffolded assemblies.

If you need any clarification please let me know but hopefully this makes sense. Thanks!

malonge commented 3 years ago

Hi there,

I hope to have a paper out soon that describes this in more detail because the method has changed a little since RaGOO.

To summarize, for each query sequence, alignments that are close together and on the same strand are merged. After merging, the longest alignment is chosen as the representative alignment. To scaffold the sequences, those representative alignments are sorted. So without more info, it is impossible to say whether the prophage sequence alignment will be the representative alignment for a given query sequence. Obviously, the best solution is to get more contiguous contigs, but I realize you are working with the data you have.
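As a rough illustration of the merge, pick-longest, and sort steps described above, here is a minimal Python sketch. It is not RagTag's actual code: the `Aln` class, the `max_gap` threshold, measuring length on the reference, and sorting by reference start are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Aln:
    """One query-to-reference alignment (hypothetical structure)."""
    query: str
    q_start: int
    q_end: int
    strand: str
    ref: str
    r_start: int
    r_end: int


def merge_close(alns, max_gap=100_000):
    """Merge alignments of the same query/reference/strand whose
    reference gap is at most max_gap (max_gap is an assumed parameter)."""
    def key(a):
        return (a.query, a.ref, a.strand)

    merged = []
    ordered = sorted(alns, key=lambda a: (key(a), a.r_start))
    for _, group in groupby(ordered, key=key):
        cur = None
        for a in group:
            if cur is not None and a.r_start - cur.r_end <= max_gap:
                # Extend the current merged alignment to cover this one.
                cur = Aln(cur.query, min(cur.q_start, a.q_start),
                          max(cur.q_end, a.q_end), cur.strand, cur.ref,
                          cur.r_start, max(cur.r_end, a.r_end))
            else:
                if cur is not None:
                    merged.append(cur)
                cur = a
        if cur is not None:
            merged.append(cur)
    return merged


def representatives(alns, max_gap=100_000):
    """Pick the longest merged alignment per query (length on the reference)."""
    best = {}
    for a in merge_close(alns, max_gap):
        prev = best.get(a.query)
        if prev is None or (a.r_end - a.r_start) > (prev.r_end - prev.r_start):
            best[a.query] = a
    return best


def scaffold_order(alns, max_gap=100_000):
    """Order queries by the position of their representative alignment."""
    reps = representatives(alns, max_gap).values()
    return [a.query for a in sorted(reps, key=lambda a: (a.ref, a.r_start))]


# Made-up example: a 100 kb contig whose 35 kb prophage aligns far from its flanks.
example = [
    Aln("contig_A", 0, 40_000, "+", "chr", 500_000, 540_000),        # left flank
    Aln("contig_A", 40_000, 75_000, "+", "chr", 2_000_000, 2_035_000),  # prophage
    Aln("contig_A", 75_000, 100_000, "+", "chr", 540_000, 565_000),  # right flank
    Aln("contig_B", 0, 30_000, "+", "chr", 100_000, 130_000),
]
print(scaffold_order(example))
```

In this made-up case the two flanks merge into a 65 kb alignment that outweighs the 35 kb prophage alignment, so the contig is placed by its flanks. If the prophage alignment were the longest, the contig would be placed by the prophage instead.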

One thing that can help you is to use the --debug option. Then, you can check the ragtag.scaffold.debug.merged.paf file to see the merged alignments used for scaffolding.
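For anyone who wants to inspect that file programmatically, here is a small sketch that groups the merged alignments by query and lists them longest first. It assumes the debug file follows the standard PAF layout (12 mandatory tab-separated columns); the function names are made up for the example and are not part of RagTag.

```python
from collections import defaultdict


def parse_paf(lines):
    """Group PAF records by query name, using the standard PAF columns:
    qname, qlen, qstart, qend, strand, tname, tlen, tstart, tend, ..."""
    by_query = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        by_query[fields[0]].append({
            "q_start": int(fields[2]),
            "q_end": int(fields[3]),
            "strand": fields[4],
            "ref": fields[5],
            "r_start": int(fields[7]),
            "r_end": int(fields[8]),
        })
    return by_query


def longest_first(records):
    """Sort one query's alignments by reference span, longest first."""
    return sorted(records, key=lambda r: r["r_end"] - r["r_start"], reverse=True)


# Usage (path taken from the comment above; adjust to your output directory):
# with open("ragtag.scaffold.debug.merged.paf") as fh:
#     for query, records in parse_paf(fh).items():
#         top = longest_first(records)[0]
#         print(query, top["ref"], top["r_start"], top["r_end"], top["strand"])
```

If the top alignment for a prophage-carrying contig overlaps the prophage coordinates on the query, that contig was likely placed by its prophage rather than by its flanking sequence.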

Hope this helps!

SamuelGreenrod commented 3 years ago

Hi there,

That's great, thank you for the help. From this I think the scaffolded assemblies are likely not the best approach to look at MGE movement but it's great to get this cleared up. I'll check out the --debug option. Cheers!