mikolmogorov / Ragout

Chromosome-level scaffolding using multiple references

Help/advice troubleshooting a couple of problematic samples that Ragout is tripping over #83

Closed DaRinker closed 1 year ago

DaRinker commented 1 year ago

I have 20 samples of a species that has no (good) reference assembly; however, there is a very good T2T assembly of a sister species.

I'm refining my assemblies against the T2T assembly and I have at least two samples that will not scaffold properly (looks like one chromosome is consistently getting "unused").

The ragout warnings I'm getting for these problematic samples are:

[22:43:12] INFO: Refining with assembly graph
[22:43:12] WARNING: Too few overlaps (2) between contigs were detected -- refine procedure will be useless. Possible reasons:

1. Some contigs output by assembler are missing
2. Contigs overlap not on a constant value (like k-mer for assemblers which use debruijn graph)
3. Contigs ends are trimmed/postprocessed

I assembled all these genomes from ONT plus Illumina data (Flye plus Pilon). All assemblies appear to be BUSCO complete and the average coverage of each contig is good (45x), so I think they're "good" in general.

For scaffolding, I've tried varying assembly parameters/polishing in Flye/Pilon, and I also tried different Ragout recipes (e.g. with and without phylogenetic information, and by scaffolding against 1, 2, or even 3 reference assemblies). In every case, I get the same warning reported above.

I'm now wondering if there might be something about these few samples that is fundamentally weird/wrong (either biological or technical). Are there any less-obvious things I should try to fix, or does it sound like I should just move forward with what I have?

DaRinker commented 1 year ago

I finally found an order of assembling --> scaffolding --> re-scaffolding that fixed my major issue of the discarded chromosome. So the general answer to my question seems to be to play around with the reference sequences (and sometimes more reference sequences aren't better!).
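
For anyone comparing runs the same way, here is a minimal sketch of how one might check which sequences end up unused after each attempt. It only parses a plain FASTA file and lists names and lengths, longest first, so a whole chromosome's worth of dropped sequence stands out. The function names and the file path in the usage comment are placeholders I made up for illustration, not anything defined by Ragout; point it at whatever FASTA your run wrote its unplaced sequences to.

```python
def fasta_lengths(path):
    """Return {sequence_name: length} for a plain-text (uncompressed) FASTA file."""
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first whitespace-delimited token of the header as the name.
                parts = line[1:].split()
                name = parts[0] if parts else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def unplaced_report(unplaced_fasta, min_length=0):
    """Return (name, length) pairs for sequences >= min_length, longest first."""
    lengths = fasta_lengths(unplaced_fasta)
    rows = [(n, size) for n, size in lengths.items() if size >= min_length]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical usage; the path is a placeholder, not a documented output name:
# for name, size in unplaced_report("ragout_out/unplaced.fasta", min_length=100_000):
#     print(name, size)
```

Running this on the output of each recipe (1, 2, or 3 references) gives a quick way to see whether the same large contigs are being left out every time.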