mikolmogorov / Ragout

Chromosome-level scaffolding using multiple references

How to check validity of results #86

Status: Open. Alexkortsi opened this issue 1 year ago.

Alexkortsi commented 1 year ago

Hi,

I have ONT and Illumina PE data for a 40 Mbp fungal genome. I have tried several methods to reach a final assembly. My final Flye assembly has 30 contigs, and I would like to complete it into the 7 chromosomes of this organism. My questions are:

1) I used Ragout with a chromosome-level reference genome of the same species but a different strain. What would be the next step to validate the correctness of the 7 chromosomes assembled for my strain? Given that a certain degree of rearrangement must have happened between the strains, is there a chance that I have lost important information with this method? Is there any way I can check the results manually?

2) How should I handle the unplaced contigs, i.e. the ones that Ragout did not place within the 7 chromosomes?

Thanks a lot for your time and help!

mikolmogorov commented 1 year ago

Hi,

Regarding validation, approaches are usually tailored to the particular project. Your final assembly is based on information from the long reads and the reference genome, so ideally you'd need some kind of orthogonal data to validate it, but of course this is not always possible. You may look into the various assembly metrics computed by QUAST, or into the methods used in recent genome assembly papers (e.g. https://www.nature.com/articles/s41586-022-05325-5).
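For a quick look at contiguity alongside a full QUAST run, the standard N50/L50 statistics can be computed directly from a FASTA file. Below is a minimal Python sketch, assuming an uncompressed FASTA whose record IDs are the first token of each header line. It is not QUAST's implementation and reports only a few of the metrics QUAST produces; the script name in the usage comment is hypothetical.

```python
import sys
from typing import Dict, Iterable, List, Tuple


def read_fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {record_id: sequence_length} for FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Record ID = first whitespace-separated token of the header line.
            name = (line[1:].split() or [""])[0]
            lengths[name] = 0
        elif name is not None:
            # Sequences may be wrapped over several lines; sum all of them.
            lengths[name] += len(line)
    return lengths


def nx_lx(lengths: List[int], fraction: float = 0.5) -> Tuple[int, int]:
    """Return (Nx, Lx) for the given fraction (0.5 gives N50/L50).

    Nx is the length of the shortest sequence among the longest sequences
    that together cover at least `fraction` of the total length; Lx is how
    many sequences that takes.
    """
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= fraction * total:
            return length, count
    return 0, 0  # empty input


def summarize(lengths: Dict[str, int]) -> Dict[str, int]:
    """Collect a few QUAST-style contiguity statistics."""
    n50, l50 = nx_lx(list(lengths.values()), 0.5)
    return {
        "sequences": len(lengths),
        "total_length": sum(lengths.values()),
        "largest": max(lengths.values(), default=0),
        "N50": n50,
        "L50": l50,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python assembly_stats.py assembly.fasta
    with open(sys.argv[1]) as handle:
        for key, value in summarize(read_fasta_lengths(handle)).items():
            print(f"{key}\t{value}")
```

Running it on both the Flye contigs and the Ragout scaffolds shows how contiguity changed after scaffolding. Keep in mind that contiguity alone says nothing about correctness, which is why orthogonal data matters for validation.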

For unplaced contigs, you may look into gap-filling tools, but they likely won't help much. The completeness and quality of the original long-read assembly is the main limitation.
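Before trying gap filling, it can help to measure how much sequence actually ended up unplaced and which unplaced contigs are large enough to inspect by hand. Below is a minimal Python sketch, assuming the placed scaffolds and the unplaced contigs are available as two separate FASTA files. The file names in the usage comment and the 10 kbp cutoff are arbitrary placeholders, so check what your Ragout run actually produced.

```python
import sys
from typing import Dict, Iterable, List, Tuple


def read_fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {record_id: sequence_length} for FASTA-formatted lines.

    Lengths count every character, including any N gap characters.
    """
    lengths: Dict[str, int] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = (line[1:].split() or [""])[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def unplaced_summary(
    placed: Dict[str, int], unplaced: Dict[str, int], min_length: int = 0
) -> Tuple[float, List[Tuple[str, int]]]:
    """Return (unplaced fraction of total length,
    unplaced contigs with length >= min_length, longest first)."""
    placed_bp = sum(placed.values())
    unplaced_bp = sum(unplaced.values())
    total = placed_bp + unplaced_bp
    fraction = unplaced_bp / total if total else 0.0
    large = sorted(
        ((name, length) for name, length in unplaced.items() if length >= min_length),
        key=lambda item: item[1],
        reverse=True,
    )
    return fraction, large


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (placeholder file names): python unplaced.py scaffolds.fasta unplaced.fasta
    with open(sys.argv[1]) as placed_fh, open(sys.argv[2]) as unplaced_fh:
        frac, large = unplaced_summary(
            read_fasta_lengths(placed_fh),
            read_fasta_lengths(unplaced_fh),
            min_length=10_000,  # arbitrary cutoff for "worth a manual look"
        )
    print(f"unplaced fraction of total length: {frac:.2%}")
    for name, length in large:
        print(f"{name}\t{length}")
```

If the unplaced fraction is small relative to the 40 Mbp genome, the missing pieces are unlikely to change the chromosome-level picture. If it is large, that points back to the completeness of the original long-read assembly.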