mikolmogorov / Ragout

Chromosome-level scaffolding using multiple references

slight confusion on inputs for cactus when then using in ragout #80

Closed benyoung93 closed 1 year ago

benyoung93 commented 1 year ago

Good afternoon :).

I have been poring over the documentation for ragout and cactus to fully understand the inputs needed to make everything work. I have one quick question that I am still a little unsure about, and was hoping it would be a quick and easy one for any/all to answer. I was also 50/50 about whether to post this query in cactus or ragout, and I decided on ragout.

What I have: a preliminary genome assembled from hifiasm, 5 x reference genomes at chromosomal level, and an outgroup chromosomal genome (as I read this is needed in the cactus Newick tree).

My query: for cactus, should the input be my 5 x reference genomes, my outgroup genome AND my preliminary assembly? Or should only the reference genomes and outgroup genome be used in the cactus step? I get that the preliminary assembly will obviously need to be in ragout, but whether I should include it in the cactus step is where I am currently puzzled.

Again, I apologize for this pretty basic question, but I could not find an obvious straight up answer. These tools are super cool and will hopefully allow me to get to the chromosomal level for my non model organism :) 

Ben

mikolmogorov commented 1 year ago

Hi Ben,

Sorry for the late response! All genomes that you intend to use for reference assembly should be provided as input for cactus. That includes the target genome. The easiest way would be to start with the 1-2 closest reference genomes. If you have reference genomes of the same species (for example), you likely won't need an outgroup.

Hope this helps, let me know if you have more questions. Misha
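To make the answer above concrete, here is a rough sketch (not taken from the thread) of how the two input files could be generated once the target assembly is included alongside the references. It assumes the usual Cactus seqFile layout (a Newick tree on the first line, then one `name path` line per genome) and a Ragout recipe using `.references`, `.target` and `.hal`. All genome names, file paths and the tree topology are hypothetical placeholders, so check them against the Cactus and Ragout documentation before use.

```python
def cactus_seqfile(tree, genome_paths):
    """Build seqFile text: a Newick tree line, then one 'name path' line per genome."""
    lines = [tree]
    for name, path in genome_paths.items():
        lines.append(f"{name} {path}")
    return "\n".join(lines) + "\n"


def ragout_recipe(references, target, hal_path):
    """Build a minimal Ragout recipe pointing at the HAL alignment produced by Cactus."""
    lines = [
        ".references = " + ",".join(references),
        ".target = " + target,
        ".hal = " + hal_path,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical names and paths: two close references plus the target assembly.
references = ["ref1", "ref2"]
target = "target"
genome_paths = {
    "ref1": "ref1.fasta",
    "ref2": "ref2.fasta",
    "target": "hifiasm_assembly.fasta",  # the target goes into Cactus too
}
tree = "((ref1,ref2),target);"  # placeholder topology, not a real phylogeny

seqfile_text = cactus_seqfile(tree, genome_paths)
recipe_text = ragout_recipe(references, target, "alignment.hal")

print(seqfile_text)
print(recipe_text)
```

The point of the sketch is only that the same genome names appear in both files, and that the target assembly is listed in the Cactus input as well as in the Ragout recipe.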

benyoung93 commented 1 year ago

Hi @fenderglass

First of all, apologies, I was on holiday for the last 2 weeks :).

That's really helpful, thank you very much. I did get it all to run successfully, but it did not work because of the quality of the closely related genomes available for my species (I say closely, but they are not that closely related; it was all I had), and the old genome of the species that I have sequenced is not the best.

For reference, this is a stony coral species (Orbicella faveolata) that I have sequenced to generate a higher quality genome.

Ben

mikolmogorov commented 1 year ago

Sounds good! Definitely, the quality of the reference genome is important. I'd rather go with a high-quality but more distant reference than a closely related but fragmented / misassembled / incomplete one.