lbcb-sci / raven

De novo genome assembler for long uncorrected reads
MIT License

Autopolyploid genome recommendation #14

Open harish0201 opened 4 years ago

harish0201 commented 4 years ago

Hi,

I'm working on an auto-tetraploid plant with a genome size of 1.7Gb. For testing purposes, I ran Raven on all the data I had with default settings.

The assembly was around 340Mb, which is closer to the haploid length estimate. Do you think Raven can be used to assemble it, or are there any parameters that can be changed to accommodate the ploidy?

The GenomeScope report for it is here: http://qb.cshl.edu/genomescope/genomescope2.0/analysis.php?code=1RdMr7FUNtBmC0zbxQTX
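For reference, the size comparison in this post can be sketched as a small calculation. This is only an illustration, assuming the 1.7Gb figure is the full tetraploid genome size and that the four haplotype copies are of roughly equal length; the function name `collapsed_size_mb` is made up for this example and is not part of Raven.

```python
def collapsed_size_mb(total_genome_mb, ploidy, copies_kept):
    """Rough expected assembly size (in Mb) if the assembler keeps
    `copies_kept` of the `ploidy` haplotype copies and collapses the rest.

    Assumes all haplotype copies have about the same length.
    """
    if not 1 <= copies_kept <= ploidy:
        raise ValueError("copies_kept must be between 1 and ploidy")
    return total_genome_mb / ploidy * copies_kept


if __name__ == "__main__":
    TOTAL_MB = 1700  # assumption: 1.7Gb is the full autotetraploid genome
    PLOIDY = 4
    for copies in range(1, PLOIDY + 1):
        size = collapsed_size_mb(TOTAL_MB, PLOIDY, copies)
        print(f"{copies} haplotype copies kept: ~{size:.0f} Mb")
```

Under those assumptions, a fully collapsed (haploid) assembly would come out at about 425Mb, so a 340Mb assembly sits nearest to that case.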

rvaser commented 4 years ago

Hello, there are no changeable parameters in Raven at this point (only alignment parameters for Racon consensus). Is the assembly fragmented? Did you evaluate it somehow or compare it to other assemblers?

Sorry for the delayed response! Best regards, Robert

harish0201 commented 4 years ago

Hi,

Apologies for the really late response!

I wouldn't say that the assembly is fragmented, given that we are still exploring how best to assemble the data. However, what we do know is the ploidy, repetitiveness, and genome size (autotetraploid, 90%, and ~1.7Gb).

Canu and Falcon have been running on AWS for some time (nearly a month), as they aren't tuned for such complex genomes. However, I do have WTDBG2's results.

WTDBG2 has given me a near-diploid assembly (~750-800Mb) in about 11,000 contigs with an N50 of ~250Kb; RAVEN has given me 340Mb in 6,000 contigs with an N50 of 66Kb.

However, RAVEN is bloody fast! So I was planning to use it :)
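The contig counts and N50 values quoted above are the usual first-pass way to compare assemblies. As a minimal sketch of how such numbers can be computed from an assembler's FASTA output, the snippet below uses only the Python standard library; the helper names `contig_lengths` and `n50`, and the file name in the usage comment, are hypothetical and not taken from Raven or WTDBG2.

```python
def contig_lengths(fasta_lines):
    """Return the length of each record in an iterable of FASTA lines."""
    lengths = []
    current = None  # None until the first header line is seen
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Return the N50: the length of the contig at which the running total,
    taken from longest to shortest, first reaches half the assembly size.
    Returns 0 for an empty list.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


# Usage (hypothetical file name):
# with open("raven_assembly.fasta") as handle:
#     lengths = contig_lengths(handle)
# print(len(lengths), sum(lengths), n50(lengths))
```

Running the same script on each assembler's output gives contig count, total size, and N50 on a like-for-like basis, although these numbers alone say nothing about correctness or how haplotypes were collapsed.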