ruanjue / wtdbg2

Redbean: A fuzzy Bruijn graph approach to long noisy reads assembly
GNU General Public License v3.0

How to adjust parameters to improve the assembly result #218

Open linshengnan2020 opened 4 years ago

linshengnan2020 commented 4 years ago

Hi, I have assembled a 1 Gb diploid genome with 70~80% repetitive sequence. The coverage of my PacBio data is approximately 40x. The final assembly N50 is 32 kb. How should I adjust the parameters to improve the assembly result? Could you please give me some advice? Thank you very much!
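For anyone comparing runs, here is a minimal sketch of the two numbers quoted above: N50 computed from contig lengths, and mean coverage as total read bases divided by genome size. The contig lengths are made up for illustration, and the function names are not part of wtdbg2.

```python
def n50(lengths):
    """Return the N50: the contig length at which the running total of
    lengths, sorted longest first, reaches half of the assembly size."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def coverage(total_read_bases, genome_size):
    """Mean depth: total sequenced bases divided by genome size."""
    return total_read_bases / genome_size


# Made-up contig lengths, for illustration only.
contigs = [80_000, 50_000, 32_000, 20_000, 10_000, 8_000]
print("N50:", n50(contigs))
# 40 Gb of reads over a 1 Gb genome, matching the figures in this post.
print("coverage:", coverage(40_000_000_000, 1_000_000_000))
```

Computing N50 the same way for every run makes it easier to compare the effect of each parameter change.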

shanesturrock commented 3 years ago

I'm dealing with a much larger genome (26 Gbp) but with similar levels of repeats. It may seem counterintuitive, but increasing the required overlap from the default of 2 kbp to 5 kbp (using the -l flag) has helped my assemblies. When I mapped the raw reads back onto the raw.fa, I noticed a number of locations where the assembly was collapsing around repeats. By increasing the minimum overlap I was able to get rid of a lot of these and improve the overall length of the assembly, at the cost of increasing the number of contigs. However, I'm going to scaffold at a later stage once I'm done with error correction and polishing, so I should be able to improve things again then. Better to have more contigs without repeat regions being collapsed.

ruanjue commented 3 years ago

The solution may include -l, '-R -s' and '--aln-dovetail -1'.

shanesturrock commented 3 years ago

I've been using -p 21 -S 2 --aln-noskip --rescue-low-cov-edges --tidy-reads 5000 -l 5000 but I'm still tweaking and testing. The good thing is the turnaround time is really short due to how fast the program is so I can try different settings and investigate the effects.
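Since the turnaround is short, a sweep over -l can be scripted. Below is a rough sketch that only builds the command lines and does not run anything. The flags are the ones quoted in this comment, plus -i, -t and -fo as they are normally used with wtdbg2. The input file name, output prefixes and thread count are hypothetical placeholders.

```python
import shlex

# Flags quoted from the comment above; values are passed through unchanged.
BASE_FLAGS = ["-p", "21", "-S", "2", "--aln-noskip",
              "--rescue-low-cov-edges", "--tidy-reads", "5000"]


def sweep_commands(reads, min_overlaps, threads=16):
    """Build one wtdbg2 command line per -l value (nothing is executed)."""
    commands = []
    for min_len in min_overlaps:
        prefix = f"asm_l{min_len}"  # hypothetical output prefix
        argv = ["wtdbg2", "-i", reads, "-t", str(threads),
                "-fo", prefix, *BASE_FLAGS, "-l", str(min_len)]
        commands.append(shlex.join(argv))
    return commands


# "reads.fa.gz" is a placeholder input file name.
for cmd in sweep_commands("reads.fa.gz", [2000, 3000, 5000]):
    print(cmd)
```

Giving each run its own prefix keeps the outputs side by side, so the effect of each -l value can be inspected afterwards.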

cement-head commented 3 years ago

Is there a specific parameter that needs to be adjusted and/or input to wtdbg2 to specify the coverage depth? Or is that irrelevant for the programme to run correctly?

ruanjue commented 3 years ago

Have a look at wtdbg2 --help; there are two relevant options, --limit-input and -X.
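To illustrate what a depth cap such as -X amounts to, here is a small sketch of the arithmetic. It assumes that reads are kept longest first until the target depth (depth times genome size) is reached. That selection rule is my assumption for illustration and is not stated in this thread, so check wtdbg2 --help for the exact behaviour. The function name and read lengths are made up.

```python
def select_reads_by_depth(read_lengths, genome_size, target_depth):
    """Keep the longest reads until their total length reaches
    genome_size * target_depth. Returns (kept lengths, achieved depth).

    Assumption: longest-first selection, used here only as an illustration.
    """
    budget = genome_size * target_depth
    kept, total = [], 0
    for length in sorted(read_lengths, reverse=True):
        if total >= budget:
            break
        kept.append(length)
        total += length
    return kept, total / genome_size


# Toy numbers: a 10 kb "genome" and a target depth of 5x.
reads = [30_000, 25_000, 20_000, 15_000, 10_000, 5_000]
kept, depth = select_reads_by_depth(reads, genome_size=10_000, target_depth=5)
print(kept, depth)
```

With a cap like this, extra coverage beyond the target is simply not used, which is why the depth itself does not usually need to be passed to the assembler.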

lifan18 commented 3 years ago

Hi Prof. Ruan,

I have the same question, with similar levels of repeats. Following your advice, I added "-l -R -s --aln-dovetail -1" in this first run.

I am trying to assemble it again. I hope it works.

Thank you!

lifan18 commented 3 years ago

Hi Prof. Ruan,

I tried using the four parameters together, but the result was worse than the run without -l -R -s --aln-dovetail -1. Is there a problem with using the four parameters at the same time?

-t 96 -fo Species -l -R -s --tidy-reads 5000 --edge-min 3 --rescue-low-cov-edges --aln-dovetail -1

Looking forward to your reply.

Thank you very much!

Li Fan

ruanjue commented 3 years ago

-R works at the step of generating alignments, --aln-dovetail works at the step of filtering alignments, and -s works at both steps. So you can use a loose -s together with -R in the first run, then use --load-alignments and tune for better results with different parameters.
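The two-stage approach described here can be sketched as a pair of command lines, built but not executed. The -s value, file names and prefixes are hypothetical placeholders. In particular, the name of the alignments file written by the first run is an assumption, so check what the first run actually produces before passing it to --load-alignments.

```python
import shlex


def first_pass(reads, prefix, min_similarity, threads=16):
    """Alignment-generating run: -R together with a loose -s."""
    return shlex.join(["wtdbg2", "-i", reads, "-t", str(threads),
                       "-fo", prefix, "-R", "-s", str(min_similarity)])


def retune(reads, alignments, new_prefix, extra_flags, threads=16):
    """Later run: reuse saved alignments and vary filtering-stage flags."""
    return shlex.join(["wtdbg2", "-i", reads, "-t", str(threads),
                       "-fo", new_prefix,
                       "--load-alignments", alignments, *extra_flags])


# All names and values below are placeholders for illustration.
print(first_pass("reads.fa.gz", "asm1", min_similarity=0.05))
print(retune("reads.fa.gz", "asm1.alignments.gz", "asm2",
             ["--aln-dovetail", "-1", "-l", "5000"]))
```

Only the second command needs to be repeated while tuning, since the slow alignment step is done once in the first run.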