Open linshengnan2020 opened 4 years ago
I'm dealing with a much larger genome (26 Gbp) but with similar levels of repeats. It may seem counterintuitive, but increasing the required overlap from the default of 2 Kbp to 5 Kbp (using the -l flag) has helped my assemblies. When I mapped the raw reads back onto the raw.fa, I noticed a number of locations where the assembly was collapsing around repeats. By increasing the minimum overlap I was able to get rid of a lot of these and improve the overall length of the assembly, at the cost of increasing the number of contigs. However, I'm going to scaffold at a later stage once I'm done with error correction and polishing, so I should be able to improve things again then. Better to have more contigs without repeat regions being collapsed.
The solution may include -l, '-R -s' and '--aln-dovetail -1'.
I've been using -p 21 -S 2 --aln-noskip --rescue-low-cov-edges --tidy-reads 5000 -l 5000, but I'm still tweaking and testing. The good thing is that the turnaround time is really short because the program is so fast, so I can try different settings and investigate the effects.
Is there a specific parameter that needs to be adjusted and/or input to wtdbg2 to specify the coverage depth? Or is that irrelevant for the programme to run correctly?
Have a look at wtdbg2 --help; there are two relevant options, --limit-input and -X.
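To make the depth question concrete, here is a minimal sketch of the arithmetic behind coverage depth. It assumes depth is simply total read bases divided by genome size, and that a depth cap such as -X is applied relative to a genome size you supply yourself (check wtdbg2 --help for how the cap is actually applied). The function names and the numbers are illustrative, not taken from wtdbg2.

```python
def coverage_depth(total_read_bases: int, genome_size: int) -> float:
    """Average depth = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_read_bases / genome_size


def bases_for_depth(target_depth: float, genome_size: int) -> int:
    """Number of read bases that corresponds to a target depth."""
    return int(target_depth * genome_size)


# Illustrative numbers only: a 1 Gbp genome with 40 Gbp of reads.
genome_size = 1_000_000_000
total_read_bases = 40_000_000_000

depth = coverage_depth(total_read_bases, genome_size)
budget = bases_for_depth(50, genome_size)  # bases a 50x cap would allow

print(f"estimated depth: {depth:.1f}x")
print(f"a 50x cap corresponds to {budget} bases")
print("cap would discard reads" if total_read_bases > budget else "cap keeps all reads")
```

If the estimated depth is already below the cap, a depth limit of this kind should leave the input unchanged.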
Hi Prof. Ruan,
I have the same question, with a similar level of repeats. Following your advice, I added "-l -R -s --aln-dovetail -1" in this run.
I am trying the assembly again. Hope it works.
Thank you!
Hi Prof. Ruan,
I tried using the four parameters together, but the result was worse than without -l -R -s --aln-dovetail -1. Is there any problem with using all four parameters at the same time? I ran with:
-t 96 -fo Species -l -R -s --tidy-reads 5000 --edge-min 3 --rescue-low-cov-edges --aln-dovetail -1
I look forward to your reply.
Thank you very much!
Li Fan
-R works at the step of generating alignments, --aln-dovetail works at the step of filtering alignments, and -s works at both steps. So, you can use a loose -s together with -R in the first run, then use --load-alignments and tune for better results with different parameters.
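The two-pass approach described above could be scripted roughly as follows. This is only a sketch: it builds the two command lines and prints them without running anything. The read file name, the output prefixes, the -s values, and the assumption that the first run leaves its alignments in a file named <prefix>.alignments.gz are all placeholders rather than details from this thread, so check them against wtdbg2 --help and your own output files.

```python
import shlex


def first_pass(reads, prefix, threads, loose_s):
    """First run: generate alignments once, with -R and a loose -s."""
    return ["wtdbg2", "-i", reads, "-fo", prefix, "-t", str(threads),
            "-R", "-s", str(loose_s)]


def rerun(reads, prefix, threads, alignments, s, dovetail):
    """Later runs: reuse the saved alignments and tune only the filters."""
    return ["wtdbg2", "-i", reads, "-fo", prefix, "-t", str(threads),
            "--load-alignments", alignments,
            "-s", str(s), "--aln-dovetail", str(dovetail)]


# Placeholder inputs; none of these values are recommendations.
reads = "reads.fa.gz"
run1 = first_pass(reads, "run1", threads=96, loose_s=0.05)
print(shlex.join(run1))

# Try several filter settings against the same alignment file (assumed name).
for i, (s, dovetail) in enumerate([(0.05, -1), (0.1, -1), (0.1, 256)], start=2):
    cmd = rerun(reads, f"run{i}", 96, "run1.alignments.gz", s, dovetail)
    print(shlex.join(cmd))
```

Because alignment generation is not repeated in the reruns, each tuning iteration only exercises the filtering step, which is where -s and --aln-dovetail can be varied.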
Hi, I have assembled a 1 Gbp diploid genome with 70-80% repetitive sequence. The coverage of my PacBio data is approximately 40x. The resulting assembly N50 is 32 kb. How should I adjust the parameters to improve the assembly? Could you please give me some advice? Thank you very much!