Open shri1984 opened 3 years ago
The first step is to check whether the polished contigs are more accurate using WGS short reads.
I did that. I used polca.sh for polishing, with 1 billion Illumina PE reads (150 bp PE). This is the report: Substitution Errors: 991295; Insertion/Deletion Errors: 663392; Assembly Size: 5498989353; Consensus Quality: 99.9699.
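As a quick sanity check on the report above, the figures can be related to each other with a few lines of Python. This is a minimal sketch that assumes the consensus quality is simply the percentage of bases not counted as a substitution or indel error; that formula is an assumption, but it does reproduce the reported 99.9699. The function names are made up for illustration.

```python
# Figures copied from the POLCA report quoted above.
substitution_errors = 991_295
indel_errors = 663_392
assembly_size = 5_498_989_353


def consensus_quality(subs, indels, size):
    """Percent of bases not flagged as errors (assumed formula)."""
    return 100.0 * (1.0 - (subs + indels) / size)


def errors_per_megabase(subs, indels, size):
    """Number of flagged errors per one million assembled bases."""
    return (subs + indels) / size * 1e6


quality = consensus_quality(substitution_errors, indel_errors, assembly_size)
rate = errors_per_megabase(substitution_errors, indel_errors, assembly_size)
print(f"consensus quality: {quality:.4f} %")
print(f"errors per Mb:     {rate:.1f}")
```

Read this way, the report says the polished contigs are accurate at the base level, so the missing bases are unlikely to be explained by base-level errors alone.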
So the cause is most likely that many repeats were collapsed during assembly, not a problem with wtpoa-cns. One option is to add -R to wtdbg2, which will be about 2X slower. Another option is to try flye or another assembler on this dataset and pick the best assembly.
Thanks, I see. I used the options you suggested (-R, aln-dovetail -1 or 1024, -l 500 etc, K 2000) for repetitive genomes (in issue #230). They worked beautifully, but things go wrong in the cns stage. Is there any other consensus-calling tool like wtpoa-cns that I can try and compare?
In the wtdbg2 step, the assembly size is reported from the uncorrected sequence length; it usually becomes smaller after wtpoa-cns.
Do you know what an acceptable limit for this reduction is? In my case it is 12%. The data come from 7 cells of Sequel CLR. I am also using the RS preset; the results started to improve with it. Again, I got this information from other issues you addressed here. So you think I have no way out of this problem?
If the genome size was correctly estimated and the genome is complex, maybe there is no way. However, please find some contigs whose sizes differ greatly before and after polishing, then align their CLR long reads to their consensus sequences to see whether there are large insertions/deletions. If you find many such cases, wtpoa-cns is probably making errors when it concatenates cns sequence pieces.
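The check described above can be sketched in Python. This is a rough illustration, not part of wtdbg2: `fasta_lengths`, `size_changes` and `large_indels` are hypothetical helpers, the 10% and 50 bp thresholds are arbitrary, and the sketch assumes the contig lengths before and after are available as uncompressed FASTA files (or as name-to-length dictionaries from any other source) and that the long reads were aligned with a mapper that writes SAM CIGAR strings.

```python
import re


def fasta_lengths(path):
    """Return {contig_name: length} for a plain-text FASTA file."""
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                parts = line[1:].split()
                name = parts[0] if parts else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def size_changes(before, after, min_fraction=0.10):
    """List (name, old_len, new_len, fraction) for contigs whose length
    changed by at least min_fraction, largest relative change first."""
    changed = []
    for name, old in before.items():
        new = after.get(name)
        if new is None or old == 0:
            continue  # contig absent from the second set, or empty
        fraction = (new - old) / old
        if abs(fraction) >= min_fraction:
            changed.append((name, old, new, fraction))
    return sorted(changed, key=lambda row: abs(row[3]), reverse=True)


CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def large_indels(cigar, min_len=50):
    """Return (op, length) for each insertion (I) or deletion (D) of at
    least min_len bases in a SAM CIGAR string."""
    return [
        (op, int(length))
        for length, op in CIGAR_OP.findall(cigar)
        if op in "ID" and int(length) >= min_len
    ]
```

A possible use is to pass the two length tables to `size_changes`, pick the top few contigs, and then run `large_indels` over the CIGAR column of the reads aligned to those contigs. Many long I or D operations clustered at the same positions would point to the concatenation problem mentioned above.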
Hi, thank you for providing such excellent tools. We rely on wtdbg2 to assemble our genome from CCS data. At present, for our data, its result is clearly better than hifiasm's. Using the default parameters (and -g 1.3g), the direct output has 1892 contigs with an N50 of 3 Mb, and the BUSCO score is above 95%. However, the assembly is still too small compared with the estimated genome size: only 880 Mb was assembled. How can we adjust the parameters so that the result is closer to the estimated genome size? Thank you!
wtdbg2 tends to collapse similar regions. For your case, please try increasing '-s 0.5' to '-s 0.8' or another value.
Thank you very much! I will try it.
Hi, I am getting 12% fewer bases after the consensus step for my genome (complex and big, 100X coverage). I checked whether any contigs are missing between the lay file and the cns.raw.fa file, and none are. What is driving this? Is it normal to lose that many bases in the consensus stage? Are there any parameters in wtpoa-cns that I can tweak? Thank you.