isovic / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads. http://genome.cshlp.org/content/early/2017/01/18/gr.214270.116 Note: this is the original repository, which is no longer officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/racon
MIT License
268 stars, 48 forks

Racon for a low-heterozygosity diploid genome #139

Open: ghost opened this issue 5 years ago

ghost commented 5 years ago

Hello, is Racon expected to work for such a genome? I ran the following, using paired-end Illumina reads mapped with bwa-mem2:

racon -t 48 illumina.fq mapped.sam raw.fasta > racon.polished.fasta

The resulting FASTA has 3 fewer contigs. However, the number of het sites increased. Before Racon: 1150321; after Racon: 1208714.

Any idea why this might be? My genome is expected to have a heterozygosity of 2–3%. I realise Racon might not be suitable for such a case. Is Racon able to handle diploid sites?

Thank you.

EDIT: I realise this might be a duplicate of issue #135.

rvaser commented 5 years ago

Hello, are the 3 missing contigs small? Sometimes, due to lack of coverage, some smaller contigs drop out after polishing, but you can retain them with the option -u (--include-unpolished). Which assembler did you use to get the raw assembly? How are you getting the information about het sites? Can you explain how this issue is a duplicate of #135?
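To check whether the contigs that dropped out are indeed the small ones, one could list every contig that is in the draft but not in the Racon output, together with its length. A minimal sketch, assuming Racon keeps the first whitespace-delimited token of each FASTA header unchanged; the function name missing_contigs is hypothetical:

```shell
# Hypothetical helper: print "name<TAB>length" for every contig that is in the
# draft assembly but missing from the Racon output.
# Assumption: Racon keeps the first whitespace-delimited token of each header.
missing_contigs() {
  draft="$1"
  polished="$2"
  awk '
    # First file (polished): remember which contig names survived.
    FILENAME == ARGV[1] { if ($0 ~ /^>/) kept[substr($1, 2)] = 1; next }
    # Second file (draft): on each new header, report the previous contig if dropped.
    /^>/ {
      if (name != "" && !(name in kept)) print name "\t" len
      name = substr($1, 2); len = 0; next
    }
    # Sequence lines: accumulate the length of the current draft contig.
    { len += length($0) }
    END { if (name != "" && !(name in kept)) print name "\t" len }
  ' "$polished" "$draft"
}
```

For example, missing_contigs raw.fasta racon.polished.fasta | sort -k2,2n prints the dropped contigs from shortest to longest. If they are all short, rerunning Racon with -u keeps them in the output, unpolished.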

Best regards, Robert

ghost commented 5 years ago

Oh, I am really sorry, it was issue #134. Yes, it seems they are the small ones. Here is more information about the assembly:

The assembly was obtained with Flye from ONT (60x coverage) and PacBio (140x coverage) reads, then scaffolded with a 3C scaffolder (instaGRAAL). For the het sites, I mapped reads with bwa-mem2 and called variants with DeepVariant. I think the sequencing quality of the Illumina reads I used is all right (see the attached plot, Rplot_illuminaBQ_ARCancestor).
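For reference, one plausible way to count het sites from a single-sample DeepVariant VCF is to keep PASS records whose genotype has two different, non-missing alleles. This is a sketch under assumptions (one sample in column 10, GT as the first FORMAT key as the VCF spec requires, reference calls filtered as RefCall), not necessarily how the numbers in this thread were obtained; count_het_sites is a hypothetical name:

```shell
# Hypothetical helper: count heterozygous PASS calls in a single-sample VCF
# (file argument or standard input).
count_het_sites() {
  awk -F '\t' '
    /^#/ { next }            # skip meta-information and header lines
    $7 != "PASS" { next }    # DeepVariant flags reference calls as RefCall
    {
      split($10, fmt, ":")                 # GT is the first FORMAT key
      n = split(fmt[1], al, "[/|]")        # unphased (/) or phased (|) genotype
      if (n == 2 && al[1] != "." && al[2] != "." && al[1] != al[2]) het++
    }
    END { print het + 0 }
  ' "$@"
}
```

It reads a plain VCF file or standard input, so compressed output can be piped in, e.g. gzip -cd calls.vcf.gz | count_het_sites. Counting the same way before and after polishing keeps the two numbers comparable.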

rvaser commented 5 years ago

Can you please copy the commands you used in running Racon with Illumina data (both mapping and Racon step)?

ghost commented 5 years ago

./bwa-mem2 mem -t 48 -p raw.fasta illumina.fq > mapped.sam
racon -t 48 illumina.fq mapped.sam raw.fasta > racon.polished.fasta

rvaser commented 5 years ago

Looks all right. Not sure what to tell you here :/ You could try mapping with minimap2 -ax sr and check whether the result is better.
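A sketch of one polishing round with minimap2 mappings in place of bwa-mem2, assuming minimap2 and racon are on PATH and the reads are the same Illumina FASTQ used above; the function name polish_round_minimap2 and the .sam naming scheme are hypothetical:

```shell
# Hypothetical helper: one Racon round using minimap2 short-read mappings.
# Assumes minimap2 and racon are on PATH.
polish_round_minimap2() {
  reads="$1"; draft="$2"; out="$3"; threads="${4:-48}"
  sam="${out%.fasta}.sam"   # mappings are written next to the output FASTA
  # -ax sr: short-read preset with SAM output, as suggested above
  minimap2 -ax sr -t "$threads" "$draft" "$reads" > "$sam" || return 1
  # Racon argument order: <sequences> <overlaps> <target sequences>
  racon -t "$threads" "$reads" "$sam" "$draft" > "$out"
}
```

For example, polish_round_minimap2 illumina.fq raw.fasta racon.minimap2.fasta 48 writes the mappings to racon.minimap2.sam and the polished assembly to racon.minimap2.fasta. Only the mapper changes relative to the bwa-mem2 run, which isolates its effect on the het-site count.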

ghost commented 5 years ago

Okay, I will do that and report back to you (the cluster is a bottleneck for me right now, so it might take a few days).

ghost commented 5 years ago

Hello, I did the polishing with minimap2 mappings and the same thing is happening. Before: 1435144; after: 1847183.
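Put in relative terms, the het-site count grew by about 5% in the bwa-mem2 run and by about 29% in the minimap2 run. The small awk helper below (hypothetical name pct_change) does that arithmetic; each run is normalised by its own "before" count, since the two baselines reported in this thread differ:

```shell
# Hypothetical helper: relative change from a "before" count to an "after" count.
pct_change() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%+.2f%%\n", 100 * (after - before) / before }'
}

pct_change 1150321 1208714   # bwa-mem2 mappings: prints +5.08%
pct_change 1435144 1847183   # minimap2 mappings: prints +28.71%
```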

rvaser commented 5 years ago

Not sure what to tell you here.

ghost commented 5 years ago

Well, it's a notoriously hard-to-assemble genome I am dealing with. So I am not so surprised that "regular" tools give unexpected results. If I find the cause I will let you know.

rvaser commented 5 years ago

Did you by any chance try Pilon in the same setting?

ghost commented 5 years ago

That's the next step.
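For comparison with Racon, a Pilon round on the same mappings could look roughly like the sketch below. Assumptions: samtools is on PATH, the Pilon jar location is given by a hypothetical PILON_JAR variable, and pilon_round is a hypothetical name. Pilon reads a sorted, indexed BAM rather than SAM; the --diploid flag is included because the genome is diploid, but how much it changes the het-site count is not something this thread establishes:

```shell
# Hypothetical helper: one Pilon round on the same Illumina mappings.
# Assumes samtools is on PATH; PILON_JAR is a placeholder for the jar path.
pilon_round() {
  sam="$1"; draft="$2"; prefix="$3"; threads="${4:-48}"
  # Pilon expects a coordinate-sorted, indexed BAM
  samtools sort -@ "$threads" -o "$prefix.sorted.bam" "$sam" || return 1
  samtools index "$prefix.sorted.bam" || return 1
  # --frags: paired-end fragment library; --diploid: sample is diploid
  java -jar "${PILON_JAR:-pilon.jar}" --genome "$draft" \
    --frags "$prefix.sorted.bam" --output "$prefix" \
    --threads "$threads" --diploid
}
```

For example, pilon_round mapped.sam raw.fasta pilon 48 would write pilon.sorted.bam and then Pilon's pilon.fasta in the current directory, which can be fed to the same variant-calling step as the Racon output.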