Polish noisy long reads with hifi long reads

isovic / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads.

http://genome.cshlp.org/content/early/2017/01/18/gr.214270.116

Note: This was the original repository, which will no longer be officially maintained. Please use the new official repository here:

https://github.com/lbcb-sci/racon

MIT License


Polish noisy long reads with hifi long reads #176

Open TesiNicco opened 3 years ago

TesiNicco commented 3 years ago

Hi,

I am relatively new to racon. I am trying to polish noisy long reads (PacBio) using high-fidelity long reads (PacBio).

I am using pbmm2 (SMRT C++ wrapper for minimap2's C API) for alignment:

pbmm2 align hifi_reads.fa reads_to_correct.fa reads_to_correct_aligned.bam

The resulting sam file contains alignments.

Then I would use racon to polish the reads:

racon hifi_reads.fa reads_to_correct_aligned.sam reads_to_correct.fa

The error I got is: [racon::Polisher::initialize] error: empty overlap set!

From what I understood, the way the files are specified in the racon command should be correct. Any idea what can be the cause of this?

Additional details:

reads for correction.fa = 2 high quality reads (quality ~99%)

reads to be corrected.fa = 6 noisy reads (quality ~80%)

All the reads align to the same portion of the reference genome.

Thanks in advance

rvaser commented 3 years ago

Hi Niccolo, you have to swap the HiFi and CLR reads in your pbmm2 command: the CLR reads come first, as they are the "reference" onto which you map the HiFi reads. As you have 2 HiFi vs 6 CLR reads, use the parameter -f when running Racon, so that it uses all found alignments. Otherwise, it will use only the longest alignment per HiFi read, and the majority of the CLR reads will not be polished.

Best regards, Robert
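The ordering described above can be sketched as a small Python helper that only builds the command lines and prints them. File names are taken from the original post. The intermediate BAM-to-SAM step with samtools is an assumption added here (the post writes a .bam file but passes a .sam file to racon), and the helper function names are hypothetical.

```python
import shlex


def pbmm2_align_cmd(target_reads, query_reads, out_bam):
    # pbmm2 align expects the reference first, then the reads to map.
    # Here the noisy CLR reads act as the "reference" and the HiFi reads are mapped onto them.
    return ["pbmm2", "align", target_reads, query_reads, out_bam]


def bam_to_sam_cmd(in_bam, out_sam):
    # Assumption: racon is given SAM text, so the BAM is converted with header (-h).
    return ["samtools", "view", "-h", "-o", out_sam, in_bam]


def racon_cmd(polishing_reads, overlaps, target_reads, use_all_alignments=True):
    # racon argument order: <sequences> <overlaps> <target sequences>.
    cmd = ["racon"]
    if use_all_alignments:
        cmd.append("-f")  # the -f parameter suggested in the reply above
    cmd += [polishing_reads, overlaps, target_reads]
    return cmd


if __name__ == "__main__":
    steps = [
        pbmm2_align_cmd("reads_to_correct.fa", "hifi_reads.fa", "reads_to_correct_aligned.bam"),
        bam_to_sam_cmd("reads_to_correct_aligned.bam", "reads_to_correct_aligned.sam"),
        racon_cmd("hifi_reads.fa", "reads_to_correct_aligned.sam", "reads_to_correct.fa"),
    ]
    for step in steps:
        # Print only; run the commands manually once they look right.
        print(" ".join(shlex.quote(arg) for arg in step))
```

Printing rather than executing keeps the sketch safe to run on a machine without pbmm2, samtools, or racon installed.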

lychen83 commented 3 years ago

Hi Robert,

I have a question about polishing my data. I have 150× Illumina HiSeq data (2 × 150 bp) for my plant. I am also sequencing my plant with the PacBio HiFi (CCS) method. According to the paper 'Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome', the accuracy of HiFi reads is about 99.8 percent. I am wondering whether I still need to use the HiSeq data to polish the assembly built from the HiFi reads (that is, to correct the base errors).

I appreciate your help.

Best,

Lingyun

rvaser commented 3 years ago

Hi Lingyun, I have not evaluated HiFi vs Illumina polishing so far, so I cannot advise you. You could first polish with HiFi, and afterwards with Illumina, and evaluate the accuracy of both steps.

Best regards, Robert
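When comparing the two polishing steps, it can help to put accuracies on the Phred scale, where QV = -10 * log10(error rate). The sketch below is only an illustration of that conversion. How matches and errors are counted (for example, by aligning each polished assembly to a trusted reference) is an assumption and is not described in this thread, and the function names are hypothetical.

```python
import math


def accuracy_from_counts(matches, errors):
    # Fraction of aligned bases that agree with the trusted sequence (assumed inputs).
    return matches / (matches + errors)


def phred_qv(accuracy):
    # Phred-scaled quality: QV = -10 * log10(1 - accuracy).
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


if __name__ == "__main__":
    # 99.8% accuracy (the HiFi figure quoted above) is roughly QV 27.
    print(round(phred_qv(0.998), 1))
    # 998 matching bases and 2 errors give the same accuracy.
    print(round(phred_qv(accuracy_from_counts(998, 2)), 1))
```

Computing the same number after the HiFi round and again after the Illumina round gives a direct way to see whether the second round changed anything.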

socialhang commented 1 year ago

Hi @rvaser,

I tried to use HiFi reads to polish ONT reads with racon, but it failed:

[racon::Polisher::initialize] error: empty overlap set!

This is the command:

racon hifi.small.fq small.paf UL.small.fq

And these are the files:

UL.small.fq.txt hifi.small.fq.txt small.paf.txt

Can you give me some advice?

Best regards, Hang.
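One check that may help narrow down an "empty overlap set" error is to confirm that the read names in the PAF file line up with the two read files in the roles racon is given. The Python sketch below is a diagnostic under stated assumptions, not racon's own logic: it assumes standard PAF columns (query name in column 1, target name in column 6), unwrapped 4-line FASTQ records, and that a mismatch or swap of names is a plausible cause. The thread does not confirm the actual cause. The function names are hypothetical.

```python
import gzip


def read_names(path):
    """Collect record names from a FASTA or 4-line FASTQ file (optionally gzipped)."""
    opener = gzip.open if path.endswith(".gz") else open
    names = set()
    is_fasta = False
    with opener(path, "rt") as handle:
        for index, line in enumerate(handle):
            if index == 0:
                is_fasta = line.startswith(">")
            # FASTA: every '>' line is a header. FASTQ: every 4th line is a header.
            is_header = line.startswith(">") if is_fasta else index % 4 == 0
            if is_header:
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])  # name is the first token after the marker
    return names


def check_paf(paf_path, sequences_path, targets_path):
    """Count PAF records whose names match, are swapped, or are unknown."""
    seq_names = read_names(sequences_path)
    tgt_names = read_names(targets_path)
    counts = {"total": 0, "matching": 0, "swapped": 0, "unknown": 0}
    with open(paf_path) as paf:
        for line in paf:
            cols = line.rstrip("\n").split("\t")
            if len(cols) < 12:
                continue  # not a complete PAF record
            counts["total"] += 1
            query, target = cols[0], cols[5]
            if query in seq_names and target in tgt_names:
                counts["matching"] += 1
            elif query in tgt_names and target in seq_names:
                counts["swapped"] += 1  # alignment was run in the opposite direction
            else:
                counts["unknown"] += 1  # at least one name is missing from the read files
    return counts

# Example call for the command above (sequences first, targets last):
# check_paf("small.paf", "hifi.small.fq", "UL.small.fq")
```

A high "swapped" count would suggest the alignment was produced with the ONT reads as queries, and a high "unknown" count would suggest the names in the PAF differ from those in the read files.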