isovic / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads. http://genome.cshlp.org/content/early/2017/01/18/gr.214270.116 Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/racon
MIT License

empty overlap set! #103

Open Adelaam opened 5 years ago

Adelaam commented 5 years ago

Hello,

I am trying Racon for the first time. I get this error when I try to polish my data. I have also tried the files provided for the test and I get the same error.

[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] loaded sequences
[racon::Polisher::initialize] error: empty overlap set!

Thank you very much.

apoosakkannu commented 5 years ago

please see the following,

anbu@bubulin:~/polyplax/downstreamanalysiswithvaclavassembly$ head -n 1 98ZLc_nanopore_assembly_racon_metka2.fasta 
>PolyplaxNanoScarf_Scaffold100006:1.0-196.0:1.0-195.0
anbu@bubulin:~/polyplax/downstreamanalysiswithvaclavassembly$ head -n 1 98ZLc_nanopore_assembly_racon_metka2_racon2.fa 
>PolyplaxNanoScarf_Scaffold100006:1.0-196.0:1.0-195.0 LN:i:194 RC:i:618 XC:f:1.000000

rvaser commented 5 years ago

No idea what the problem is. And I am still not sure which names you want to change.

apoosakkannu commented 5 years ago

I wonder what this addition, LN:i:194 RC:i:618 XC:f:1.000000, is. Is it due to the renaming of the file? If so, then when I do the following,

minimap2 -t 19 -ax map-ont 98ZLc_nanopore_assembly_racon_metka2_racon2.fasta 98ZLc_nanopore_raw.fastq | samtools view --threads 19 -Sb -F 0x104 - | samtools sort --threads 19 - > np_cov.bam
samtools depth -aa np_cov.bam | awk -F "\t" '{a[$1] += $3; b[$1]++} END{OFS = ","; for (i in a) print i, a[i]/b[i]}' > np_cov.csv

the names of the input files will be different, right?

rvaser commented 5 years ago

LN:i:194 RC:i:618 XC:f:1.000000 are SAM tags produced by racon which denote the contig length, number of reads used for polishing the contig and the percentage of contig windows polished, respectively. You can try removing them manually for a couple of contigs and see if you get coverages.
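For anyone who wants to try removing the tags without editing headers by hand, here is a minimal sketch. It assumes the tags always follow the contig name after whitespace, so keeping only the first token of each header line is enough. The function name `strip_header_tags` and the file names in the usage comment are hypothetical, not part of racon.

```shell
# Hypothetical helper (not part of racon): keep only the first
# whitespace-delimited token of each FASTA header line, which drops
# trailing tags such as "LN:i:194 RC:i:618 XC:f:1.000000".
# Sequence lines are passed through unchanged.
strip_header_tags() {
    awk '/^>/ { print $1; next } { print }'
}

# Usage (file names are placeholders):
#   strip_header_tags < polished.fasta > polished.clean.fasta
```

The function reads from standard input and writes to standard output, so the original file is left untouched and the result can be compared before it is used for mapping.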

apoosakkannu commented 5 years ago

ok, i will try.

Caro-Ca commented 5 years ago

Hi! I am trying to find the best assembler and I'm comparing different methods on long Nanopore reads. I generated the file containing the overlaps with:

bwa index r18_guppy_combine.fastq 

bwa mem r18_guppy_combine.fastq r18_guppy_combine.fastq > R18_bwa_mem_racon1.sam 

I get a SAM file (R18_bwa_mem_racon1.sam) which is not empty:

ls -l
total 9251888
-rw-rw-r-- 1 edgar edgar 3296041316 jun 22 21:48 R18_bwa_mem_racon1.sam
-rw-rw-r-- 7 edgar edgar 3292112286 mei  6 17:39 r18_guppy_combine.fastq
-rw-rw-r-- 1 edgar edgar         20 jun 19 16:25 r18_guppy_combine.fastq.amb
-rw-rw-r-- 1 edgar edgar   25904106 jun 19 16:25 r18_guppy_combine.fastq.ann
-rw-rw-r-- 1 edgar edgar 1634194112 jun 19 16:25 r18_guppy_combine.fastq.bwt
-rw-rw-r-- 1 edgar edgar  408548509 jun 19 16:25 r18_guppy_combine.fastq.pac
-rw-rw-r-- 1 edgar edgar  817097064 jun 19 16:46 r18_guppy_combine.fastq.sa
-rw-rw-r-- 1 edgar edgar          0 jun 24 09:30 R18_racon1.fasta

But when I run racon, it says that the overlap set is empty:

racon r18_guppy_combine.fastq R18_bwa_mem_racon1.sam r18_guppy_combine.fastq > R18_racon1.fasta 
[racon::Polisher::initialize] loaded target sequences 103.960 s
[racon::Polisher::initialize] loaded sequences 99.487 s
[racon::Polisher::initialize] error: empty overlap set!    

The raw reads are from Guppy, and I also used them as the draft genome. The procedure I want to follow is to run this pipeline four times and finish with Medaka.

I hope you can help me out. Thanks in advance.

rvaser commented 5 years ago

Hello! Are you trying to correct errors in your reads or polish your assembly? Your commands look like the former (i.e. you are mapping your reads to each other), but I am not sure why all overlaps are filtered out. Can you check that the first alignment record in the SAM has exactly the same sequence header as the corresponding sequence in the FASTQ file? Also, racon filters out low quality alignments (0.3 error threshold), so you can check that too.
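The header check suggested above could be sketched roughly as follows. This assumes an uncompressed SAM file, an uncompressed FASTQ file with four lines per record, and that names are compared up to the first whitespace. The function names are hypothetical helpers, not part of racon or bwa.

```shell
# Hypothetical helpers (not part of racon or bwa).

# Print the query name (column 1) of the first alignment record,
# skipping SAM header lines, which start with "@".
first_sam_qname() {
    awk '!/^@/ { print $1; exit }' "$1"
}

# Succeed (exit status 0) if the FASTQ file contains a record whose name,
# truncated at the first whitespace, equals the given name.
# Assumes four lines per FASTQ record.
fastq_has_name() {
    awk -v name="$2" '
        NR % 4 == 1 { if (substr($1, 2) == name) { found = 1; exit } }
        END { exit !found }
    ' "$1"
}

# Usage (file names are placeholders):
#   qname=$(first_sam_qname overlaps.sam)
#   fastq_has_name reads.fastq "$qname" && echo "match" || echo "no match for $qname"
```

If this reports no match, the names in the two files differ in some way (for example, extra text after the read name), which would be worth looking at before anything else.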

If you are actually trying to polish your assembly, you have to map your reads to the assembly and then run racon with

racon -t <threads> reads mappings assembly > polished_assembly

Best regards, Robert

Caro-Ca commented 5 years ago

Hi! Sorry, I guess I was not clear. Indeed, I am trying to correct errors because I don't use any assembly. I already checked and both files (SAM and FASTQ) have the same sequence header. On the other hand, I don't know how to check the low quality alignments. Where can I see the threshold?

Thanks for your answer!

rvaser commented 5 years ago

If a SAM file is used, the error rate is calculated as 1 - min(q,t) / max(q,t), where q and t represent the number of matches and mismatches (obtained from CIGAR) in the query and target, respectively. I am not sure if that is the problem though. I would advise you to run the following commands to see if everything works fine:

minimap2 -t 12 -ax ava-ont --dual=yes r18_guppy_combine.fastq r18_guppy_combine.fastq > overlaps.sam
racon -t 12 -f r18_guppy_combine.fastq overlaps.sam r18_guppy_combine.fastq > r18_polished.fasta
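To make the error-rate formula quoted above concrete, here is a small sketch that evaluates 1 - min(q,t) / max(q,t) for two numbers. It takes q and t as already-computed inputs and does not parse CIGAR strings. The function name `overlap_error_rate` is hypothetical, and the sketch only illustrates the arithmetic, not racon's actual code.

```shell
# Hypothetical helper: evaluate 1 - min(q,t) / max(q,t).
# $1 = q (query count), $2 = t (target count), both derived from the CIGAR
# string as described above. Prints the rate with four decimals.
overlap_error_rate() {
    awk -v q="$1" -v t="$2" 'BEGIN {
        lo = (q < t) ? q : t
        hi = (q < t) ? t : q
        if (hi == 0) { print "nan"; exit 1 }   # avoid division by zero
        printf "%.4f\n", 1 - lo / hi
    }'
}

# Usage:
#   overlap_error_rate 900 1000
```

With q = 900 and t = 1000 the rate is 0.1, which is below the 0.3 threshold mentioned earlier in the thread. With q = 500 and t = 1000 it is 0.5, which is above that threshold.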

Notes:

Caro-Ca commented 5 years ago

Thank you! So far everything is working.

Jenny-Tamboli commented 4 years ago

Hi, I am new to Racon and was trying to run the test files (racon/test/data), but it gives me the error shown below:

racon sample_reads.fastq.gz sample_overlaps.paf.gz sample_reference.fasta.gz >racon1.fa
[racon::Polisher::initialize] loaded target sequence 0.001033 s
[racon::Polisher::initialize] loaded sequences 0.109507 s
[racon::Polisher::initialize] error: empty overlap set!

Can you tell me what I am doing wrong?

Thanks!

rvaser commented 4 years ago

Hello Jenny, can you please paste the command you used to obtain the sample_overlaps.paf.gz file?

Best regards, Robert

Jenny-Tamboli commented 4 years ago

Hi, it is already present in the /test/data folder in racon-1.3.2.

Thanks.

rvaser commented 4 years ago

Ugh, I am sorry, I misread the inquiry. If you want to run the test data, you have to use the layout file:

racon sample_reads.fastq.gz sample_overlaps.paf.gz sample_layout.fasta.gz > racon1.fa

The reference file is used to assess the accuracy.

Best regards, Robert

Jenny-Tamboli commented 4 years ago

Thank you so much for the quick responses. It worked this time!!

Regards, Jenny