Adelaam opened this issue 5 years ago:
Please see the following:
anbu@bubulin:~/polyplax/downstreamanalysiswithvaclavassembly$ head -n 1 98ZLc_nanopore_assembly_racon_metka2.fasta
>PolyplaxNanoScarf_Scaffold100006:1.0-196.0:1.0-195.0
anbu@bubulin:~/polyplax/downstreamanalysiswithvaclavassembly$ head -n 1 98ZLc_nanopore_assembly_racon_metka2_racon2.fa
>PolyplaxNanoScarf_Scaffold100006:1.0-196.0:1.0-195.0 LN:i:194 RC:i:618 XC:f:1.000000
I have no idea what the problem is, and I am still not sure which names you want me to change.
I wonder what this addition (LN:i:194 RC:i:618 XC:f:1.000000) is. Is it due to the renaming of the file? If so, then when I do the following,
minimap2 -t 19 -ax map-ont 98ZLc_nanopore_assembly_racon_metka2_racon2.fasta 98ZLc_nanopore_raw.fastq | samtools view --threads 19 -Sb -F 0x104 - | samtools sort --threads 19 - > np_cov.bam
samtools depth -aa np_cov.bam | awk -F "\t" '{a[$1] += $3; b[$1]++} END{OFS = ","; for (i in a) print i, a[i]/b[i]}' > np_cov.csv
the names of the input files will be different, right?
LN:i:194 RC:i:618 XC:f:1.000000 are SAM-style tags produced by racon which denote the contig length, the number of reads used for polishing the contig, and the proportion of contig windows that were polished, respectively. You can try removing them manually for a couple of contigs and see whether you then get coverage values.
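If you want to strip the tags from every contig rather than a couple by hand, a minimal sketch is a small POSIX shell function wrapping awk. It assumes that the contig name is the first whitespace-delimited token of each header and that nothing downstream needs the racon tags; the function name strip_racon_tags is hypothetical.

```sh
# Hypothetical helper: drop everything after the first whitespace in FASTA headers.
# Header lines keep only their first token (the contig name, e.g. >ctg1);
# sequence lines are printed unchanged. Reads files given as arguments, or stdin.
strip_racon_tags() {
  awk '/^>/ { print $1; next } { print }' "$@"
}
```

For example, strip_racon_tags 98ZLc_nanopore_assembly_racon_metka2_racon2.fa > renamed.fasta (the output name is only an illustration) gives a FASTA whose headers contain just the scaffold names, which you can then use as the reference in the minimap2/samtools pipeline.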
OK, I will try.
Hi! I am trying to find the best assembler, and I'm comparing different methods on long Nanopore reads. To get the file containing the overlaps, I ran:
bwa index r18_guppy_combine.fastq
bwa mem r18_guppy_combine.fastq r18_guppy_combine.fastq > R18_bwa_mem_racon1.sam
This gives me a SAM file (R18_bwa_mem_racon1.sam) that is not empty:
ls -l
total 9251888
-rw-rw-r-- 1 edgar edgar 3296041316 jun 22 21:48 R18_bwa_mem_racon1.sam
-rw-rw-r-- 7 edgar edgar 3292112286 mei 6 17:39 r18_guppy_combine.fastq
-rw-rw-r-- 1 edgar edgar 20 jun 19 16:25 r18_guppy_combine.fastq.amb
-rw-rw-r-- 1 edgar edgar 25904106 jun 19 16:25 r18_guppy_combine.fastq.ann
-rw-rw-r-- 1 edgar edgar 1634194112 jun 19 16:25 r18_guppy_combine.fastq.bwt
-rw-rw-r-- 1 edgar edgar 408548509 jun 19 16:25 r18_guppy_combine.fastq.pac
-rw-rw-r-- 1 edgar edgar 817097064 jun 19 16:46 r18_guppy_combine.fastq.sa
-rw-rw-r-- 1 edgar edgar 0 jun 24 09:30 R18_racon1.fasta
But when I run racon, it says that there is an empty overlap set:
racon r18_guppy_combine.fastq R18_bwa_mem_racon1.sam r18_guppy_combine.fastq > R18_racon1.fasta
[racon::Polisher::initialize] loaded target sequences 103.960 s
[racon::Polisher::initialize] loaded sequences 99.487 s
[racon::Polisher::initialize] error: empty overlap set!
The raw reads were basecalled with Guppy, and I also selected them as the draft genome. What I want to do is run this pipeline 4 times and finish with Medaka.
I hope you can help me out. Thanks in advance.
Hello! Are you trying to correct errors in your reads or to polish your assembly? Your commands look like the former (i.e. you are mapping your reads to each other), but I am not sure why all overlaps are filtered out. Can you check that the first alignment record in the SAM file has exactly the same sequence header as the corresponding sequence in the FASTQ file? Also, racon filters out low-quality alignments (0.3 error threshold), so you can check that too.
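The header check suggested above can be scripted roughly as follows. This is a sketch that assumes an uncompressed FASTQ with 4-line records and treats the first whitespace-delimited token of each @ line as the read name; check_first_record is a hypothetical name. It looks up both the QNAME (column 1) and RNAME (column 3) of the first alignment record, since in read-to-read mapping both should be reads from the same FASTQ.

```sh
# Hypothetical helper: are the QNAME and RNAME of the first SAM alignment
# record present as read names in the FASTQ?
# Assumes uncompressed FASTQ with 4-line records.
# Usage: check_first_record <file.sam> <reads.fastq>
check_first_record() {
  # First non-header line of the SAM: column 1 = QNAME, column 3 = RNAME.
  rec=$(awk -F '\t' '!/^@/ { print $1, $3; exit }' "$1")
  qname=${rec%% *}
  rname=${rec#* }
  # Read name = first token of line 1 of each FASTQ record, without the leading "@".
  awk -v q="$qname" -v r="$rname" '
    NR % 4 == 1 {
      name = substr($1, 2)
      if (name == q) fq = 1
      if (name == r) fr = 1
      if (fq && fr) exit
    }
    END {
      printf "QNAME %s: %s\n", q, (fq ? "found in FASTQ" : "NOT found in FASTQ")
      printf "RNAME %s: %s\n", r, (fr ? "found in FASTQ" : "NOT found in FASTQ")
    }' "$2"
}
```

Running check_first_record R18_bwa_mem_racon1.sam r18_guppy_combine.fastq would print one line per name; a "NOT found" points to a naming mismatch.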
If you are actually trying to polish your assembly, you have to map your reads to the assembly and then run racon with:
racon -t <threads> reads mappings assembly > polished_assembly
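For the assembly-polishing case, several rounds (such as the 4 rounds mentioned earlier in the thread) could be chained with a loop like the one below, where each round maps the reads to the current draft and polishes it with racon. This is a sketch, not an official recipe: it assumes minimap2's map-ont preset with SAM output, and the function names polish_rounds and run_step and the roundN_* file names are hypothetical. With DRY_RUN=1 it only prints the commands so you can inspect them first.

```sh
# Hypothetical helper: run one step, or just print it when DRY_RUN=1.
# $1 is the output file; the remaining arguments are the command.
run_step() {
  out_file=$1
  shift
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s > %s\n' "$*" "$out_file"
  else
    "$@" > "$out_file"
  fi
}

# Hypothetical helper: N rounds of "map reads to the current draft, then racon".
# Usage: polish_rounds <reads> <draft> [rounds=4] [threads=8]
polish_rounds() {
  reads=$1 current=$2 rounds=${3:-4} threads=${4:-8}
  i=1
  while [ "$i" -le "$rounds" ]; do
    sam="round${i}_mappings.sam"
    polished="round${i}_polished.fasta"
    run_step "$sam" minimap2 -t "$threads" -ax map-ont "$current" "$reads" || return 1
    run_step "$polished" racon -t "$threads" "$reads" "$sam" "$current" || return 1
    current=$polished   # the next round polishes this round's output
    i=$((i + 1))
  done
  echo "final polished assembly: $current" >&2
}
```

The file reported at the end is the one you would then pass on to Medaka.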
Best regards, Robert
Hi! Sorry, I guess I was not clear. Yes, I am trying to correct errors in the reads, since I am not using an assembly. I already checked, and both files (SAM and FASTQ) have the same sequence headers. However, I don't know how to check for low-quality alignments. Where can I see the threshold?
Thanks for your answer!
If a SAM file is used, the error rate is calculated as 1 - min(q, t) / max(q, t), where q and t represent the number of matches and mismatches (obtained from the CIGAR string) in the query and target, respectively. I am not sure if that is the problem, though. I would advise you to run the following commands to see if everything works fine:
minimap2 -t 12 -ax ava-ont --dual=yes r18_guppy_combine.fastq r18_guppy_combine.fastq > overlaps.sam
racon -t 12 -f r18_guppy_combine.fastq overlaps.sam r18_guppy_combine.fastq > r18_polished.fasta
Notes:
- -f is passed to racon so that it takes all found overlaps per read (the main difference with respect to assembly polishing)
- --dual=yes is passed to minimap2
Thank you! So far everything is working.
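To see how many records in an existing SAM file would fail the 0.3 error threshold mentioned in this thread, the error-rate formula can be evaluated per alignment with awk. This rests on an interpretation: counting only matches and mismatches would give identical q and t, so the sketch takes q as the aligned query length (M, =, X and I operations) and t as the aligned target length (M, =, X, D and N), ignores clips, and skips unmapped records (CIGAR *). It is not racon's own code, and sam_error_rates is a hypothetical name.

```sh
# Hypothetical helper: print QNAME, RNAME and 1 - min(q,t)/max(q,t) per SAM record.
# Assumption: q = query span (M/=/X/I), t = target span (M/=/X/D/N); S/H/P ignored.
sam_error_rates() {
  awk -F '\t' '
    /^@/ || $6 == "*" { next }   # skip header lines and records without a CIGAR
    {
      cigar = $6; q = 0; t = 0
      # Consume one <length><op> element of the CIGAR string per iteration.
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        n = substr(cigar, 1, RLENGTH - 1) + 0
        op = substr(cigar, RLENGTH, 1)
        if (op == "M" || op == "=" || op == "X") {
          q += n; t += n
        } else if (op == "I") {
          q += n
        } else if (op == "D" || op == "N") {
          t += n
        }
        cigar = substr(cigar, RLENGTH + 1)
      }
      lo = (q < t) ? q : t
      hi = (q < t) ? t : q
      err = (hi > 0) ? 1 - lo / hi : 1
      printf "%s\t%s\t%.4f\n", $1, $3, err
    }' "$@"
}
```

For example, sam_error_rates R18_bwa_mem_racon1.sam | awk -F '\t' '$3 > 0.3' | wc -l counts the records above the threshold. A CIGAR of 100M50D gives q = 100 and t = 150, so 1 - 100/150 ≈ 0.333, which would be filtered out.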
Hi, I am new to Racon and was trying to run the test files (racon/test/data), but it gives me the error shown below:
racon sample_reads.fastq.gz sample_overlaps.paf.gz sample_reference.fasta.gz >racon1.fa
[racon::Polisher::initialize] loaded target sequence 0.001033 s
[racon::Polisher::initialize] loaded sequences 0.109507 s
[racon::Polisher::initialize] error: empty overlap set!
Can you tell me what am I doing wrong?
Thanks!
Hello Jenny,
Can you please paste the command you used to obtain the sample_overlaps.paf.gz file?
Best regards, Robert
Hi, it is already present in the /test/data folder in racon-1.3.2.
Thanks.
Ugh, I am sorry, I misread the inquiry. If you want to run the test data, you have to use the layout file:
racon sample_reads.fastq.gz sample_overlaps.paf.gz sample_layout.fasta.gz > racon1.fa
The reference file is used to assess the accuracy.
Best regards, Robert
Thank you so much for the quick responses. It worked this time!
Regards, Jenny
Hello,
I am trying Racon for the first time, and I get this error when I try to polish my data. I also tried the files provided for the test and get the same error.
[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] loaded sequences
[racon::Polisher::initialize] error: empty overlap set!
Thank you very much.