Open ghost opened 5 years ago
Hello Robert.
Below is an example of two header lines in my fasta file that I got as output after correcting with Canu:
1b05a25b-07ba-4a45-ac0f-4ae3ead718d9 runid=90e28bee03a6438bda9d3cb74d9e3105aa2ed89a sampleid=Barcodes read=331 ch=222 start_time=2018-10-29T13:51:43Z id=11 clr=0,2148
c50437e9-ec14-46d5-af5c-57981519079d runid=90e28bee03a6438bda9d3cb74d9e3105aa2ed89a sampleid=Barcodes read=1832 ch=390 start_time=2018-10-29T14:11:48Z id=19 clr=0,1058
In the above is the runid causing the error?
Everything after the first space is not used (i.e. only the string before runid is stored). Make sure that all sequences in both contig and read file have unique names up to the first space.
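To illustrate the point about names ending at the first space, here is a minimal Python sketch. It is not racon's own code: the helper names `sequence_name` and `duplicate_names` are made up for this example, and the two headers are abbreviated versions of the ones quoted above with a leading `>` assumed.

```python
from collections import Counter


def sequence_name(header: str) -> str:
    """Return the part of a FASTA/FASTQ header before the first whitespace."""
    stripped = header.strip()
    # Drop the leading '>' (FASTA) or '@' (FASTQ) marker if it is present.
    if stripped[:1] in (">", "@"):
        stripped = stripped[1:]
    parts = stripped.split(maxsplit=1)
    return parts[0] if parts else ""


def duplicate_names(headers):
    """Map each name that occurs more than once to its number of occurrences."""
    counts = Counter(sequence_name(h) for h in headers)
    return {name: n for name, n in counts.items() if n > 1}


# Abbreviated headers from the thread (metadata shortened for readability).
headers = [
    ">1b05a25b-07ba-4a45-ac0f-4ae3ead718d9 runid=90e2 read=331 ch=222",
    ">c50437e9-ec14-46d5-af5c-57981519079d runid=90e2 read=1832 ch=390",
]
names = [sequence_name(h) for h in headers]  # runid=... is discarded
dupes = duplicate_names(headers)             # empty dict: both names are unique
```

Under this reading the `runid=...` field never reaches the name comparison, so it cannot be what triggers the error by itself.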
Hi,
The sequences in both the contig and read files have unique names, but I suspect they have unequal lengths, because Canu trimmed the sequences when it output the corrected reads. Is there any way to make Racon ignore the read lengths and go ahead with the polishing?
Are you trying to polish contigs or correct reads with each other? Please provide me with your command and descriptions of input parameters.
Hi, I am trying to polish the sequence reads obtained from an ONT MinION run. I only have the original fastq files and not the fast5 files, so I cannot use Nanopolish. Hence I am trying to use Racon for polishing, to get a final polished consensus I can use for comparison. Please see below my command and input parameters for Racon:
:~$ racon -t 2 <path/to/fastq_files> <path/to/sam_file> <path/to/correctedReads.fasta.gz> > correctedReads_racon.fasta
Notes:
- The fastq files are the original fastq files from the MinION experiment.
- CorrectedReads were obtained from Canu using the following command: ./canu -p Read_prefix -d <path/to/output_for_CorrectedReads> genomeSize=2.1k -nanopore-raw /path/to/Reads.fastq
- The sam file was generated with minimap2 using the following command: ./minimap2 -ax map-ont <path/to/ref.fasta> <path/to/correctedReads.fasta.gz> > <path/to/correctedReads_aln.sam>
After running the command for Racon I get the following:
[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] error: duplicate sequence 1b05a25b-07ba-4a45-ac0f-4ae3ead718d9 with unequal data
I am stuck and don't know what to do next. The only other alternative I have is to get the fast5 files and try polishing with Nanopolish instead of Racon. But if you can assist me with resolving this error, I will greatly appreciate it.
I have trouble understanding your description. What is in the <path/to/ref.fasta> file?
Hi,
I meant the directory path to access the reference genome fasta file
What do you need the reference genome for?
It is required as input when generating an alignment in minimap2 mapping Oxford nanopore reads
Let's start from the beginning. You have some ONT data; is it DNA or RNA? Are you trying to assemble the sequenced genome or just increase the read accuracy?
I hope you can help, and perhaps suggest a way to debug the error that racon gives.
If you want to polish your reads with racon you should run the following:
minimap2 -ax ava-ont --dual=yes <reads> <reads> > alignments.sam
racon -f <reads> alignments.sam <reads> > polished_reads.fasta
If you want to assemble your genome with canu and afterwards polish it again with racon, run the following:
canu -p <prefix> -d <directory> genomeSize=<size of the sequenced genome> -nanopore-raw <reads>
minimap2 -ax map-ont <canu contigs> <reads> > alignments.sam
racon <reads> alignments.sam <canu contigs> > polished_contigs.fasta
Thank you for the help. I will run the commands and will let you know if all works out.
Thank you for your help. I managed to get polished contigs with your suggested approach.
I now want to compare the polished contigs and the unpolished contigs using Mummer to see how effective the polishing with racon was. If you have any suggestions of other scripts that can check the effectiveness of polishing with racon, I would very much appreciate it if you could share them.
I have been only using dnadiff from the Mummer package.
Could you please send me the command you use with Mummer for dnadiff? I have version 4 beta and it doesn't seem to be giving me the output I expect.
I am running dnadiff <reference file> <assembly file>, which creates several files. The *.report file contains the summary of the comparison.
I assume the reference file and assembly file are in sam format? Thanks, I will try it out. Thank you for the wonderful program you developed. It is very valuable when one does not have immediate access to the large fast5 files generated by the MinION but still wants a convenient way of polishing the fastq sequence files.
Both files need to be in fasta format. Thank you for your kind words :)
I am running Mummer v4 beta and I keep getting one of two errors. This happens when I run the dnadiff program. It gives the following:
Error :- multiple query file input required in SAM output format
or if I first run the nucmer program to generate the delta file and then use it as input in the dnadiff program I get the following:
Error:- could not parse delta file error- no 400
I know you are not maintaining the mummer program, but I would appreciate any advice from your experience of running it. Are you using version 4 or another version?
I have been using this one: https://github.com/marbl/MUMmer3. Did not yet try v4. You can download it with git clone https://github.com/marbl/MUMmer3 and run make in the created directory.
Hi Robert,
I'm digging up a bit of an older thread here. I'm having a very similar problem to Duncan while trying to run a pipeline similar to that described here https://www.biorxiv.org/content/10.1101/645903v3.full.pdf (though I only have amplicon reads, ~4000bp, no UMIs). I've checked through my fastq read identifiers, and as far as I can tell they're all unique. The racon manual says to input commands in this order:
racon [options ...] <sequences> <overlaps> <target sequences>
but as per the discussion above, I also attempted:
racon [options ...] <target sequences> <overlaps> <sequences>
and got the same error: [racon::Polisher::initialize] error: duplicate sequence <read identifier> with unequal data. So I'm a bit stumped.
Am I missing something?
Kind regards, Robert H
Here's my pipeline in short, starting from my base-called reads in fastq format:
I first quality and length trimmed my reads with NanoFilt (>3000bp, >qual 13):
NanoFilt -l 3000 -q 13 raw.fastq > trimmed.fastq
I next generated reference consensus reads with usearch (0.75 id, double stranded):
usearch -cluster_fast trimmed.fastq -id 0.75 -strand both -centroids reference.fa
Then I mapped the trimmed reads to the reference sequences with minimap2:
minimap2 -ax map-ont -t 5 reference.fa trimmed.fastq > mapped.sam
And finally attempted to polish the trimmed reads in racon (tried both configurations, got the same error):
racon -m 8 -x -6 -g -8 -w 500 -t 5 reference.fa mapped.sam trimmed.fastq > polished.fa
racon -m 8 -x -6 -g -8 -w 500 -t 5 trimmed.fastq mapped.sam reference.fa > polished.fa
I checked for repeat names with the following, but all names were unique:
grep '^@[a-z|0-9]*-' raw.fastq | sort | uniq -c
grep '^@[a-z|0-9]*-' trimmed.fastq | sort | uniq -c
Hi Robert,
the problem is that you have a sequence in trimmed.fastq and reference.fa that share a name. Try renaming your reference reads, rerun the minimap2 command and run racon as racon trimmed.fastq mapped.sam reference.fa.
Best regards, Robert V
Thanks Robert, this solved it
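The fix above (find names shared between the reads and the reference, then rename the reference) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and the `ref_` prefix are hypothetical, the FASTQ is assumed to use strict 4-line records, and the tiny inputs are invented. A collision like this is plausible when reference sequences are picked from the reads themselves and keep their original labels.

```python
def fasta_names(lines):
    """Names (text up to the first whitespace) of all FASTA records."""
    return [ln[1:].split()[0] for ln in lines
            if ln.startswith(">") and ln[1:].split()]


def fastq_names(lines):
    """Names of FASTQ records, assuming strict 4-line records."""
    # Only every fourth line is a header; this avoids mistaking a quality
    # line that happens to start with '@' for a header.
    headers = list(lines)[0::4]
    return [ln[1:].split()[0] for ln in headers
            if ln.startswith("@") and ln[1:].split()]


def shared_names(fastq_lines, fasta_lines):
    """Names that occur in both the read file and the reference file."""
    return set(fastq_names(fastq_lines)) & set(fasta_names(fasta_lines))


def prefix_fasta_names(lines, prefix="ref_"):
    """Yield FASTA lines with `prefix` (hypothetical) added to every name."""
    for ln in lines:
        yield ">" + prefix + ln[1:] if ln.startswith(">") else ln


# Invented toy data: the reference record reuses the name of read r1.
reads = ["@r1 ch=1", "ACGT", "+", "IIII", "@r2 ch=2", "ACGA", "+", "IIII"]
reference = [">r1 ch=1", "ACGT"]

clash = shared_names(reads, reference)          # {"r1"}
renamed = list(prefix_fasta_names(reference))   # [">ref_r1 ch=1", "ACGT"]
```

For real files the same functions can be fed open file handles. After renaming, the minimap2 step has to be rerun, as suggested above, so that the SAM file refers to the new reference names.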
Hi Duncan, the error means that there are two sequences with the same identifier but with different lengths (i.e. they are not equal). Are you by any chance using paired end sequences where reads in a pair have the same name up to the first white space?
Best regards, Robert
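The condition described here (same identifier, different lengths) can be checked before running racon with a sketch like the one below. It is an approximation of the check as explained in this thread, not racon's implementation. The function name `unequal_duplicates` and the records are invented, and the (name, sequence) pairs are assumed to be pooled from every input file, with names already cut at the first whitespace.

```python
from collections import defaultdict


def unequal_duplicates(records):
    """Return {name: sorted lengths} for names seen with more than one length.

    `records` is an iterable of (name, sequence) pairs pooled from all inputs.
    """
    lengths = defaultdict(set)
    for name, seq in records:
        lengths[name].add(len(seq))
    return {name: sorted(ls) for name, ls in lengths.items() if len(ls) > 1}


# Invented example: read1 appears once untrimmed and once trimmed.
records = [
    ("read1", "ACGTACGT"),  # e.g. a raw read
    ("read1", "ACGTAC"),    # same name, shorter (trimmed) copy
    ("read2", "GGCC"),
    ("read2", "GGCC"),      # same name, same length: not reported
]
report = unequal_duplicates(records)  # {"read1": [6, 8]}
```

This would also be consistent with the first command in the thread, where raw fastq reads and Canu-corrected (trimmed) copies of the same reads were passed to racon together.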