isovic / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads. http://genome.cshlp.org/content/early/2017/01/18/gr.214270.116 Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/racon
MIT License

illumina correction #146

Closed aspitaleri closed 4 years ago

aspitaleri commented 4 years ago

Hi there, sorry for the naive question about racon. I have read the threads and the tutorial, but I am still confused. I have Nanopore reads and Illumina reads, and I'd like to use the Illumina reads to correct the Nanopore assembly. So far I did:

read overlap

minimap2 -x ava-ont -t 8 nanopore.fastq.gz nanopore.fastq.gz | gzip -1 > reads.paf.gz

layout

miniasm -f nanopore.fastq.gz reads.paf.gz > reads.gfa

consensus GFA to fasta

awk '$1 ~/S/ {print ">"$2"\n"$3}' reads.gfa > reads.fasta

Correction 1

minimap2 -t 8 reads.fasta nanopore.fastq.gz > reads.gfa1.paf
racon -t 8 nanopore.fastq.gz reads.gfa1.paf reads.fasta > reads.racon1.fasta

Now, if I want to correct reads.racon1.fasta with the Illumina reads (R1 and R2), what is the correct step? Thanks in advance. Best

rvaser commented 4 years ago

Hello Andrea, you have to first join the Illumina files with this script, and then run the following commands:

minimap2 -t 8 -ax sr reads.racon1.fasta R12.fastq > reads.2.sam
racon -t 8 R12.fastq reads.2.sam reads.racon1.fasta > reads.racon2.fasta

Best regards, Robert

aspitaleri commented 4 years ago

Hi, thanks @rvaser for the quick reply! I ran it, but the Python script fails:

python3 shuffle_pairs_fastq.py KP45_1.fastq KP45_2.fastq > all.fasta

Traceback (most recent call last):
  File "shuffle_pairs_fastq.py", line 56, in <module>
    parse_file(sys.argv[1], 1)
  File "shuffle_pairs_fastq.py", line 26, in parse_file
    print(name + read_number)
TypeError: must be str, not int

rvaser commented 4 years ago

I copied a user-modified script; the original, which should work, is here.
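For readers hitting the same TypeError: it is raised because a string (the read name) is concatenated with an integer (the read number). The following is only a minimal sketch of that renaming step, not the content of either script linked in this thread. It assumes standard four-line FASTQ records, and the function name `rename_reads` and the bare numeric suffix are hypothetical choices made for illustration.

```python
import sys


def rename_reads(handle, read_number, out):
    """Copy four-line FASTQ records, appending the read number to each name."""
    while True:
        header = handle.readline().strip()
        if not header:
            break  # end of file (or a blank line) ends the input
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        # Keep only the name (text before the first whitespace).
        name = header.split()[0]
        # str() avoids "TypeError: must be str, not int" when read_number is an int.
        out.write(name + str(read_number) + "\n")
        out.write(sequence + "\n" + separator + "\n" + quality + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Assumed usage: python3 this_sketch.py R1.fastq R2.fastq > R12.fastq
    for number, path in enumerate(sys.argv[1:3], start=1):
        with open(path) as handle:
            rename_reads(handle, number, sys.stdout)
```

Writing both mates into one file with distinct names matches the traceback, where `parse_file` is called once per input file with the read number as its second argument.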

aspitaleri commented 4 years ago

That's fine, it works. Thanks!

aspitaleri commented 4 years ago

@rvaser if I want to run, e.g., 3 correction iterations using Illumina, I should do:

1. correction

minimap2 -t 8 -ax sr reads.racon1.fasta R12.fastq > reads.2.sam
racon -t 8 R12.fastq reads.2.sam reads.racon1.fasta > reads.racon2.fasta

2. correction

minimap2 -t 8 -ax sr reads.racon2.fasta R12.fastq > reads.3.sam
racon -t 8 R12.fastq reads.3.sam reads.racon2.fasta > reads.racon3.fasta

and so on.

rvaser commented 4 years ago

Hi Andrea, yes, the commands you proposed are the way to go.

Best regards, Robert
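The repeated rounds confirmed above follow one pattern: map the Illumina reads to the current assembly, polish with racon, then use the polished output as the target of the next round. A minimal sketch of that loop, assuming minimap2 and racon are on the PATH; the function names and the `round{i}.sam` / `round{i}.fasta` file names are hypothetical and differ from the names used in the thread.

```python
import subprocess


def polishing_commands(reads, assembly, rounds, threads=8):
    """Return (minimap2 argv, SAM path, racon argv, output path) per round."""
    steps = []
    current = assembly
    for i in range(1, rounds + 1):
        sam = f"round{i}.sam"         # hypothetical file name
        polished = f"round{i}.fasta"  # hypothetical file name
        # Same options as in the thread: short-read preset, SAM output.
        minimap2 = ["minimap2", "-t", str(threads), "-ax", "sr", current, reads]
        racon = ["racon", "-t", str(threads), reads, sam, current]
        steps.append((minimap2, sam, racon, polished))
        current = polished  # the next round polishes this round's output
    return steps


def run_polishing(reads, assembly, rounds, threads=8):
    """Run every round, redirecting stdout to files as the shell commands do."""
    polished = assembly
    for minimap2, sam, racon, polished in polishing_commands(
        reads, assembly, rounds, threads
    ):
        with open(sam, "w") as out:
            subprocess.run(minimap2, stdout=out, check=True)
        with open(polished, "w") as out:
            subprocess.run(racon, stdout=out, check=True)
    return polished
```

For example, `run_polishing("R12.fastq", "reads.racon1.fasta", 3)` would run three Illumina rounds and return the path of the last polished FASTA.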

aspitaleri commented 4 years ago

thanks so much!

aspitaleri commented 4 years ago

This is off topic. Once I have the final FASTA, I'd like to compare it with the starting FASTA file (i.e., before polishing). How can I determine the starting and final Nanopore errors? Any helpful links are welcome. Thanks again. Best

rvaser commented 4 years ago

Hi Andrea, you could use dnadiff from the mummer package to see differences between the Nanopore assembly and the Illumina polished assembly (if I understood you correctly).

Sorry for the late reply! Best regards, Robert
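A minimal sketch of the dnadiff comparison suggested above, assuming MUMmer is installed and dnadiff is on the PATH. The function names and the `polish_check` prefix are hypothetical. Note that this reports differences between the two assemblies, not errors against a ground truth.

```python
import subprocess


def dnadiff_command(reference, query, prefix="polish_check"):
    """Build the dnadiff call; -p sets the prefix of the output files."""
    return ["dnadiff", "-p", prefix, reference, query]


def compare_assemblies(reference, query, prefix="polish_check"):
    """Run dnadiff and return the path of its summary report."""
    subprocess.run(dnadiff_command(reference, query, prefix), check=True)
    return prefix + ".report"
```

For example, `compare_assemblies("reads.fasta", "reads.racon3.fasta")` would compare the unpolished assembly with the assembly after the Illumina rounds.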

aspitaleri commented 4 years ago

Yes, I will give it a try! Thanks again