Open mictadlo opened 6 years ago
Hi Michal,
as I can see you are using reads in FASTQ format? If you created them with artificial qualities, you can avoid this step and pass them in FASTA format without the option --bq -1
. The rest is fine, although you can use minimap2 with just parameter -x ava-pb
when overlapping and only -x map-pb
when mapping to reference.
Best regards, Robert
You can as well add an Illumina polishing step (if you have the data). Use minimap2 with -x sr
option and racon with default arguments.
Thank you. Do you think the new pipeline make sense:
minimap2 -t 20 -x ava-pb RSnSeQ_110plant-rm-clean-rmsmrtbell-gt-10k-unique.fasta | gzip -1 > plant-gt-10k/plant.paf.gz
miniasm -Rc2 -f RSnSeQ_110plant-rm-clean-rmsmrtbell-gt-10k-unique.fasta \
plant-gt-10k/plant.paf.gz \
> plant-gt-10k/plant.gfa
awk '/^S/{print ">"$2"\n"$3}' plant-gt-10k/plant.gfa | fold > plant-gt-10k/plant.gfa.fasta
# Correction 1
minimap2 -t 20 -ax map-pb plant-gt-10k/plant.gfa.fasta \
RSnSeQ_110plant-rm-clean-rmsmrtbell-gt-10k-unique.fasta \
> plant-gt-10k/plant.gfa1.sam
./bin/racon -t 20 \
RSnSeQ_110plant-rm-clean-rmsmrtbell-gt-10k-unique.fasta \
plant-gt-10k/plant.gfa1.sam \
plant-gt-10k/plant.gfa \
plant-gt-10k/plant.racon1.fasta
# Correction 2
minimap2 -t 20 -ax map-pb plant-gt-10k/plant.racon1.fasta \
RSnSeQ_110plant-rm-clean-rmsmrtbell-gt-10k-unique.fasta\
> plant-gt-10k/plant.gfa2.sam
./bin/racon -t 20 \
RSnSeQ_110plant-rm-clean-rmsmrtbell-gt-10k-unique.fasta \
plant-gt-10k/plant.gfa2.sam \
plant-gt-10k/plant.racon1.fasta \
plant-gt-10k/plant.racon2.fasta
When should I use racon_wrapper.py
?
Thank you in advance.
Michal
Use the wrapper when your data won't fit into memory. The interface is the same plus two new options: --split
and --subsample
(check the usage).
Edits to script: pass the reads to both arguments for all-vs-all overlaps in minimap2; pass the backbone in FASTA/FASTQ format to racon; as the fourth argument was dropped from racon, the consensus is written to stdout. The pipeline now looks like this:
minimap2 -t 20 -x ava-pb \
RSnSeQ_110plant-rm-clean-rmsmrtbell-gt-10k-unique.fasta \
RSnSeQ_110plant-rm-clean-rmsmrtbell-gt-10k-unique.fasta | gzip -1 > plant-gt-10k/plant.paf.gz
miniasm -Rc2 -f \
RSnSeQ_110plant-rm-clean-rmsmrtbell-gt-10k-unique.fasta \
plant-gt-10k/plant.paf.gz > plant-gt-10k/plant.gfa
awk '/^S/{print ">"$2"\n"$3}' plant-gt-10k/plant.gfa | fold > plant-gt-10k/plant.gfa.fasta
# Correction 1
minimap2 -t 20 -ax map-pb \
plant-gt-10k/plant.gfa.fasta \
RSnSeQ_110plant-rm-clean-rmsmrtbell-gt-10k-unique.fasta > plant-gt-10k/plant.gfa1.sam
./bin/racon -t 20 \
RSnSeQ_110plant-rm-clean-rmsmrtbell-gt-10k-unique.fasta \
plant-gt-10k/plant.gfa1.sam \
plant-gt-10k/plant.gfa.fasta > plant-gt-10k/plant.racon1.fasta
# Correction 2
minimap2 -t 20 -ax map-pb \
plant-gt-10k/plant.racon1.fasta \
RSnSeQ_110plant-rm-clean-rmsmrtbell-gt-10k-unique.fasta > plant-gt-10k/plant.gfa2.sam
./bin/racon -t 20 \
RSnSeQ_110plant-rm-clean-rmsmrtbell-gt-10k-unique.fasta \
plant-gt-10k/plant.gfa2.sam \
plant-gt-10k/plant.racon1.fasta > plant-gt-10k/plant.racon2.fasta
Hi, last year I ran the below pipeline. In the meantime, minimap2 came out, racon supports FASTA files now and racon_wrapper.py has been developed.
For my new dataset, I have only FASTA files and just wondering whether there is a better way to run Racon pipeline?
Thank you in advance,
Michal