jts / nanopolish

Signal-level algorithms for MinION data
MIT License

nanopolish variants: too many arguments #870

Closed 123chenshixin closed 3 years ago

123chenshixin commented 3 years ago

Hi, I'm using nanopolish variants to polish a draft genome that was assembled with canu. I'm using Illumina sequencing data to polish the draft, which is named JYF80.contigs.fasta. Since this is my first time using nanopolish, I used the commands you have posted online. The commands are as follows:

#!/bin/bash

for a in JYF80
do
draf_fa=/home/cxs3_z4/csx/20201209/canu/${a}/${a}.contigs.fasta
read_1_fa=/home/cxs3_z4/cff/Illumina/QC/${a}_R1.fq.gz
read_2_fa=/home/cxs3_z4/cff/Illumina/QC/${a}_R2.fq.gz
genome=./${a}.contigs.fasta

ln -s ${draf_fa} ./

# Index the draft genome

bwa index ${genome}

# Align the basecalled reads to the draft sequence

bwa mem -x ont2d -t 8 ${genome} ${read_1_fa} ${read_2_fa} | samtools sort -o reads.sorted.bam -T reads.tmp -
samtools index reads.sorted.bam

python3 /home/cxs3_z4/software/nanopolish/scripts/nanopolish_makerange.py ${genome} | parallel --results nanopolish.results -P 8 nanopolish variants --consensus polished.{1}.fa -w {1} -r ${read_1_fa} ${read_2_fa} -b reads.sorted.bam -g ${genome} -t 4 --min-candidate-frequency 0.1
done

Then the reads.sorted.bam and reads.sorted.bam.bai files are generated, but nanopolish reports "variants: too many arguments" and prints, many times over, the same help text that "nanopolish variants -h" produces. There seems to be a problem with nanopolish variants, but I have tried many things and none of them worked. My reference genome is a 12 Mb yeast genome. I don't know whether it is too small for "-P 8".

jts commented 3 years ago

Hi,

The commands you are using are for an old version of nanopolish. Please see the instructions for the latest versions here:

https://nanopolish.readthedocs.io/en/latest/quickstart_consensus.html

Jared
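
For readers following the thread, here is a minimal dry-run sketch of the command shapes that appear later in this issue and in the linked quickstart. It only prints the commands and does not run them. The names `print_polish_steps`, `draft.fa`, `reads.fasta`, `fast5_files/` and the window string are placeholders chosen for illustration. According to the usage text quoted further down, `--consensus` is a flag, the output file is given with `-o`, and `-r` takes a single reads file.

```shell
#!/bin/bash
# Dry-run sketch: print (do not execute) the polishing steps for one window.
# All file names below are placeholders, not paths from the original post.
print_polish_steps() {
    local draft=$1 reads=$2 fast5_dir=$3 window=$4
    # Index the reads against their raw signal files (requires fast5 files)
    echo "nanopolish index -d ${fast5_dir} ${reads}"
    # Align the nanopore reads to the draft and sort the alignments
    echo "minimap2 -ax map-ont -t 8 ${draft} ${reads} | samtools sort -o reads.sorted.bam -T reads.tmp"
    echo "samtools index reads.sorted.bam"
    # Call consensus variants for one window; note the single file after -r
    echo "nanopolish variants --consensus -o polished.vcf -w ${window} -r ${reads} -b reads.sorted.bam -g ${draft}"
    # Apply the called variants to the draft
    echo "nanopolish vcf2fasta -g ${draft} polished.vcf"
}

print_polish_steps draft.fa reads.fasta fast5_files/ "tig00000001:200000-202000"
```

Printing the commands first makes it easy to spot a stray extra argument before anything is launched under `parallel`.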

123chenshixin commented 3 years ago

I'm sorry, but it doesn't work. The problem is the same. I even tried entering the commands manually:

minimap2 -ax map-ont -t 8 ${genome} ${read_1_fa} ${read_2_fa} | samtools sort -o reads.sorted.bam -T reads.tmp
samtools index reads.sorted.bam

nanopolish variants --consensus -o polished.vcf \
    -w "tig00000001:200000-202000" \
    -r ${read_1_fa} ${read_2_fa} \
    -b reads.sorted.bam \
    -g ${genome}

jts commented 3 years ago

Can you paste the full error message you received?

123chenshixin commented 3 years ago

The entire output is as follows:

[M::mm_idx_gen::0.488*1.01] collected minimizers
[M::mm_idx_gen::0.630*2.01] sorted minimizers
[M::main::0.630*2.01] loaded/built the index for 20 target sequence(s)
[M::mm_mapopt_update::0.718*1.88] mid_occ = 31
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 20
[M::mm_idx_stat::0.774*1.82] distinct minimizers: 2077969 (95.40% are singletons); average occurrences: 1.116; average spacing: 5.337
[M::worker_pipeline::74.244*6.41] mapped 3370249 sequences
[M::worker_pipeline::91.821*5.26] mapped 2497543 sequences
[M::worker_pipeline::163.978*5.81] mapped 3370249 sequences
[M::worker_pipeline::181.520*5.29] mapped 2497543 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -ax map-ont -t 8 ./JYF80.contigs.fasta /home/cxs3_z4/cff/Illumina/QC/JYF80_R1.fq.gz /home/cxs3_z4/cff/Illumina/QC/JYF80_R2.fq.gz
[M::main] Real time: 182.086 sec; CPU: 961.384 sec; Peak RSS: 3.813 GB
[bam_sort_core] merging from 6 files and 1 in-memory blocks...
variants: too many arguments

Usage: nanopolish variants [OPTIONS] --reads reads.fa --bam alignments.bam --genome genome.fa
Find SNPs using a signal-level HMM

  -v, --verbose  display verbose output
  --version  display version
  --help  display this help and exit
  --snps  only call SNPs
  --consensus  run in consensus calling mode
  --fix-homopolymers  run the experimental homopolymer caller
  --faster  minimize compute time while slightly reducing consensus accuracy
  -w, --window=STR  find variants in window STR (format: :-)
  -r, --reads=FILE  the ONT reads are in fasta FILE
  -b, --bam=FILE  the reads aligned to the reference genome are in bam FILE
  -e, --event-bam=FILE  the events aligned to the reference genome are in bam FILE
  -g, --genome=FILE  the reference genome is in FILE
  -p, --ploidy=NUM  the ploidy level of the sequenced genome
  -q --methylation-aware=STR  turn on methylation aware polishing and test motifs given in STR (example: -q dcm,dam)
  --genotype=FILE  call genotypes for the variants in the vcf FILE
  -o, --outfile=FILE  write result to FILE [default: stdout]
  -t, --threads=NUM  use NUM threads (default: 1)
  -m, --min-candidate-frequency=F  extract candidate variants from the aligned reads when the variant frequency is at least F (default 0.2)
  -i, --indel-bias=F  bias HMM transition parameters to favor insertions (F<1) or deletions (F>1). this value is automatically set depending on --consensus, but can be manually set if spurious indels are called
  -d, --min-candidate-depth=D  extract candidate variants from the aligned reads when the depth is at least D (default: 20)
  -x, --max-haplotypes=N  consider at most N haplotype combinations (default: 1000)
  --min-flanking-sequence=N  distance from alignment end to calculate variants (default: 30)
  --max-rounds=N  perform N rounds of consensus sequence improvement (default: 50)
  -c, --candidates=VCF  read variant candidates from VCF, rather than discovering them from aligned reads
  --read-group=RG  only use alignments with read group tag RG
  -a, --alternative-basecalls-bam=FILE  if an alternative basecaller was used that does not output event annotations then use basecalled sequences from FILE. The signal-level events will still be taken from the -b bam.
  --calculate-all-support  when making a call, also calculate the support of the 3 other possible bases
  --models-fofn=FILE  read alternative k-mer models from FILE

Report bugs to https://github.com/jts/nanopolish/issues

jts commented 3 years ago

Oh, I see the problem now. You are trying to use nanopolish with paired-end Illumina data. Nanopolish only supports nanopore data.

Jared
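
To illustrate why two read files after `-r` produce this message, here is a toy option parser written with bash `getopts`. It is an assumption-laden sketch and not nanopolish's actual argument handling: the function name `parse_variants_args` is hypothetical, and only three of the options from the usage text are modelled. The point it shows is that an option taking one value consumes only the first file, so the second file is left over as an unexpected positional argument.

```shell
#!/bin/bash
# Illustration only: a toy parser, NOT nanopolish's real argument handling.
# -r, -b and -g each take exactly one value; anything left over is an error.
parse_variants_args() {
    local OPTIND=1 opt reads="" bam="" genome=""
    while getopts "r:b:g:" opt; do
        case "$opt" in
            r) reads=$OPTARG ;;
            b) bam=$OPTARG ;;
            g) genome=$OPTARG ;;
            *) return 2 ;;
        esac
    done
    # Drop the options that were parsed; what remains are positional arguments
    shift $((OPTIND - 1))
    if [ "$#" -gt 0 ]; then
        echo "variants: too many arguments"
        return 1
    fi
    echo "reads=${reads} bam=${bam} genome=${genome}"
}

# Two files after -r: the second one is a stray positional argument
parse_variants_args -r R1.fq.gz R2.fq.gz -b reads.sorted.bam -g draft.fa || true
# One file after -r: parsing succeeds
parse_variants_args -r reads.fasta -b reads.sorted.bam -g draft.fa
```

Bash `getopts` stops at the first non-option word, whereas other parsers may reorder arguments, but in both cases the extra file ends up outside the recognised options.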

123chenshixin commented 3 years ago

But when I use nanopore reads to polish my draft genome, it seems that nanopolish requires the fast5 files, which are used to build the index for the nanopore reads. However, for some reason I don't have the fast5 files. Can I run nanopolish without them? To confirm the error, the output of running the commands from the page you linked is as follows:

Error: no fast5 files found
[M::mm_idx_gen::0.578*0.99] collected minimizers
[M::mm_idx_gen::0.783*1.71] sorted minimizers
[M::main::0.783*1.71] loaded/built the index for 20 target sequence(s)
[M::mm_mapopt_update::0.868*1.64] mid_occ = 31
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 20
[M::mm_idx_stat::0.920*1.60] distinct minimizers: 2079913 (95.37% are singletons); average occurrences: 1.117; average spacing: 5.331
[M::worker_pipeline::81.479*7.71] mapped 61731 sequences
[M::worker_pipeline::126.288*6.75] mapped 56266 sequences
[M::worker_pipeline::128.112*6.66] mapped 21087 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -ax map-ont -t 8 ./JYF80.racon3.fasta /home/cxs3_z4/yeast/nanopore/ONT_JYF80.fastq
[M::main] Real time: 128.209 sec; CPU: 853.525 sec; Peak RSS: 5.408 GB
[bam_sort_core] merging from 2 files and 1 in-memory blocks...
[fai_load] build FASTA index.
error: could not load the index files for input file /home/cxs3_z4/yeast/nanopore/ONT_JYF80.fastq
Please run nanopolish index on your reads (see documentation)
[vcf2fasta] rewrote contig tig00000001 with 0 subs, 0 ins, 0 dels (0 skipped)
[vcf2fasta] rewrote contig tig00000002 with 0 subs, 0 ins, 0 dels (0 skipped)
.........

jts commented 3 years ago

Sorry but you need the fast5s for nanopolish.

Jared
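
Since the run above only fails at the indexing step, a small pre-flight check can catch a missing fast5 directory before any alignment time is spent. The sketch below is a hypothetical helper, not part of nanopolish: `count_fast5` and the default directory name `fast5_files` are assumptions made for illustration, and the check only looks at file extensions.

```shell
#!/bin/bash
# Pre-flight sketch: count .fast5 files under a directory before indexing.
# count_fast5 and the default directory name are hypothetical.
count_fast5() {
    local dir=$1
    # Search recursively, since fast5 files are often nested in subfolders
    find "$dir" -type f -name '*.fast5' | wc -l | tr -d ' '
}

fast5_dir=${FAST5_DIR:-fast5_files}
if [ -d "$fast5_dir" ] && [ "$(count_fast5 "$fast5_dir")" -gt 0 ]; then
    echo "found $(count_fast5 "$fast5_dir") fast5 file(s) in $fast5_dir"
else
    echo "no fast5 files in $fast5_dir; nanopolish needs them to index the reads"
fi
```

If the check reports no fast5 files, the reads cannot be indexed and a different polishing approach is needed.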

123chenshixin commented 3 years ago

OK. I will try another way. Thanks for replying to my questions!