Closed katievigil closed 10 months ago
This is almost always because the file doesn't exist or isn't readable. What does ls -la return if run before the Canu command in the script?
-rw-r--r-- 1 kvigil taw 2076260858 Jan 18 14:52 barcode06.fastq.gz
(/lustre/project/taw/share/conda-envs/ONRviral) [kvigil@cypress01-123 concatenate]$ canu version
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LANG = "C.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
usage: canu [-version] [-citation] \
[-haplotype | -correct | -trim | -assemble | -trim-assemble] \
[-s <assembly-specifications-file>] \
-p <assembly-prefix> \
-d <assembly-directory> \
genomeSize=<number>[g|m|k] \
[other-options] \
[-haplotype{NAME} illumina.fastq.gz] \
[-corrected] \
[-trimmed] \
[-pacbio |
-nanopore |
-pacbio-hifi] file1 file2 ...
example: canu -d run1 -p godzilla genomeSize=1g -nanopore-raw reads/*.fasta.gz
To restrict canu to only a specific stage, use:
-haplotype - generate haplotype-specific reads
-correct - generate corrected reads
-trim - generate trimmed reads
-assemble - generate an assembly
-trim-assemble - generate trimmed reads and then assemble them
The assembly is computed in the -d <assembly-directory>, with output files named
using the -p <assembly-prefix>. This directory is created if needed. It is not
possible to run multiple assemblies in the same directory.
The genome size should be your best guess of the haploid genome size of what is being
assembled. It is used primarily to estimate coverage in reads, NOT as the desired
assembly size. Fractional values are allowed: '4.7m' equals '4700k' equals '4700000'
Some common options:
useGrid=string
- Run under grid control (true), locally (false), or set up for grid control
but don't submit any jobs (remote)
rawErrorRate=fraction-error
- The allowed difference in an overlap between two raw uncorrected reads. For lower
quality reads, use a higher number. The defaults are 0.300 for PacBio reads and
0.500 for Nanopore reads.
correctedErrorRate=fraction-error
- The allowed difference in an overlap between two corrected reads. Assemblies of
low coverage or data with biological differences will benefit from a slight increase
in this. Defaults are 0.045 for PacBio reads and 0.144 for Nanopore reads.
gridOptions=string
- Pass string to the command used to submit jobs to the grid. Can be used to set
maximum run time limits. Should NOT be used to set memory limits; Canu will do
that for you.
minReadLength=number
- Ignore reads shorter than 'number' bases long. Default: 1000.
minOverlapLength=number
- Ignore read-to-read overlaps shorter than 'number' bases long. Default: 500.
A full list of options can be printed with '-options'. All options can be supplied in
an optional sepc file with the -s option.
For TrioCanu, haplotypes are specified with the -haplotype{NAME} option, with any
number of haplotype-specific Illumina read files after. The {NAME} of each haplotype
is free text (but only letters and numbers, please). For example:
-haplotypeNANNY nanny/*gz
-haplotypeBILLY billy1.fasta.gz billy2.fasta.gz
Reads can be either FASTA or FASTQ format, uncompressed, or compressed with gz, bz2 or xz.
Reads are specified by the technology they were generated with, and any processing performed.
[processing]
-corrected
-trimmed
[technology]
-pacbio <files>
-nanopore <files>
-pacbio-hifi <files>
Complete documentation at http://canu.readthedocs.org/en/latest/
ERROR: Invalid command line option 'version'. Did you forget quotes around options with spaces?
ERROR: Assembly name prefix (-p) not supplied.
ERROR: Required parameter 'genomeSize' not set.
ERROR: Implausibly small genome size . Check units!
(/lustre/project/taw/share/conda-envs/ONRviral) [kvigil@cypress01-123 concatenate]$ canu -version
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LANG = "C.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
canu 2.2
I know this file exists because I was running it on the HPC, but it never finished, so I deleted the unfinished output and tried to run it again. Now I suddenly get an error.
I am going to try to re-install and see what happens.
Hi, I ended up re-installing Canu 2.2 (not via conda) and re-running it, and I still get the same error:
Download:
(base) [kvigil@cypress1 pkgs]$ curl -L https://github.com/marbl/canu/releases/download/v2.2/canu-2.2.Linux-amd64.tar.xz --output canu-2.2.Linux-amd64.tar.xz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 20.9M 100 20.9M 0 0 5506k 0 0:00:03 0:00:03 --:--:-- 6578k
(base) [kvigil@cypress1 pkgs]$ tar -xJf canu-2.2.*.tar.xz
Confirm download:
-rw-rwxr-- 1 kvigil taw 309 Sep 29 2022 blastxWorkflow.0.3.ONR060922.barcode04.canu.medaka.config
drwxr-sr-x 5 kvigil taw 4096 Jan 25 17:40 canu-2.2
-rw-r--r-- 1 kvigil taw 21953096 Jan 25 17:40 canu-2.2.Linux-amd64.tar.xz
-rw-rwxr-- 1 kvigil taw 0 Sep 20 2021 urls
-rw-rwxr-- 1 kvigil taw 0 Sep 20 2021 urls.txt
(base) [kvigil@cypress1 bin]$ ./canu -p barcode06 -d /lustre/project/taw/kvigil/ONR/sandiago/mussel/ONR122123.121923.combined/fastq_pass/concatenate genomeSize=2m minInputCoverage=0 maxInputCoverage=0 corOutCoverage=10000 stopOnLowCoverage=0 corMhapSensitivity=high corMinCoverage=0 redMemory=32 oeaMemory=32 batMemory=32 correctedErrorRate=0.2 useGrid=false -nanopore barcode06.fastq.gz
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LANG = "C.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
usage: canu [-version] [-citation] \
[-haplotype | -correct | -trim | -assemble | -trim-assemble] \
[-s <assembly-specifications-file>] \
-p <assembly-prefix> \
-d <assembly-directory> \
genomeSize=<number>[g|m|k] \
[other-options] \
[-haplotype{NAME} illumina.fastq.gz] \
[-corrected] \
[-trimmed] \
[-pacbio |
-nanopore |
-pacbio-hifi] file1 file2 ...
example: canu -d run1 -p godzilla genomeSize=1g -nanopore-raw reads/*.fasta.gz
To restrict canu to only a specific stage, use:
-haplotype - generate haplotype-specific reads
-correct - generate corrected reads
-trim - generate trimmed reads
-assemble - generate an assembly
-trim-assemble - generate trimmed reads and then assemble them
The assembly is computed in the -d <assembly-directory>, with output files named
using the -p <assembly-prefix>. This directory is created if needed. It is not
possible to run multiple assemblies in the same directory.
The genome size should be your best guess of the haploid genome size of what is being
assembled. It is used primarily to estimate coverage in reads, NOT as the desired
assembly size. Fractional values are allowed: '4.7m' equals '4700k' equals '4700000'
Some common options:
useGrid=string
- Run under grid control (true), locally (false), or set up for grid control
but don't submit any jobs (remote)
rawErrorRate=fraction-error
- The allowed difference in an overlap between two raw uncorrected reads. For lower
quality reads, use a higher number. The defaults are 0.300 for PacBio reads and
0.500 for Nanopore reads.
correctedErrorRate=fraction-error
- The allowed difference in an overlap between two corrected reads. Assemblies of
low coverage or data with biological differences will benefit from a slight increase
in this. Defaults are 0.045 for PacBio reads and 0.144 for Nanopore reads.
gridOptions=string
- Pass string to the command used to submit jobs to the grid. Can be used to set
maximum run time limits. Should NOT be used to set memory limits; Canu will do
that for you.
minReadLength=number
- Ignore reads shorter than 'number' bases long. Default: 1000.
minOverlapLength=number
- Ignore read-to-read overlaps shorter than 'number' bases long. Default: 500.
A full list of options can be printed with '-options'. All options can be supplied in
an optional sepc file with the -s option.
For TrioCanu, haplotypes are specified with the -haplotype{NAME} option, with any
number of haplotype-specific Illumina read files after. The {NAME} of each haplotype
is free text (but only letters and numbers, please). For example:
-haplotypeNANNY nanny/*gz
-haplotypeBILLY billy1.fasta.gz billy2.fasta.gz
Reads can be either FASTA or FASTQ format, uncompressed, or compressed with gz, bz2 or xz.
Reads are specified by the technology they were generated with, and any processing performed.
[processing]
-corrected
-trimmed
[technology]
-pacbio <files>
-nanopore <files>
-pacbio-hifi <files>
Complete documentation at http://canu.readthedocs.org/en/latest/
ERROR: Invalid command line option 'barcode06.fastq.gz'. Did you forget quotes around options with spaces?
I don't think the file exists in the folder where you're launching the command, which is the issue ls -la would show and why I suggested it in my original reply. Run both pwd and ls -la in the folder where you're launching the command and post the full output. You can provide an absolute path to the file if it's not in the local folder, but your current command assumes it is.
The three Canu commands you've run all show different folders that you're launching from (canu, concatenate, and bin). I suspect the second one (where you were able to ls the file) would have worked, but that was just canu version instead of the full command.
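To make the check described above concrete, here is a minimal sketch of a guard that could run before the Canu command. The helper name check_reads and the output folder barcode06_asm are hypothetical and not part of Canu or of the original script. The helper only confirms that the reads file is readable from the current folder and prints its absolute path, so the assembly command no longer depends on where it is launched.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of Canu): verify that a reads file is readable
# and print its absolute path. Returns 1 with a message on stderr otherwise.
check_reads() {
    local reads="$1"
    if [ ! -r "$reads" ]; then
        echo "ERROR: cannot read '$reads' when launched from $(pwd)" >&2
        return 1
    fi
    # Build an absolute path from the file's directory and its base name.
    local dir
    dir=$(cd "$(dirname "$reads")" && pwd) || return 1
    printf '%s/%s\n' "$dir" "$(basename "$reads")"
}

# Example use (commented out so the helper can be sourced safely).
# The -d folder name is a placeholder:
# reads=$(check_reads barcode06.fastq.gz) || exit 1
# canu -p barcode06 -d barcode06_asm genomeSize=2m useGrid=false -nanopore "$reads"
```

If the file is missing, the message includes the launch folder, which is the same information pwd and ls -la would reveal by hand.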
Ok, thanks! Yes, I am attempting to run an sbatch bash script on the HPC for the first time, using a job array to run 10 barcodes. I usually run them one at a time, and I forgot that I usually cd into the folder where my fastq.gz file is located. Thanks, it is running now! I also used useGrid=false; hopefully this works for the job array.
Thanks!
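For anyone hitting the same problem with a job array, a rough sketch of such an sbatch script follows. It is not the script used in this thread. The barcode01 to barcode10 naming, the READS_DIR placeholder, the job name, and the per-barcode output folder names are all assumptions. It changes into the reads folder before calling Canu and gives each array task its own -d folder, since the usage text notes that multiple assemblies cannot run in the same directory.

```shell
#!/bin/bash
#SBATCH --job-name=canu_barcodes
#SBATCH --array=1-10

# Map an array task ID to a barcode name, e.g. 6 -> barcode06.
# The barcodeNN naming scheme is an assumption about the input files.
barcode_for_task() {
    printf 'barcode%02d\n' "$1"
}

# Only run the assembly when Slurm has provided an array task ID.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    READS_DIR="/path/to/concatenate"   # placeholder: folder holding the fastq.gz files
    barcode=$(barcode_for_task "$SLURM_ARRAY_TASK_ID")

    # Change into the reads folder so the relative file name resolves.
    cd "$READS_DIR" || exit 1
    ls -la "${barcode}.fastq.gz" || exit 1

    # One output folder per barcode; useGrid=false keeps each task local.
    canu -p "$barcode" -d "${barcode}_canu" genomeSize=2m \
        useGrid=false -nanopore "${barcode}.fastq.gz"
fi
```

The ls -la line makes a missing file fail early with a clear message instead of the usage dump shown earlier in this thread.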
Hi, I have been using this same script for a while, and now I am getting this error message.
Canu 2.2, Linux (Ubuntu), on HPC