nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Questions: funannotate-predict.py [options] and error: unrecognized arguments #712

Open libradaatencio opened 2 years ago

libradaatencio commented 2 years ago

Are you using the latest release?
I am using funannotate v1.8.9 for structural and functional annotation of an endophytic fungus. I am following the genome-only tutorial. The assembly was cleaned, sorted, and masked; I had no problems running that part of the pipeline.

What command did you issue?
funannotate predict -i LCM1078_masked.fasta -o fun --species “Menisporopsis coffea” --strain LCM1078 --busco_seed_species neurospora_crassa --cpus 12

Describe the bug
usage: funannotate-predict.py [options] -i genome.fasta
funannotate-predict.py: error: unrecognized arguments: coffea”

Questions: I don’t have a specific species name. What can I do if the strain does not have a species name? I also tried “Menisporopsis sp”. Should I try a species name that already exists?

My input file is LCM1078_masked.fasta (the masked assembly). Should I rename it to genome.fasta so that the command recognizes it?

OS/Install Information

You are running Perl v b'5.016003'. Now checking perl modules...
Bio::Perl: 1.7.4
Carp: 1.26
Clone: 0.45
DBD::SQLite: 1.39
DBD::mysql: 4.023
DBI: 1.627
DB_File: 1.83
Data::Dumper: 2.145
File::Basename: 2.84
File::Which: 1.27
Getopt::Long: 2.4
Hash::Merge: 0.302
JSON: 2.59
LWP::UserAgent: 6.05
Logger::Simple: 2.0
POSIX: 1.30
Parallel::ForkManager: 2.02
Pod::Usage: 1.63
Scalar::Util::Numeric: 0.40
Storable: 2.45
Text::Soundex: 3.04
Thread::Queue: 3.02
Tie::File: 0.98
URI::Escape: 3.31
YAML: 0.84
threads: 1.87
threads::shared: 1.43
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/data/funannotate_db
$TRINITY_HOME=/opt/trinityrnaseq-v2.8.6
$EVM_HOME=/opt/EVidenceModeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/opt/Augustus/config
$GENEMARK_PATH=/opt/gmes_linux_64
ERROR: PASAHOME not set. export PASAHOME=/path/to/dir

Checking external dependencies...
CodingQuarry: 2.0
Trinity: 2.8.6
augustus: 3.4.0
bamtools: bamtools 2.5.2
bedtools: bedtools v2.30.0
blat: BLAT v37x1
diamond: 2.0.13
emapper.py: 2.1.6-43-gd6e6cdf
ete3: 3.1.2
exonerate: exonerate 2.2.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2021-12-17
gmes_petap.pl: 4.68_lic
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 1.8.0_302
kallisto: 0.46.1
mafft: v7.490 (2021/Oct/30)
makeblastdb: makeblastdb 2.11.0+
minimap2: 2.24-r1122
proteinortho: 6.0.33
pslCDnaFilter: no way to determine
salmon: salmon 1.6.0
samtools: samtools 1.11
signalp: seqfile
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.9 (July 2021)
tantan: tantan 26
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.11.0+
trimal: trimAl 1.2rev59
ERROR: trimmomatic not installed

Thanks for your help. Librada Atencio

hyphaltip commented 2 years ago

As far as I know, you don't need a valid species name; anything inside the quotes should work. Can you try single quotes? The name of the input file doesn't matter.
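A minimal sketch of why the quote characters matter, assuming a bash-like shell. The command in the original report uses typographic quotes (“ ”), which the shell treats as ordinary characters rather than quoting syntax, so the species name splits at the space and `coffea”` arrives as a separate argument. `count_args` is a hypothetical helper used only for illustration; it is not part of funannotate.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper (not part of funannotate):
# it prints how many arguments it received.
count_args() { printf '%s\n' "$#"; }

# Typographic quotes are not quoting syntax, so the space still splits the name:
# the arguments are --species, “Menisporopsis and coffea”
curly=$(count_args --species “Menisporopsis coffea”)

# ASCII double or single quotes keep the species name as one argument.
double=$(count_args --species "Menisporopsis coffea")
single=$(count_args --species 'Menisporopsis coffea')

printf 'curly=%s double=%s single=%s\n' "$curly" "$double" "$single"
# prints: curly=3 double=2 single=2
```

Typographic quotes often come from pasting a command through a word processor or chat client, so retyping the quotes in the terminal is usually enough.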

nextgenusfs commented 2 years ago

If you are submitting to a cluster sometimes quotes are stripped. On our old cluster I think I had to do something like: -s '"'Genus species'"'
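A sketch of what that nested quoting does, under the assumption that the submission wrapper flattens the arguments into a single string and parses that string a second time. The second parse is simulated here with `eval`; real schedulers differ, so treat this as an illustration of the mechanism only. `count_args` and `flatten_and_rerun` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints how many arguments it received.
count_args() { printf '%s\n' "$#"; }

# Hypothetical stand-in for a wrapper that joins its arguments into one
# string and lets a shell parse that string again.
flatten_and_rerun() { eval "count_args $*"; }

# Plain single quotes are consumed by the first parse, so the second
# parse sees: -s Genus species
plain=$(flatten_and_rerun -s 'Genus species')

# '"' adds literal double quotes that survive the first parse, so the
# second parse sees: -s "Genus species"
nested=$(flatten_and_rerun -s '"'Genus species'"')

# Without a second parse the nested form splits at the space, because the
# literal quotes are data at that point, not syntax.
direct=$(count_args -s '"'Genus species'"')

printf 'plain=%s nested=%s direct=%s\n' "$plain" "$nested" "$direct"
# prints: plain=3 nested=2 direct=3
```

The nested form therefore only helps when something really does re-parse the command line; when the command runs directly in a shell, ordinary single or double quotes are the right choice.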

hyphaltip commented 2 years ago

Yeah, if you are doing it through a job script you might have to play with that. This works on our Slurm cluster: https://github.com/stajichlab/funannotate_template/blob/main/pipeline/03_predict.sh

libradaatencio commented 2 years ago

Hello, Thanks for your help. Here I share with you the command used for funannotate predict (in a cluster) and the log file. I am working with a fungal genome assembly. The genome was sequenced using Oxford Nanopore.

[latencio@baru funannotate_12abr2022]$ nohup /usr/local/bin/funannotate predict -i LCM1078_masked.fasta -o fun --species 'Menisporopsis coffea' --strain LCM1078 --busco_seed_species neurospora_crassa --cpus 12 > funpredict.log
nohup: ignoring input and redirecting stderr to stdout
[latencio@baru funannotate_12abr2022]$ more funpredict.log

[Apr 19 10:00 AM]: OS: CentOS Linux 7, 20 cores, ~ 197 GB RAM. Python: 3.7.11
[Apr 19 10:00 AM]: Running funannotate v1.8.9
[Apr 19 10:00 AM]: Skipping CodingQuarry as $QUARRY_PATH not found as ENV
[Apr 19 10:00 AM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     busco
  genemark     selftraining
  glimmerhmm   busco
  snap         busco
[Apr 19 10:00 AM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Apr 19 10:00 AM]: Genome loaded: 137 scaffolds; 58,238,831 bp; 8.11% repeats masked
[Apr 19 10:00 AM]: Mapping 553,202 proteins to genome using diamond and exonerate
[Apr 19 10:21 AM]: Found 529,006 preliminary alignments --> aligning with exonerate
Progress: 1.22%
Progress: 2.46%
Progress: 3.74%
Progress: 4.51%
Progress: 5.65%
Progress: 6.61%
Progress: 7.83%
Progress: 8.84%
Progress: 10.11%
Progress: 10.88%
Progress: 12.10%
Progress: 13.14%
Progress: 14.36%

Progress: 16.83%
Progress: 19.32%
Progress: 21.76%
Progress: 24.96%
Progress: 26.44%
Progress: 28.67%
Progress: 31.19%
Progress: 33.49%
Progress: 39.19%
Progress: 41.78%
Progress: 44.39%
Progress: 47.01%
Progress: 49.18%
Progress: 51.45%
Progress: 54.06%
Progress: 56.34%
Progress: 58.76%
Progress: 61.45%
Progress: 66.86%
Progress: 69.28%
Progress: 71.72%
Progress: 74.16%
Progress: 76.80%
Progress: 79.37%
Progress: 81.48%
Progress: 83.55%
Progress: 85.66%
Progress: 88.21%
Progress: 93.97%
Progress: 96.23%
Progress: 98.50%

finished: found 1,485 alignments
[Apr 19 11:05 AM]: Running GeneMark-ES on assembly
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LC_CTYPE = "UTF-8",
    LANG = "en_US.UTF-8"
  are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
[Apr 19 11:24 AM]: 14,395 predictions from GeneMark
[Apr 19 11:24 AM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Apr 19 11:33 AM]: 11 valid BUSCO predictions found, validating protein sequences
[Apr 19 11:33 AM]: 11 BUSCO predictions validated
[Apr 19 11:33 AM]: Not enough gene models 11 to train Augustus (200 required), exiting

nextgenusfs commented 2 years ago

augustus: 3.4.0 is incompatible with the internal BUSCO in funannotate. Downgrade Augustus to a version < 3.4.
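A minimal sketch for checking an installed Augustus against that constraint. It assumes that `augustus --version` prints a version string such as 3.4.0 somewhere in its output and that `sort -V` (GNU coreutils) is available; `version_lt` is a hypothetical helper, not part of funannotate or Augustus.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when version $1 sorts strictly before $2.
# Relies on `sort -V` for version-aware ordering.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

if command -v augustus >/dev/null 2>&1; then
  # Assumption: the first dotted number in the output is the version.
  installed=$(augustus --version 2>&1 | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1)
  if [ -n "$installed" ] && version_lt "$installed" 3.4; then
    echo "augustus $installed: below 3.4"
  else
    echo "augustus ${installed:-unknown}: not below 3.4, consider downgrading"
  fi
else
  echo "augustus not found on PATH"
fi
```

For example, `version_lt 3.3.3 3.4` succeeds, while `version_lt 3.4.0 3.4` fails, which matches the version reported in the dependency check earlier in this thread.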