nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

trainGlimmerHMM missing Getopt::Std Perl module in latest Docker container (run via Singularity) #702

Closed: Shellfishgene closed this issue 2 years ago

Shellfishgene commented 2 years ago

Are you using the latest release? Yes, from Docker: singularity pull funannotate.sif docker://nextgenusfs/funannotate:latest

Describe the bug I just ran the funannotate test using Singularity, but trainGlimmerHMM fails because the Getopt::Std Perl module is missing. Running trainGlimmerHMM directly from the shell inside the container produces the same error. I also installed GlimmerHMM from conda to check whether the error comes from the conda recipe for GlimmerHMM, but the following works fine here:

mamba create -n glimmerhmm glimmerhmm
trainGlimmerHMM

So the Conda recipe seems fine?
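A minimal sketch of how the same check could be scripted for several interpreters at once, assuming only the standard `perl -M<Module> -e 1` idiom (it exits non-zero when the module cannot be loaded). The helper names and the `/venv/bin/perl` path are illustrative assumptions; the thread only shows that `trainGlimmerHMM` lives in `/venv/bin`.

```python
import subprocess


def perl_module_check_cmd(module, perl="perl"):
    """Build `perl -M<module> -e 1`, which exits 0 only if the module loads."""
    return [perl, f"-M{module}", "-e", "1"]


def perl_module_loads(module, perl="perl"):
    """Return True if `perl` can load `module`; False if not or if perl is missing."""
    try:
        result = subprocess.run(
            perl_module_check_cmd(module, perl),
            capture_output=True,
            text=True,
        )
    except OSError:
        # The interpreter path does not exist or cannot be executed.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical interpreter locations; adjust to what exists in your container.
    for interpreter in ("perl", "/usr/bin/perl", "/venv/bin/perl"):
        print(interpreter, perl_module_loads("Getopt::Std", interpreter))
```

Running this inside the container (for example through `singularity exec funannotate.sif python3 ...`) would show whether only one of the available Perl interpreters is missing the module.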

What command did you issue? singularity exec docker://nextgenusfs/funannotate funannotate test -t all --cpus 4 or singularity exec funannotate.sif trainGlimmerHMM

Logfiles

[Mar 08 03:48 PM]: Running GlimmerHMM gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Mar 08 03:48 PM]: CMD ERROR: trainGlimmerHMM /gxfs_work1/fs2/work-geomar7/smomw240/temp/test-predict_0daf1122-bd37-443b-b872-68400439689e/annotate/predict_misc/genome.softmasked.fa /gxfs_work1/fs2/work-geomar7/smomw240/temp/test-predict_0daf1122-bd37-443b-b872-68400439689e/annotate/predict_misc/glimmer.exons -d annotate/predict_misc/glimmerhmm
b"Can't locate Getopt/Std.pm in @INC (you may need to install the Getopt::Std module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.28.1 /usr/local/share/perl/5.28.1 /usr/lib/x86_64-linux-gnu/perl5/5.28 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.28 /usr/share/perl/5.28 /usr/local/lib/site_perl /usr$
lib/x86_64-linux-gnu/perl-base) at /venv/bin/trainGlimmerHMM line 8.\nBEGIN failed--compilation aborted at /venv/bin/trainGlimmerHMM line 8.\n"
#########################################################
Traceback (most recent call last):                              
  File "/venv/bin/funannotate", line 8, in <module> 
    sys.exit(main())                                          
  File "/venv/lib/python3.8/site-packages/funannotate/funannotate.py", line 711, in main
    mod.main(arguments)                        
  File "/venv/lib/python3.8/site-packages/funannotate/test.py", line 405, in main
    runPredictTest(args)                                                      
  File "/venv/lib/python3.8/site-packages/funannotate/test.py", line 160, in runPredictTest
    assert 1500 <= countGFFgenes(os.path.join(                               
  File "/venv/lib/python3.8/site-packages/funannotate/test.py", line 45, in countGFFgenes
    with open(input, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'test-predict_0daf1122-bd37-443b-b872-68400439689e/annotate/predict_results/Awesome_testicus.gff3'
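
The `Can't locate ... in @INC` message in the log above carries the missing module, the directories Perl searched, and the script that failed. A small sketch of pulling those apart, assuming the message follows the standard Perl wording seen here; the function name and returned keys are hypothetical:

```python
import re

# Matches Perl's standard "Can't locate X.pm in @INC (...)" error text.
INC_ERROR = re.compile(
    r"Can't locate (?P<file>\S+) in @INC .*?"
    r"\(@INC contains: (?P<inc>[^)]*)\) "
    r"at (?P<script>\S+) line (?P<line>\d+)\."
)


def parse_inc_error(message):
    """Return module, searched directories, script and line, or None if no match."""
    match = INC_ERROR.search(message)
    if match is None:
        return None
    file_name = match.group("file")
    # Turn "Getopt/Std.pm" into "Getopt::Std".
    if file_name.endswith(".pm"):
        file_name = file_name[:-3]
    return {
        "module": file_name.replace("/", "::"),
        "inc_dirs": match.group("inc").split(),
        "script": match.group("script"),
        "line": int(match.group("line")),
    }
```

The list of searched directories is the interesting part: it shows which Perl installation the failing script actually ran under.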

OS/Install Information

-------------------------------------------------------
Checking dependencies for 1.8.10
-------------------------------------------------------
You are running Python v 3.8.12. Now checking python packages...
biopython: 1.77
goatools: 1.1.12
matplotlib: 3.5.1
natsort: 8.1.0
numpy: 1.22.2
pandas: 1.4.1
psutil: 5.9.0
requests: 2.27.1
scikit-learn: 1.0.2
scipy: 1.5.3
seaborn: 0.11.2
All 11 python packages installed

You are running Perl v b'5.032001'. Now checking perl modules...
Carp: 1.50
Clone: 0.42
DBD::SQLite: 1.70
DBD::mysql: 4.046
DBI: 1.643
DB_File: 1.855
Data::Dumper: 2.183
File::Basename: 2.85
File::Which: 1.24
Getopt::Long: 2.52
Hash::Merge: 0.302
JSON: 4.05
LWP::UserAgent: 6.61
Logger::Simple: 2.0
POSIX: 1.94
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.14
Tie::File: 1.06
URI::Escape: 5.10
YAML: 1.30
local::lib: 2.000028
threads: 2.25
threads::shared: 1.61
   ERROR: Bio::Perl not installed, install with cpanm Bio::Perl 

Checking Environmental Variables...
$FUNANNOTATE_DB=/opt/databases
$PASAHOME=/venv/opt/pasa-2.4.1
$TRINITYHOME=/venv/opt/trinity-2.8.5
$EVM_HOME=/venv/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/venv/config
    ERROR: GENEMARK_PATH not set. export GENEMARK_PATH=/path/to/dir
-------------------------------------------------------
Checking external dependencies...
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.3
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v36
diamond: 2.0.14
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2017-11-15
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.9.1-internal
kallisto: 0.46.1
mafft: v7.490 (2021/Oct/30)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.24-r1122
pigz: pigz 2.6
proteinortho: 6.0.16
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.15
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.9 (July 2021)
tantan: tantan 26
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
    ERROR: emapper.py not installed
    ERROR: gmes_petap.pl not installed
    ERROR: signalp not installed
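
One detail worth comparing: the report above gives the Perl version as `5.032001`, while the `@INC` paths in the error mention `5.28.1` and `5.28`. That hints that `trainGlimmerHMM` may have run under a different interpreter than the one the dependency check queried. This is a reading of the logs, not something confirmed in the thread. The sketch below assumes the reported string is in Perl's numeric `$]` format (revision, then three digits each for version and subversion); the helper names are hypothetical.

```python
import re


def decode_perl_numeric_version(value):
    """Convert Perl's numeric form, e.g. '5.032001', to the tuple (5, 32, 1)."""
    revision, _, fraction = value.partition(".")
    fraction = fraction.ljust(6, "0")  # pad short forms such as '5.032'
    return int(revision), int(fraction[:3]), int(fraction[3:6])


def versions_in_inc(inc_dirs):
    """Collect the (major, minor) versions that appear as whole path components."""
    found = set()
    for directory in inc_dirs:
        for part in directory.split("/"):
            match = re.fullmatch(r"(\d+)\.(\d+)(?:\.\d+)?", part)
            if match:
                found.add((int(match.group(1)), int(match.group(2))))
    return found


def interpreter_mismatch(reported, inc_dirs):
    """True if none of the versions in the @INC paths match the reported Perl."""
    major, minor, _ = decode_perl_numeric_version(reported)
    seen = versions_in_inc(inc_dirs)
    return bool(seen) and (major, minor) not in seen
```

With the values from this issue, `interpreter_mismatch("5.032001", [...])` over the `@INC` directories would flag a mismatch, since the paths only mention 5.28.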

nextgenusfs commented 2 years ago

Weird, apparently adding local-lib caused the GlimmerHMM Perl error. I think the current image is fixed; can you confirm?

$ docker images
REPOSITORY                     TAG       IMAGE ID       CREATED         SIZE
nextgenusfs/funannotate        latest    23522b878a09   4 hours ago     10.8GB
$ ./funannotate_dev/funannotate-docker test -t predict annotate --cpus 6
#########################################################
Running `funannotate predict` unit testing
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --cpus 6 --species Awesome testicus
#########################################################
-------------------------------------------------------
[Mar 27 02:25 AM]: OS: Debian GNU/Linux 10, 4 cores, ~ 8 GB RAM. Python: 3.8.12
[Mar 27 02:25 AM]: Running funannotate v1.8.10
[Mar 27 02:25 AM]: GeneMark not found and $GENEMARK_PATH environmental variable missing. Will skip GeneMark ab-initio prediction.
[Mar 27 02:25 AM]: Skipping CodingQuarry as no --rna_bam passed
[Mar 27 02:25 AM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     pretrained     
  glimmerhmm   busco          
  snap         busco          
[Mar 27 02:25 AM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Mar 27 02:25 AM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Mar 27 02:25 AM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Mar 27 02:25 AM]: Found 1,505 preliminary alignments with diamond in 0:00:02 --> generated FASTA files for exonerate in 0:00:00
[Mar 27 02:26 AM]: Exonerate finished in 0:00:32: found 1,270 alignments
[Mar 27 02:26 AM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Mar 27 02:40 AM]: 373 valid BUSCO predictions found, validating protein sequences
[Mar 27 02:41 AM]: 370 BUSCO predictions validated
[Mar 27 02:41 AM]: Running Augustus gene prediction using saccharomyces parameters
[Mar 27 02:44 AM]: 1,485 predictions from Augustus
[Mar 27 02:44 AM]: Pulling out high quality Augustus predictions
[Mar 27 02:44 AM]: Found 371 high quality predictions from Augustus (>90% exon evidence)
[Mar 27 02:44 AM]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Mar 27 02:47 AM]: 1,508 predictions from SNAP
[Mar 27 02:47 AM]: Running GlimmerHMM gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Mar 27 02:50 AM]: 1,774 predictions from GlimmerHMM
[Mar 27 02:50 AM]: Summary of gene models passed to EVM (weights):
  Source         Weight   Count
  Augustus       1        1325 
  Augustus HiQ   2        372  
  GlimmerHMM     1        1774 
  snap           1        1508 
  Total          -        4979 
[Mar 27 02:50 AM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Mar 27 02:59 AM]: Converting to GFF3 and collecting all EVM results
[Mar 27 02:59 AM]: 1,688 total gene models from EVM
[Mar 27 02:59 AM]: Generating protein fasta files from 1,688 EVM models
[Mar 27 02:59 AM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Mar 27 02:59 AM]: Found 135 gene models to remove: 0 too short; 0 span gaps; 135 transposable elements
[Mar 27 02:59 AM]: 1,553 gene models remaining
[Mar 27 02:59 AM]: Predicting tRNAs
[Mar 27 02:59 AM]: 112 tRNAscan models are valid (non-overlapping)
[Mar 27 02:59 AM]: Generating GenBank tbl annotation file
[Mar 27 02:59 AM]: Collecting final annotation files for 1,665 total gene models
[Mar 27 02:59 AM]: Converting to final Genbank format
[Mar 27 03:00 AM]: Funannotate predict is finished, output files are in the annotate/predict_results folder
[Mar 27 03:00 AM]: Your next step might be functional annotation, suggested commands:
-------------------------------------------------------
Run InterProScan (manual install): 
funannotate iprscan -i annotate -c 6

Run antiSMASH (optional): 
funannotate remote -i annotate -m antismash -e youremail@server.edu

Annotate Genome: 
funannotate annotate -i annotate --cpus 6 --sbt yourSBTfile.txt
-------------------------------------------------------

[Mar 27 03:00 AM]: Training parameters file saved: annotate/predict_results/saccharomyces.parameters.json
[Mar 27 03:00 AM]: Add species parameters to database:

  funannotate species -s saccharomyces -a annotate/predict_results/saccharomyces.parameters.json

#########################################################
SUCCESS: `funannotate predict` test complete.
#########################################################

Shellfishgene commented 2 years ago

Sorry for the delay, yes this fixed it for me.

tjhinet commented 1 year ago

Hi,

I am fairly new to this forum, so pardon me if I'm not supposed to comment on a closed issue. I ran into an issue similar to the one described above: the SignalP check fails when I check dependencies. This is with the latest Docker image.

Logfile


Checking dependencies for 1.8.14

You are running Python v 3.8.12. Now checking python packages...
biopython: 1.80
goatools: 1.2.3
matplotlib: 3.7.0
natsort: 8.2.0
numpy: 1.22.4
pandas: 1.5.3
psutil: 5.9.4
requests: 2.28.2
scikit-learn: 1.1.1
scipy: 1.5.3
seaborn: 0.12.2
All 11 python packages installed

You are running Perl v b'5.026002'. Now checking perl modules...
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
local::lib: 2.000029
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/opt/databases
$PASAHOME=/venv/opt/pasa-2.4.1
$TRINITYHOME=/venv/opt/trinity-2.8.5
$EVM_HOME=/venv/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/usr/share/augustus/config
$GENEMARK_PATH=/home/u4485090/funannotate/gmes_linux_64_4
All 6 environmental variables are set

Checking external dependencies...
ERROR: pslDnaFiler found but error running: pslCDnaFilter: error while loading shared libraries: libssl.so.1.0.0: cannot open shared object file: No such file or directory

ERROR: signalp found but error running signalp

PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.2
bamtools: bamtools 2.5.2
bedtools: bedtools v2.30.0
blat: BLAT v35
diamond: 2.0.15
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: 36.3.8g
glimmerhmm: 3.0.4
gmap: 2017-11-15
gmes_petap.pl: 4.71_lic
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.8-internal
kallisto: 0.46.1
mafft: v7.515 (2023/Jan/15)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.24-r1122
pigz: 2.6
proteinortho: 6.0.16
salmon: salmon 0.14.1
samtools: samtools 1.12
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.9 (July 2021)
tantan: tantan 40
tbl2asn: 25.8
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
ERROR: emapper.py not installed
ERROR: pslCDnaFilter not installed
ERROR: signalp not installed

Singularity> signalp
Can't locate Getopt/Std.pm in @INC (you may need to install the Getopt::Std module) (@INC contains: /home/u4485090/perl5/lib/perl5/x86_64-linux-gnu-thread-multi /home/u4485090/perl5/lib/perl5 /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.28.1 /usr/local/share/perl/5.28.1 /usr/lib/x86_64-linux-gnu/perl5/5.28 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.28 /usr/share/perl/5.28 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base) at /home/u4485090/funannotate/signalp-4.1/signalp line 76.
BEGIN failed--compilation aborted at /home/u4485090/funannotate/signalp-4.1/signalp line 76.

Any help is appreciated! Thanks.

Cheers, Erick
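
A possible first diagnostic for the `signalp` report above, offered as a sketch rather than a confirmed fix: the dependency check reports Perl `5.026002`, yet the `@INC` paths in the `signalp` error mention `5.28.1`, so the script may be starting a different Perl than the one being checked. Reading the script's shebang line shows which interpreter it requests. The function name is hypothetical.

```python
def read_shebang(path):
    """Return a script's interpreter line without the leading '#!', or None."""
    with open(path, "rb") as handle:
        first_line = handle.readline()
    if not first_line.startswith(b"#!"):
        return None
    return first_line[2:].decode("utf-8", errors="replace").strip()


if __name__ == "__main__":
    import sys

    # Usage: python3 shebang.py /path/to/script [more scripts ...]
    for script in sys.argv[1:]:
        print(script, "->", read_shebang(script))
```

If the result is a fixed path such as `/usr/bin/perl` rather than `/usr/bin/env perl`, the script bypasses whatever Perl comes first on `PATH`, which would be consistent with the mismatch in the logs.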