nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

funannotate iprscan crashes with an error shortly after job start #841

Closed: novigit closed this issue 1 year ago

novigit commented 2 years ago

I've been following the tutorial (https://funannotate.readthedocs.io/en/latest/tutorials.html) with my genome and have reached the InterProScan step. I run the command as follows:

funannotate iprscan \
     --input funannotate_out \
     --method local \
     --iprscan_path /scratch2/software/anaconda/envs/interproscan/bin/interproscan.sh

But within a minute or so, the job crashes with the following error:

Traceback (most recent call last):
  File "/scratch2/software/anaconda/envs/funannotate/lib/python3.8/site-packages/funannot
    shutil.rmtree(tmpdir)
  File "/scratch2/software/anaconda/envs/funannotate/lib/python3.8/shutil.py", line 718,
    _rmtree_safe_fd(fd, path, onerror)
  File "/scratch2/software/anaconda/envs/funannotate/lib/python3.8/shutil.py", line 655,
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/scratch2/software/anaconda/envs/funannotate/lib/python3.8/shutil.py", line 655,
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/scratch2/software/anaconda/envs/funannotate/lib/python3.8/shutil.py", line 659,
    onerror(os.rmdir, fullname, sys.exc_info())
  File "/scratch2/software/anaconda/envs/funannotate/lib/python3.8/shutil.py", line 657,
    os.rmdir(entry.name, dir_fd=topfd)
OSError: [Errno 39] Directory not empty: 'jobMobiDBLite'

I have no idea what is causing this error.
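For context, the bottom of the traceback is os.rmdir failing with Errno 39 (ENOTEMPTY on Linux) while shutil.rmtree cleans up a temporary directory. A common way this happens is a race: some process writes a new file into jobMobiDBLite after rmtree has already emptied it. That an InterProScan worker is the writer here is an assumption; the thread does not confirm it. A minimal sketch of a cleanup helper that retries only on that specific error (rmtree_with_retry is a hypothetical name, not part of funannotate):

```python
import errno
import shutil
import time


def rmtree_with_retry(path, attempts=3, delay=1.0):
    """Delete a directory tree, retrying only on ENOTEMPTY (hypothetical helper).

    ENOTEMPTY is what os.rmdir raises when a file appears in a directory
    after shutil.rmtree has already deleted that directory's contents.
    """
    for attempt in range(attempts):
        try:
            shutil.rmtree(path)
            return
        except OSError as exc:
            is_last_attempt = attempt == attempts - 1
            # Re-raise anything that is not the "directory not empty" race,
            # and give up after the final attempt.
            if exc.errno != errno.ENOTEMPTY or is_last_attempt:
                raise
            time.sleep(delay)  # give the writer time to finish before retrying
```

Retrying only hides the symptom; in this thread the issue was ultimately resolved by pointing to a different InterProScan install.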

The --iprscan_path in my command points to an interproscan.sh installed in a separate conda environment. Maybe funannotate expects both to be in the same conda environment?
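One way to probe this question: a conda environment's bin directory is normally on PATH only while that environment is active, so a script called by its full path can still fail if it relies on other tools from its own environment. A minimal sketch that reports whether the given script is executable and whether its directory is on the current PATH. describe_script_path is a hypothetical helper, and whether interproscan.sh actually needs anything else from its environment is an assumption to verify:

```python
import os
import shutil


def describe_script_path(script_path):
    """Summarize how a script would be found from the current environment (hypothetical helper)."""
    bin_dir = os.path.dirname(os.path.abspath(script_path))
    path_dirs = [
        os.path.abspath(p)
        for p in os.environ.get("PATH", "").split(os.pathsep)
        if p
    ]
    return {
        # Does the file exist and carry the execute bit for this user?
        "executable": os.path.isfile(script_path) and os.access(script_path, os.X_OK),
        # Is the script's own bin directory (e.g. a conda env's bin/) on PATH?
        "bin_dir_on_path": bin_dir in path_dirs,
        # What a bare-name lookup would resolve to, or None if nothing is found.
        "found_by_name": shutil.which(os.path.basename(script_path)),
    }
```

Running it from the activated funannotate environment with the --iprscan_path value shows whether the InterProScan environment's bin directory is visible at all (bin_dir_on_path is False when that environment is not active).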

Unfortunately, I'm running funannotate on a cluster, which apparently rules out the Docker option for me.

Here is the output of funannotate check --show-versions:

-------------------------------------------------------
Checking dependencies for 1.8.13
-------------------------------------------------------
You are running Python v 3.8.13. Now checking python packages...
biopython: 1.79
goatools: 1.2.3
matplotlib: 3.4.3
natsort: 8.2.0
numpy: 1.23.4
pandas: 1.5.1
psutil: 5.9.4
requests: 2.28.1
scikit-learn: 1.1.3
scipy: 1.9.3
seaborn: 0.12.1
All 11 python packages installed

You are running Perl v b'5.032001'. Now checking perl modules...
Carp: 1.50
Clone: 0.46
DBD::SQLite: 1.72
DBD::mysql: 4.046
DBI: 1.643
DB_File: 1.855
Data::Dumper: 2.183
File::Basename: 2.85
File::Which: 1.24
Getopt::Long: 2.53
Hash::Merge: 0.302
JSON: 4.10
LWP::UserAgent: 6.67
Logger::Simple: 2.0
POSIX: 1.94
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.14
Tie::File: 1.06
URI::Escape: 5.12
YAML: 1.30
threads: 2.25
threads::shared: 1.61
   ERROR: local::lib not installed, install with cpanm local::lib

Checking Environmental Variables...
$FUNANNOTATE_DB=/scratch4/db/funannotate
$PASAHOME=/scratch2/software/anaconda/envs/funannotate/opt/pasa-2.5.2
$TRINITY_HOME=/scratch2/software/anaconda/envs/funannotate/opt/trinity-2.8.5
$EVM_HOME=/scratch2/software/anaconda/envs/funannotate/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/scratch2/software/anaconda/envs/funannotate/config/
$GENEMARK_PATH=/scratch2/software/gmes_linux_4.69
All 6 environmental variables are set
-------------------------------------------------------
Checking external dependencies...
pigz 2.4
PASA: 2.5.2
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.5.0
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v35
diamond: 2.0.15
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2021-08-25
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 17.0.3-internal
kallisto: 0.46.1
mafft: v7.508 (2022/Sep/07)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.24-r1122
proteinortho: 6.1.2
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.16.1
signalp: 4.1
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.11 (Oct 2022)
tantan: tantan 40
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
        ERROR: emapper.py not installed
        ERROR: gmes_petap.pl not installed
        ERROR: pigz not installed

Funannotate version: funannotate 1.8.13 pyhdfd78af_0 bioconda
InterProScan version (in a separate environment): interproscan 5.55_88.0 hec16e2b_1 bioconda

Cheers

nextgenusfs commented 2 years ago

I'm not sure this setup will work. funannotate iprscan is just a wrapper that runs InterProScan faster, but it assumes a system-wide, functional install: if you need to activate that environment before InterProScan works, then it won't work for you. Instead, run InterProScan separately on the proteins in the predict_results folder (or, if you have RNA-seq data and ran update, in the update_results folder):

interproscan.sh -i /pathto/predict_results/genome.proteins.fasta -f XML -goterms -pa

Then pass the resulting XML output file to the --iprscan option of funannotate annotate.
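The two-step workaround can be scripted. A minimal sketch, assuming both interproscan.sh and funannotate are on PATH when it runs. The interproscan.sh flags are exactly those in the maintainer's command; the file paths, the XML file name, and the use of -i for the funannotate output folder are placeholders or assumptions to adjust for your setup, and the helper names are hypothetical:

```python
import shlex
import subprocess


def iprscan_cmd(proteins_fasta):
    """InterProScan command from the maintainer's reply (XML output, GO terms, pathways)."""
    return ["interproscan.sh", "-i", proteins_fasta, "-f", "XML", "-goterms", "-pa"]


def annotate_cmd(funannotate_dir, iprscan_xml):
    """funannotate annotate command that consumes the InterProScan XML."""
    return ["funannotate", "annotate", "-i", funannotate_dir, "--iprscan", iprscan_xml]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder paths: point these at your own predict_results (or update_results) files.
    proteins = "funannotate_out/predict_results/genome.proteins.fasta"
    xml = "genome.proteins.fasta.xml"  # hypothetical name; check where InterProScan wrote its XML
    run(iprscan_cmd(proteins))
    run(annotate_cmd("funannotate_out", xml))
```

Defaulting to a dry run prints both commands first, so the paths can be checked before a long InterProScan job is launched.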

novigit commented 1 year ago

It seems I forgot to reply to this for several months. My apologies!

I found a system-wide, functional install of InterProScan, pointed --iprscan_path to it, and that seems to have worked.