nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

tbl2asn fatal error #733

Closed: DrPintoThe2nd closed this issue 2 years ago

DrPintoThe2nd commented 2 years ago

Hi Jon,

Sorry to bother you with this, but I'm stuck on the last step of a 'funannotate predict' run, where tbl2asn is hitting a fatal error. I've tried different versions of tbl2asn in the Docker image I'm using, to no avail. Fortunately, the test dataset reproduces this error verbatim! Have you run into this before? Any ideas for getting around it?

Are you using the latest release? If you are not using the latest release of funannotate, please upgrade, if bug persists then report here.

[Jun 13 06:55 PM]: OS: Debian GNU/Linux 11, 48 cores, ~ 132 GB RAM. Python: 3.8.12
[Jun 13 06:55 PM]: Running funannotate v1.8.9

Describe the bug

[Jun 13 07:29 PM]: ERROR: tbl2asn also failed in single threaded mode...

What command did you issue?

funannotate test -t predict --cpus 42
#########################################################
Running `funannotate predict` unit testing
Downloading: https://osf.io/te2pf/download?version=1 Bytes: 1489808
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --cpus 42 --species Awesome testicus
#########################################################

Logfiles

#########################################################
Running `funannotate predict` unit testing
Downloading: https://osf.io/te2pf/download?version=1 Bytes: 1489808
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --cpus 42 --species Awesome testicus
#########################################################
-------------------------------------------------------
[Jun 13 06:55 PM]: OS: Debian GNU/Linux 11, 48 cores, ~ 132 GB RAM. Python: 3.8.12
[Jun 13 06:55 PM]: Running funannotate v1.8.9
[Jun 13 06:55 PM]: GeneMark not found and $GENEMARK_PATH environmental variable missing. Will skip GeneMark ab-initio prediction.
[Jun 13 06:55 PM]: Skipping CodingQuarry as no --rna_bam passed
[Jun 13 06:55 PM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     pretrained     
  glimmerhmm   busco          
  snap         busco          
[Jun 13 06:56 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Jun 13 06:56 PM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Jun 13 06:56 PM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Jun 13 06:56 PM]: Found 1,782 preliminary alignments --> aligning with exonerate
[Jun 13 06:57 PM]: Exonerate finished: found 1,417 alignments
[Jun 13 06:57 PM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Jun 13 06:59 PM]: 373 valid BUSCO predictions found, validating protein sequences
[Jun 13 07:00 PM]: 370 BUSCO predictions validated
[Jun 13 07:00 PM]: Running Augustus gene prediction using saccharomyces parameters
[Jun 13 07:02 PM]: 1,494 predictions from Augustus
[Jun 13 07:02 PM]: Pulling out high quality Augustus predictions
[Jun 13 07:02 PM]: Found 369 high quality predictions from Augustus (>90% exon evidence)
[Jun 13 07:02 PM]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Jun 13 07:03 PM]: 1,360 predictions from SNAP
[Jun 13 07:03 PM]: Running GlimmerHMM gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Jun 13 07:04 PM]: 1,773 predictions from GlimmerHMM
[Jun 13 07:04 PM]: Summary of gene models passed to EVM (weights):
  Source         Weight   Count
  Augustus       1        1336 
  Augustus HiQ   2        370  
  GlimmerHMM     1        1773 
  snap           1        1360 
  Total          -        4839 
[Jun 13 07:04 PM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Jun 13 07:27 PM]: Converting to GFF3 and collecting all EVM results
[Jun 13 07:27 PM]: 1,700 total gene models from EVM
[Jun 13 07:27 PM]: Generating protein fasta files from 1,700 EVM models
[Jun 13 07:27 PM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Jun 13 07:28 PM]: Found 142 gene models to remove: 0 too short; 0 span gaps; 142 transposable elements
[Jun 13 07:28 PM]: 1,558 gene models remaining
[Jun 13 07:28 PM]: Predicting tRNAs
[Jun 13 07:28 PM]: 112 tRNAscan models are valid (non-overlapping)
[Jun 13 07:28 PM]: Generating GenBank tbl annotation file
[Jun 13 07:28 PM]: Collecting final annotation files for 1,670 total gene models
[Jun 13 07:28 PM]: Converting to final Genbank format
[Jun 13 07:29 PM]: ERROR: GBK file conversion failed, tbl2asn parallel script has died
[Jun 13 07:29 PM]: Trying single threaded tbl2asn as backup
[Jun 13 07:29 PM]: CMD: tbl2asn -y "Annotated using 1.8.9" -N 1 -t /opt/conda/envs/funannotate/lib/python3.8/site-packages/funannotate/config/test.sbt -M n -j "[organism=Awesome testicus]" -V b -c f -T -a r10u -p annotate/predict_misc/tbl2asn
[Jun 13 07:29 PM]: ERROR: tbl2asn also failed in single threaded mode, check tbl2asn installation/compilation
#########################################################
SUCCESS: `funannotate predict` test complete.
#########################################################

OS/Install Information

-------------------------------------------------------
Checking dependencies for 1.8.9
-------------------------------------------------------
You are running Python v 3.8.12. Now checking python packages...
biopython: 1.77
goatools: 1.2.3
matplotlib: 3.4.3
natsort: 8.1.0
numpy: 1.22.3
pandas: 1.4.2
psutil: 5.9.0
requests: 2.27.1
scikit-learn: 1.0.2
scipy: 1.8.0
seaborn: 0.11.2
All 11 python packages installed

You are running Perl v b'5.026002'. Now checking perl modules...
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
threads: 2.15
threads::shared: 1.56
   ERROR: Bio::Perl not installed, install with cpanm Bio::Perl 

Checking Environmental Variables...
$FUNANNOTATE_DB=/home/funannotate_db
$PASAHOME=/opt/conda/envs/funannotate/opt/pasa-2.4.1
$TRINITY_HOME=/opt/conda/envs/funannotate/opt/trinity-2.8.5
$EVM_HOME=/opt/conda/envs/funannotate/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/opt/conda/envs/funannotate/config/
    ERROR: GENEMARK_PATH not set. export GENEMARK_PATH=/path/to/dir
-------------------------------------------------------
Checking external dependencies...
salmon: error while loading shared libraries: libtbb.so.2: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "/opt/conda/envs/funannotate/bin/ete3", line 6, in <module>
    from ete3.tools.ete import main
  File "/opt/conda/envs/funannotate/lib/python3.8/site-packages/ete3/tools/ete.py", line 55, in <module>
    from . import (ete_split, ete_expand, ete_annotate, ete_ncbiquery, ete_view,
  File "/opt/conda/envs/funannotate/lib/python3.8/site-packages/ete3/tools/ete_view.py", line 48, in <module>
    from .. import (Tree, PhyloTree, TextFace, RectFace, faces, TreeStyle, CircleFace, AttrFace,
ImportError: cannot import name 'TextFace' from 'ete3' (/opt/conda/envs/funannotate/lib/python3.8/site-packages/ete3/__init__.py)
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.3
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v36
diamond: 2.0.14
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2017-11-15
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.9.1-internal
kallisto: 0.46.1
mafft: v7.490 (2021/Oct/30)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.24-r1122
proteinortho: 6.0.33
pslCDnaFilter: no way to determine
samtools: samtools 1.12
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.9 (July 2021)
tantan: tantan 26
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
    ERROR: emapper.py not installed
    ERROR: ete3 not installed
    ERROR: gmes_petap.pl not installed
    ERROR: salmon not installed
    ERROR: signalp not installed

Dockerfile

FROM continuumio/miniconda3:latest

RUN conda config --add channels defaults && conda config --add channels bioconda && \
    conda config --add channels conda-forge && conda update -n base -c defaults conda

RUN conda install mamba

RUN mamba create -n funannotate --yes "funannotate=1.8.9" && conda clean -afy

ENV FUNANNOTATE_DB=/home/funannotate_db

RUN mkdir -p /home/funannotate_db

RUN echo "source activate funannotate" > ~/.bashrc

SHELL ["conda", "run", "-n", "funannotate", "/bin/bash", "-c"]

RUN funannotate setup -i all

RUN funannotate setup -b tetrapoda dikarya

nextgenusfs commented 2 years ago

It is probably coming from the tbl2asn installed via conda. tbl2asn has a really strange compilation timer that tells the user to update because it is "out of date" after a year or so; perhaps that is why it is failing? Also, if it is helpful to you, docker images built from gitlab actions are here: https://hub.docker.com/r/nextgenusfs/funannotate/tags, and I can add tetrapoda to the BUSCO databases so it's installed in the image.

You can test by running that tbl2asn command manually:

$ docker run yourimagename tbl2asn -y "Annotated using 1.8.9" -N 1 -t /opt/conda/envs/funannotate/lib/python3.8/site-packages/funannotate/config/test.sbt -M n -j "[organism=Awesome testicus]" -V b -c f -T -a r10u -p annotate/predict_misc/tbl2asn
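When running the command by hand, it can help to capture the exit status and everything tbl2asn prints, since funannotate's log only reports that the step failed. The following is a minimal sketch, not part of funannotate: `diagnose_cmd` is a hypothetical helper name, and the "more than a year old" string is an assumption about how the expiry notice is worded in a given build. The relative `annotate/predict_misc/tbl2asn` path also has to exist inside the container for the manual run to be meaningful.

```shell
# Hypothetical helper: run any command, keep its combined stdout/stderr,
# and print the exit status so the underlying failure is visible.
diagnose_cmd() {
    local log status
    log="$(mktemp)"
    "$@" >"$log" 2>&1
    status=$?
    echo "exit status: $status"
    # Assumed wording of the expiry notice; adjust to what your build prints.
    if grep -qi "more than a year old" "$log"; then
        echo "hint: binary reports itself as expired"
    fi
    cat "$log"
    rm -f "$log"
    return "$status"
}

# Usage, from inside the container and the directory that holds annotate/:
#   diagnose_cmd tbl2asn <same arguments as in the CMD line of the log>
```

If the hint line appears, the expiry timer is the likely culprit; if not, the raw output printed after it is the place to look.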

Do you get the same error if you use the docker image: nextgenusfs/funannotate:v1.8.9?

You could probably fix this by downloading tbl2asn from NCBI in your Dockerfile and adding it to PATH. If you add it in front of conda, then it will take precedence; otherwise, you can remove the conda copy.
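The PATH precedence idea can be sketched in shell as follows. This is an illustration under assumptions: `prefer_dir` is a hypothetical helper, and `/opt/tbl2asn` is a made-up location for a freshly downloaded binary. In a Dockerfile the same effect would come from an `ENV PATH=...` line that lists the new directory before the conda environment's `bin` directory.

```shell
# Hypothetical helper: put a directory at the front of PATH so that its
# executables shadow same-named ones found later (e.g. in the conda env).
prefer_dir() {
    PATH="$1:$PATH"
    export PATH
    # Drop the shell's cached command locations so the new order is used.
    hash -r 2>/dev/null || true
}

# Usage (the directory is an assumption, not a funannotate convention):
#   prefer_dir /opt/tbl2asn
#   command -v tbl2asn    # should now print /opt/tbl2asn/tbl2asn
```

Checking with `command -v tbl2asn` before rerunning `funannotate predict` confirms which copy will actually be picked up.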

DrPintoThe2nd commented 2 years ago

Ish... yeah, the nextgenusfs/funannotate:v1.8.9 Docker image doesn't hit this issue. For some otherworldly reason I can't use that image on my workstation(!), but it works fine on my desktop, so I hopped over there to finish the annotation and moved on! I think this may be a sign that my workstation is nearing retirement! :(

(Also, adding tetrapoda to the download would be helpful for me personally, but no sweat if it's not worth it or you don't have the time!)

Thanks for your time and suggestions!