nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

snap & glimmerhmm trainings fail, then evidence modeler fails for similar reason #538

Closed danielardaniela closed 3 years ago

danielardaniela commented 3 years ago

Are you using the latest release?

v1.8.4, all dependencies met except GeneMark.

Describe the bug

During the funannotate predict step, all is well until the SNAP and GlimmerHMM training steps. Both trainings fail due to empty training sets. Subsequently, EVidenceModeler fails due to a missing file. It seems that some output files are just not being generated, although everything is properly installed.

What command did you issue?

funannotate predict -i ../hvir_mask.fa -o fun_predict --species hvir \
--transcript_evidence trinity.fasta --rna_bam trinity.alignments.bam --pasa_gff hydra_db.pasa_assemblies.gff3 --cpus 3 --augustus_gff hvir_r06.all.recounted.gff

Logfiles

[01/27/21 14:17:11]: OS: MacOSX 10.15.7, 4 cores, ~ 8 GB RAM. Python: 3.7.3
[01/27/21 14:17:11]: Running funannotate v1.8.4
[01/27/21 14:17:11]: GeneMark not found and $GENEMARK_PATH environmental variable missing. Will skip GeneMark ab-initio prediction.
[01/27/21 14:17:13]: {'augustus': 1, 'hiq': 2, 'genemark': 0, 'pasa': 6, 'codingquarry': 2, 'snap': 1, 'glimmerhmm': 1, 'proteins': 1, 'transcripts': 1}
[01/27/21 14:17:13]: {'augustus': 'pasa', 'snap': 'pasa', 'glimmerhmm': 'pasa', 'codingquarry': 'rna-bam'}
[01/27/21 14:17:13]: Parsed training data, run ab-initio gene predictors as follows:
[01/27/21 14:17:17]: perl /usr/local/EVidenceModeler/EvmUtils/gff3_gene_prediction_file_validator.pl /Users/danielardaniela/Desktop/pasa2/fun_predict/predict_misc/pasa_predictions.gff3
[01/27/21 14:17:19]: {'augustus': 1, 'hiq': 2, 'genemark': 0, 'pasa': 6, 'codingquarry': 2, 'snap': 1, 'glimmerhmm': 1, 'proteins': 1, 'transcripts': 1}
[01/27/21 14:19:22]: Loading genome assembly and parsing soft-masked repetitive sequences
[01/27/21 14:21:13]: Genome loaded: 2,677 scaffolds; 284,265,305 bp; 56.05% repeats masked
[01/27/21 14:21:14]: Existing transcript alignments found: fun_predict/predict_misc/transcript_alignments.gff3
[01/27/21 14:21:14]: Existing RNA-seq BAM hints found: fun_predict/predict_misc/hints.BAM.gff
[01/27/21 14:21:28]: Existing protein alignments found: fun_predict/predict_misc/protein_alignments.gff3
[01/27/21 14:22:00]: /usr/local/augustus/scripts/join_mult_hints.pl
[01/27/21 14:22:11]: perl /usr/local/EVidenceModeler/EvmUtils/misc/augustus_GFF3_to_EVM_GFF3.pl hvir_r06.all.recounted.gff
[01/27/21 14:22:27]: Pulling out high quality Augustus predictions
[01/27/21 14:22:32]: Found 10,534 high quality predictions from Augustus (>90% exon evidence)
[01/27/21 14:22:34]: Using existing CodingQuarry results: fun_predict/predict_misc/coding_quarry.gff3
[01/27/21 14:22:34]: 52,115 predictions from CodingQuarry
[01/27/21 14:22:34]: Directory not copied. Error: [Errno 17] File exists: 'fun_predict/predict_misc/ab_initio_parameters/codingquarry'
[01/27/21 14:22:34]: Snap training failed, empty training set: fun_predict/predict_misc/final_training_models.gff3
[01/27/21 14:22:34]: snap failed removing from training parameters
[01/27/21 14:22:34]: GlimmerHMM training failed, empty training set: fun_predict/predict_misc/final_training_models.gff3
[01/27/21 14:22:34]: GlimmerHMM failed, removing from training parameters
[01/27/21 14:22:36]: Prediction sources: ['Augustus', 'CodingQuarry', 'pasa']
[01/27/21 14:22:39]: Summary of gene models: {'total': 78094, 'Augustus': 25979, 'CodingQuarry': 52115}
[01/27/21 14:22:39]: EVM Weights: {'Augustus': 1, 'CodingQuarry': 2, 'pasa': 6, 'proteins': 1, 'transcripts': 1}
[01/27/21 14:22:39]: Summary of gene models passed to EVM (weights):
[01/27/21 14:22:39]: Launching EVM via funannotate-runEVM.py
[01/27/21 14:22:39]: /Users/danielardaniela/.pyenv/versions/3.7.3/bin/python /Users/danielardaniela/.pyenv/versions/3.7.3/lib/python3.7/site-packages/funannotate/aux_scripts/funannotate-runEVM.py -w /Users/danielardaniela/Desktop/pasa2/fun_predict/predict_misc/weights.evm.txt -c 3 -g /Users/danielardaniela/Desktop/pasa2/fun_predict/predict_misc/gene_predictions.gff3 -d /Users/danielardaniela/Desktop/pasa2/fun_predict/predict_misc/EVM -f /Users/danielardaniela/Desktop/pasa2/fun_predict/predict_misc/genome.softmasked.fa -l fun_predict/logfiles/funannotate-EVM.log -m 10 -o /Users/danielardaniela/Desktop/pasa2/fun_predict/predict_misc/evm.round1.gff3 --EVM_HOME /usr/local/EVidenceModeler -p /Users/danielardaniela/Desktop/pasa2/fun_predict/predict_misc/protein_alignments.gff3 -t /Users/danielardaniela/Desktop/pasa2/fun_predict/predict_misc/transcript_alignments.gff3
[01/27/21 14:23:03]: Evidence modeler has failed, exiting

OS/Install Information

 funannotate check --show-versions
-------------------------------------------------------
Checking dependencies for 1.8.4
-------------------------------------------------------
You are running Python v 3.7.3. Now checking python packages...
biopython: 1.78
goatools: 1.0.15
matplotlib: 3.3.3
natsort: 7.1.0
numpy: 1.19.5
pandas: 1.2.0
psutil: 5.8.0
requests: 2.25.1
scikit-learn: 0.24.1
scipy: 1.6.0
seaborn: 0.11.1
All 11 python packages installed

You are running Perl v b'5.026002'. Now checking perl modules...
Bio::Perl: 1.7.4
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.49
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/Users/danielardaniela/funannotate_db
$PASAHOME=/usr/local/PASApipeline
$TRINITY_HOME=/usr/local/trinityrnaseq
$EVM_HOME=/usr/local/EVidenceModeler
$AUGUSTUS_CONFIG_PATH=/usr/local/augustus/config
    ERROR: GENEMARK_PATH not set. export GENEMARK_PATH=/path/to/dir
-------------------------------------------------------
Checking external dependencies...
/Users/danielardaniela/miniconda3/lib/python3.8/site-packages/ete3-3.1.2-py3.7.egg/ete3/evol/parser/codemlparser.py:221: SyntaxWarning: "is" with a literal. Did you mean "=="?
/Users/danielardaniela/miniconda3/lib/python3.8/site-packages/ete3-3.1.2-py3.7.egg/ete3/evol/parser/codemlparser.py:221: SyntaxWarning: "is" with a literal. Did you mean "=="?
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.11.0
augustus: 3.2.1
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v36
diamond: 2.0.6
emapper.py: /Users/danielardaniela/miniconda3/bin/diamond /Users/danielardaniela/miniconda3/envs/funannotate/lib/python2.7/site-packages
emapper-2.0.1

ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2020-10-14
hisat2: 2.2.0
hmmscan: HMMER 3.3.1 (Jul 2020)
hmmsearch: HMMER 3.3.1 (Jul 2020)
java: 15.0.1
kallisto: 0.46.2
mafft: v7.475 (2020/Nov/23)
makeblastdb: makeblastdb 2.6.0+
minimap2: 2.17-r941
proteinortho: 6.0.27
pslCDnaFilter: no way to determine
salmon: salmon 1.3.0
samtools: samtools 1.11
signalp: 4.1
snap: 2006-07-28
stringtie: 2.1.4
tRNAscan-SE: 2.0.7 (Oct 2020)
tantan: tantan 13
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.6.0+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
    ERROR: gmes_petap.pl not installed

nextgenusfs commented 3 years ago

I can't quite tell, but my guess is that the file you are passing to --pasa_gff has not been run through TransDecoder, and is perhaps the GFF file that references the transcript assemblies rather than one that references the genome?

Have you tried to run through the recommended protocol of 'funannotate train' first to let it handle getting the appropriate files?
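One way to test the guess above before rerunning the pipeline is to inspect the GFF3 directly. The following is only a minimal sketch, not part of funannotate: the helper names (fasta_ids, gff3_rows, feature_types, seqids_missing_from_genome) are hypothetical, and it assumes a plain, uncompressed, tab-delimited GFF3 and a FASTA whose sequence ID is the first word of each header.

```python
from collections import Counter


def fasta_ids(fasta_path):
    """Return the set of sequence IDs (first word of each '>' header)."""
    ids = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def gff3_rows(gff_path):
    """Yield the tab-split columns of each feature line in a GFF3 file."""
    with open(gff_path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # an embedded sequence section follows; stop here
            if line.startswith("#") or not line.strip():
                continue  # skip directives, comments and blank lines
            cols = line.rstrip("\n").split("\t")
            if len(cols) >= 3:
                yield cols


def feature_types(gff_path):
    """Count how often each feature type (column 3) occurs."""
    return Counter(cols[2] for cols in gff3_rows(gff_path))


def seqids_missing_from_genome(gff_path, fasta_path):
    """Return GFF3 seqids (column 1) that are not genome FASTA headers."""
    genome_ids = fasta_ids(fasta_path)
    return {cols[0] for cols in gff3_rows(gff_path)} - genome_ids
```

For the command in this issue, the inputs would be hydra_db.pasa_assemblies.gff3 and ../hvir_mask.fa. A non-empty result from seqids_missing_from_genome, or a feature_types count with no gene, mRNA or CDS entries, would be consistent with the file not being the genome-referenced gene-model GFF3 that predict expects, though neither check proves it.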

danielardaniela commented 3 years ago

I have now run the 'funannotate train' step and then ran 'funannotate predict' as instructed at the end of the train step, as follows:

funannotate predict -i hvir_mask.fa -o fun_predict6 -s "hvir" --cpus 1

This has produced new errors, which are as follows:

Traceback (most recent call last):
  File "/home/russod/.local/bin/funannotate", line 713, in <module>
    main()
  File "/home/russod/.local/bin/funannotate", line 703, in main
    mod.main(arguments)
  File "/home/russod/.local/lib/python3.6/site-packages/funannotate/predict.py", line 572, in main
    augustus_version, augustus_functional = lib.checkAugustusFunc()
  File "/home/russod/.local/lib/python3.6/site-packages/funannotate/library.py", line 1054, in checkAugustusFunc
    stdout, stderr = proc.communicate()
  File "/usr/lib/python3.6/subprocess.py", line 863, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib/python3.6/subprocess.py", line 1574, in _communicate
    self.stdout.errors)
  File "/usr/lib/python3.6/subprocess.py", line 760, in _translate_newlines
    data = data.decode(encoding, errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 168: ordinal not in range(128)

Are there any ways to fix these errors?

nextgenusfs commented 3 years ago

What version of Augustus are you running, and on which OS? If you are running on OSX, the only version of Augustus I've been able to get working is this one: https://github.com/nextgenusfs/augustus. I thought these issues were resolved; Python is unable to decode a character in the Augustus output. Super annoying Python unicode/decode errors...
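To illustrate the kind of failure in the traceback: byte 0xc3 is the first byte of many two-byte UTF-8 characters, and the ASCII codec rejects it. The sketch below reproduces the error on a made-up string and shows one tolerant way to read a subprocess's output. It is an illustration only. The sample string is an assumption (the thread does not show which character Augustus printed), run_and_decode is a hypothetical helper, and this is not how funannotate's checkAugustusFunc is implemented.

```python
import subprocess

# Hypothetical sample text; "ä" encodes to the two bytes 0xc3 0xa4 in UTF-8.
raw = "Universität".encode("utf-8")

try:
    raw.decode("ascii")
    message = ""
except UnicodeDecodeError as err:
    # Same error class and byte value as in the traceback above.
    message = str(err)


def run_and_decode(cmd):
    """Run cmd, capture raw bytes, and decode them without raising.

    Bytes are decoded as UTF-8; anything that is not valid UTF-8 is
    replaced with U+FFFD instead of triggering UnicodeDecodeError.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = proc.communicate()
    return (
        stdout.decode("utf-8", errors="replace"),
        stderr.decode("utf-8", errors="replace"),
    )
```

Capturing bytes and decoding explicitly keeps the result independent of the locale of the shell that launched Python, which is the design choice this sketch is meant to show.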

danielardaniela commented 3 years ago

I am running Augustus 3.4.0 on WSL2.
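Since the traceback comes from Python 3.6 decoding a text-mode pipe with the 'ascii' codec, it may be worth checking which encoding Python derives from the shell's locale. The following is a minimal diagnostic sketch; describe_text_encoding is a hypothetical helper, not a funannotate command.

```python
import locale
import os


def describe_text_encoding():
    """Report the locale-derived encoding and the variables that set it."""
    return {
        # Default encoding Python uses for text-mode pipes and files.
        "preferred_encoding": locale.getpreferredencoding(False),
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
    }


if __name__ == "__main__":
    for key, value in describe_text_encoding().items():
        print(f"{key}: {value}")
```

If the preferred encoding is reported as ASCII (often shown as ANSI_X3.4-1968), exporting a UTF-8 locale before running funannotate might avoid the error, for example LC_ALL=C.UTF-8. That assumes such a locale is installed on the WSL2 distribution, and it has not been confirmed as the fix for this issue.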