Open melop opened 3 years ago
Thanks. I'm not sure exactly why this would happen, but it may be related to the weights that are set to 0 (--weights augustus:2 pasa:10 snap:0 genemark:1 glimmerhmm:0).
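To see at a glance which predictors that --weights string disables, the value can be split into name:weight pairs. This is only a minimal sketch: parse_weights and zero_weight_predictors are hypothetical helper names, not funannotate functions, and this is not a claim about how funannotate parses the option internally.

```python
def parse_weights(weights_arg):
    """Parse a space-separated 'name:weight' string into a dict of ints."""
    weights = {}
    for item in weights_arg.split():
        # partition splits on the first ':' only: 'pasa:10' -> ('pasa', ':', '10')
        name, _, value = item.partition(":")
        weights[name] = int(value)
    return weights


def zero_weight_predictors(weights_arg):
    """Return the predictor names whose weight is 0, in input order."""
    return [name for name, w in parse_weights(weights_arg).items() if w == 0]


# The --weights value from the reported command.
weights_arg = "augustus:2 pasa:10 snap:0 genemark:1 glimmerhmm:0"
print(zero_weight_predictors(weights_arg))  # ['snap', 'glimmerhmm']
```

For the reported command this lists snap and glimmerhmm, which matches the two "Skipping ... prediction as weight set to 0" lines in the log.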
If you run the test data and add --debug, do the results have the proper parameters.json file?
funannotate test -t predict --cpus X --debug
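Once the test run finishes, one way to tell an effectively empty parameters file (such as one containing only [{}]) from a populated one is to load it and look for any actual value. The sketch below makes no assumption about the keys funannotate writes; is_effectively_empty and parameters_file_is_empty are hypothetical helper names.

```python
import json


def is_effectively_empty(value):
    """True if value is an empty container or holds only empty containers."""
    if isinstance(value, dict):
        return all(is_effectively_empty(v) for v in value.values())
    if isinstance(value, list):
        return all(is_effectively_empty(v) for v in value)
    return False  # any scalar (string, number, bool, null) counts as content


def parameters_file_is_empty(path):
    """Load a JSON file and report whether it carries no actual values."""
    with open(path) as handle:
        return is_effectively_empty(json.load(handle))


# The content reported in this issue is treated as empty.
print(is_effectively_empty(json.loads("[{}]")))  # True
```

Calling parameters_file_is_empty on the *.parameters.json file in the predict_results folder of the run would then return True for a file like the one reported here.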
Are you using the latest release? If you are not using the latest release of funannotate, please upgrade; if the bug persists, report it here.
Using 1.8.7
Describe the bug
A clear and concise description of what the bug is.
funannotate check reports no problems, but some dependencies were still unmet while the pipeline was running. I had to fix these issues and then restart the funannotate predict pipeline, which eventually finished successfully. However, the parameters.json file is empty: it contains only [{}].
What command did you issue? Copy/paste the command used.
funannotate predict -i $GENOME \
  -o trained -s "$SPECIES" --strain $STRAIN --cpus $CPU \
  --max_intronlen 1000000 \
  --weights augustus:2 pasa:10 snap:0 genemark:1 glimmerhmm:0 \
  --optimize_augustus \
  --busco_db actinopterygii \
  --busco_seed_species zebrafish \
  --repeats2evm \
  --organism other
Logfiles
Please provide relevant log files of the error.
[Aug 13 04:08 PM]: OS: Ubuntu 20.04, 128 cores, ~ 528 GB RAM. Python: 3.8.10
[Aug 13 04:08 PM]: Running funannotate v1.8.7
[Aug 13 04:08 PM]: Found training files, will re-use these files:
  --rna_bam trained/training/funannotate_train.coordSorted.bam
  --pasa_gff trained/training/funannotate_train.pasa.gff3
  --transcript_alignments trained/training/funannotate_train.transcripts.gff3
[Aug 13 04:08 PM]: Skipping CodingQuarry as --organism=other. Pass a weight larger than 0 to run CQ, ie --weights codingquarry:1
[Aug 13 04:08 PM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     pasa
  genemark     selftraining
  glimmerhmm   pasa
  snap         pasa
[Aug 13 04:10 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Aug 13 04:10 PM]: Genome loaded: 88 scaffolds; 465,144,529 bp; 22.64% repeats masked
[Aug 13 04:10 PM]: Parsed 270,605 transcript alignments from: trained/training/funannotate_train.transcripts.gff3
[Aug 13 04:10 PM]: Creating transcript EVM alignments and Augustus transcripts hintsfile
[Aug 13 04:10 PM]: Existing RNA-seq BAM hints found: trained/predict_misc/hints.BAM.gff
[Aug 13 04:10 PM]: Existing protein alignments found: trained/predict_misc/protein_alignments.gff3
[Aug 13 04:12 PM]: Existing GeneMark annotation found: trained/predict_misc/genemark.gff
[Aug 13 04:12 PM]: 36,851 predictions from GeneMark
[Aug 13 04:12 PM]: Existing Augustus annotations found: trained/predict_misc/augustus.gff3
[Aug 13 04:12 PM]: Pulling out high quality Augustus predictions
[Aug 13 04:12 PM]: Found 16,658 high quality predictions from Augustus (>90% exon evidence)
[Aug 13 04:12 PM]: Skipping snap prediction as weight set to 0
[Aug 13 04:12 PM]: Skipping GlimmerHMM prediction as weight set to 0
[Aug 13 04:12 PM]: Summary of gene models passed to EVM (weights):
[Aug 13 04:12 PM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
Progress: 0.00%
Progress: 99.90%
[Aug 13 04:31 PM]: Converting to GFF3 and collecting all EVM results
  Source         Weight   Count
  Augustus       2        9430
  Augustus HiQ   2        16658
  GeneMark       1        36851
  pasa           10       33085
  Total          -        96024
[Aug 13 04:31 PM]: 26,517 total gene models from EVM
[Aug 13 04:31 PM]: Generating protein fasta files from 26,517 EVM models
[Aug 13 04:31 PM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Aug 13 04:31 PM]: Found 154 gene models to remove: 5 too short; 0 span gaps; 149 transposable elements
[Aug 13 04:31 PM]: 26,363 gene models remaining
[Aug 13 04:31 PM]: Predicting tRNAs
[Aug 13 05:14 PM]: 9,709 tRNAscan models are valid (non-overlapping)
[Aug 13 05:14 PM]: Generating GenBank tbl annotation file
[Aug 13 05:14 PM]: Converting to final Genbank format
[Aug 13 05:16 PM]: Collecting final annotation files for 36,072 total gene models
[Aug 13 05:16 PM]: Funannotate predict is finished, output files are in the trained/predict_results folder
[Aug 13 05:16 PM]: Your next step to capture UTRs and update annotation using PASA:
funannotate update -i trained --cpus 64
[Aug 13 05:16 PM]: Training parameters file saved: trained/predict_results/macropodus_opercularis_dsy2021.parameters.json
[Aug 13 05:16 PM]: Add species parameters to database:
funannotate species -s macropodus_opercularis_dsy2021 -a trained/predict_results/macropodus_opercularis_dsy2021.parameters.json
OS/Install Information
output of
funannotate check --show-versions
Checking dependencies for 1.8.7
You are running Python v 3.8.10. Now checking python packages...
biopython: 1.79
goatools: 1.1.6
matplotlib: 3.4.2
natsort: 7.1.1
numpy: 1.21.1
pandas: 1.2.4
psutil: 5.7.0
requests: 2.22.0
scikit-learn: 0.24.2
scipy: 1.6.3
seaborn: 0.11.1
All 11 python packages installed
You are running Perl v b'5.030000'. Now checking perl modules...
Bio::Perl: 1.7.4
Carp: 1.50
Clone: 0.45
DBD::SQLite: 1.68
DBD::mysql: 4.050
DBI: 1.643
DB_File: 1.856
Data::Dumper: 2.174
File::Basename: 2.85
File::Which: 1.27
Getopt::Long: 2.52
Hash::Merge: 0.302
JSON: 4.03
LWP::UserAgent: 6.43
Logger::Simple: 2.0
POSIX: 1.88
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.13
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.30
threads: 2.22
threads::shared: 1.6
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/data/software/funannotate/funannotate_db/
$PASAHOME=/data/software/PASApipeline.v2.4.1/
$TRINITYHOME=/data/software/trinityrnaseq-v2.12.0/
$EVM_HOME=/data/software/EVidenceModeler-1.1.1/
$AUGUSTUS_CONFIG_PATH=/data/software/augustus-3.4.0/config
$GENEMARK_PATH=/data/software/funannotate/gmes_linux_64/
All 6 environmental variables are set
Checking external dependencies...
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.12.0
augustus: 3.4.0
bamtools: bamtools 2.5.1
bedtools: bedtools v2.27.1
blat: BLAT v37x1
diamond: 0.9.30.131
emapper.py: 2.1.4-2
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2020-04-08
gmes_petap.pl: 4.68_lic
hisat2: 2.1.0
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.11
kallisto: 0.46.1
mafft: v7.453 (2019/Nov/8)
makeblastdb: makeblastdb 2.11.0+
minimap2: 2.20-r1061
proteinortho: 6.0.31
pslCDnaFilter: no way to determine
salmon: salmon 0.12.0
samtools: samtools 1.10
signalp: 5.0b
snap: 2006-07-28
stringtie: 2.1.6
tRNAscan-SE: 2.0.9 (July 2021)
tantan: tantan 26
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.11.0+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
All 36 external dependencies are installed