nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

parameters.json empty after resuming pipeline from error #628

Open melop opened 3 years ago

melop commented 3 years ago

Are you using the latest release?

Using 1.8.7

Describe the bug

funannotate check reports that everything is OK, but some dependencies were still unmet while the pipeline was running. I had to fix some of these issues and then restart the funannotate predict pipeline, which eventually finished successfully. However, the parameters.json file is empty: it contains only [{}].
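For anyone who wants to check whether their own output shows the same symptom, here is a minimal sketch that tests whether a JSON file holds any data at all. It assumes nothing about the real layout of parameters.json (which is not described in this report); the function names are hypothetical helpers, not part of funannotate.

```python
import json
import sys


def is_effectively_empty(value):
    """Return True if a parsed JSON value holds no data at any depth.

    Empty containers, and containers holding only empty containers
    (such as [{}]), count as empty. Any scalar, including null, counts
    as data in this sketch.
    """
    if isinstance(value, dict):
        return all(is_effectively_empty(v) for v in value.values())
    if isinstance(value, list):
        return all(is_effectively_empty(v) for v in value)
    return False


def parameters_file_is_empty(path):
    """Load a JSON file and report whether it carries any data."""
    with open(path) as handle:
        return is_effectively_empty(json.load(handle))


if __name__ == "__main__":
    # Usage: python check_params.py path/to/species.parameters.json
    for path in sys.argv[1:]:
        status = "EMPTY" if parameters_file_is_empty(path) else "has data"
        print(f"{path}: {status}")
```

Run against the file named at the end of the log below, this would report EMPTY if the content is [{}] as described.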

What command did you issue?

funannotate predict -i $GENOME \
    -o trained -s "$SPECIES" --strain $STRAIN --cpus $CPU \
    --max_intronlen 1000000 \
    --weights augustus:2 pasa:10 snap:0 genemark:1 glimmerhmm:0 \
    --optimize_augustus \
    --busco_db actinopterygii \
    --busco_seed_species zebrafish \
    --repeats2evm \
    --organism other

Logfiles

[Aug 13 04:08 PM]: OS: Ubuntu 20.04, 128 cores, ~ 528 GB RAM. Python: 3.8.10
[Aug 13 04:08 PM]: Running funannotate v1.8.7
[Aug 13 04:08 PM]: Found training files, will re-use these files:
  --rna_bam trained/training/funannotate_train.coordSorted.bam
  --pasa_gff trained/training/funannotate_train.pasa.gff3
  --transcript_alignments trained/training/funannotate_train.transcripts.gff3
[Aug 13 04:08 PM]: Skipping CodingQuarry as --organism=other. Pass a weight larger than 0 to run CQ, ie --weights codingquarry:1
[Aug 13 04:08 PM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     pasa
  genemark     selftraining
  glimmerhmm   pasa
  snap         pasa
[Aug 13 04:10 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Aug 13 04:10 PM]: Genome loaded: 88 scaffolds; 465,144,529 bp; 22.64% repeats masked
[Aug 13 04:10 PM]: Parsed 270,605 transcript alignments from: trained/training/funannotate_train.transcripts.gff3
[Aug 13 04:10 PM]: Creating transcript EVM alignments and Augustus transcripts hintsfile
[Aug 13 04:10 PM]: Existing RNA-seq BAM hints found: trained/predict_misc/hints.BAM.gff
[Aug 13 04:10 PM]: Existing protein alignments found: trained/predict_misc/protein_alignments.gff3
[Aug 13 04:12 PM]: Existing GeneMark annotation found: trained/predict_misc/genemark.gff
[Aug 13 04:12 PM]: 36,851 predictions from GeneMark
[Aug 13 04:12 PM]: Existing Augustus annotations found: trained/predict_misc/augustus.gff3
[Aug 13 04:12 PM]: Pulling out high quality Augustus predictions
[Aug 13 04:12 PM]: Found 16,658 high quality predictions from Augustus (>90% exon evidence)
[Aug 13 04:12 PM]: Skipping snap prediction as weight set to 0
[Aug 13 04:12 PM]: Skipping GlimmerHMM prediction as weight set to 0
[Aug 13 04:12 PM]: Summary of gene models passed to EVM (weights):
[Aug 13 04:12 PM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
Progress: 0.00%
Progress: 99.90%
[Aug 13 04:31 PM]: Converting to GFF3 and collecting all EVM results
  Source         Weight   Count
  Augustus       2        9430
  Augustus HiQ   2        16658
  GeneMark       1        36851
  pasa           10       33085
  Total          -        96024
[Aug 13 04:31 PM]: 26,517 total gene models from EVM
[Aug 13 04:31 PM]: Generating protein fasta files from 26,517 EVM models
[Aug 13 04:31 PM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Aug 13 04:31 PM]: Found 154 gene models to remove: 5 too short; 0 span gaps; 149 transposable elements
[Aug 13 04:31 PM]: 26,363 gene models remaining
[Aug 13 04:31 PM]: Predicting tRNAs
[Aug 13 05:14 PM]: 9,709 tRNAscan models are valid (non-overlapping)
[Aug 13 05:14 PM]: Generating GenBank tbl annotation file
[Aug 13 05:14 PM]: Converting to final Genbank format
[Aug 13 05:16 PM]: Collecting final annotation files for 36,072 total gene models
[Aug 13 05:16 PM]: Funannotate predict is finished, output files are in the trained/predict_results folder
[Aug 13 05:16 PM]: Your next step to capture UTRs and update annotation using PASA:

funannotate update -i trained --cpus 64

[Aug 13 05:16 PM]: Training parameters file saved: trained/predict_results/macropodus_opercularis_dsy2021.parameters.json
[Aug 13 05:16 PM]: Add species parameters to database:

funannotate species -s macropodus_opercularis_dsy2021 -a trained/predict_results/macropodus_opercularis_dsy2021.parameters.json

OS/Install Information

You are running Perl v b'5.030000'. Now checking perl modules...
  Bio::Perl: 1.7.4
  Carp: 1.50
  Clone: 0.45
  DBD::SQLite: 1.68
  DBD::mysql: 4.050
  DBI: 1.643
  DB_File: 1.856
  Data::Dumper: 2.174
  File::Basename: 2.85
  File::Which: 1.27
  Getopt::Long: 2.52
  Hash::Merge: 0.302
  JSON: 4.03
  LWP::UserAgent: 6.43
  Logger::Simple: 2.0
  POSIX: 1.88
  Parallel::ForkManager: 2.02
  Pod::Usage: 1.69
  Scalar::Util::Numeric: 0.40
  Storable: 3.15
  Text::Soundex: 3.05
  Thread::Queue: 3.13
  Tie::File: 1.02
  URI::Escape: 3.31
  YAML: 1.30
  threads: 2.22
  threads::shared: 1.6
All 27 Perl modules installed

Checking Environmental Variables...
  $FUNANNOTATE_DB=/data/software/funannotate/funannotate_db/
  $PASAHOME=/data/software/PASApipeline.v2.4.1/
  $TRINITYHOME=/data/software/trinityrnaseq-v2.12.0/
  $EVM_HOME=/data/software/EVidenceModeler-1.1.1/
  $AUGUSTUS_CONFIG_PATH=/data/software/augustus-3.4.0/config
  $GENEMARK_PATH=/data/software/funannotate/gmes_linux_64/
All 6 environmental variables are set

Checking external dependencies...
  PASA: 2.4.1
  CodingQuarry: 2.0
  Trinity: 2.12.0
  augustus: 3.4.0
  bamtools: bamtools 2.5.1
  bedtools: bedtools v2.27.1
  blat: BLAT v37x1
  diamond: 0.9.30.131
  emapper.py: 2.1.4-2
  ete3: 3.1.2
  exonerate: exonerate 2.4.0
  fasta: no way to determine
  glimmerhmm: 3.0.4
  gmap: 2020-04-08
  gmes_petap.pl: 4.68_lic
  hisat2: 2.1.0
  hmmscan: HMMER 3.3.2 (Nov 2020)
  hmmsearch: HMMER 3.3.2 (Nov 2020)
  java: 11.0.11
  kallisto: 0.46.1
  mafft: v7.453 (2019/Nov/8)
  makeblastdb: makeblastdb 2.11.0+
  minimap2: 2.20-r1061
  proteinortho: 6.0.31
  pslCDnaFilter: no way to determine
  salmon: salmon 0.12.0
  samtools: samtools 1.10
  signalp: 5.0b
  snap: 2006-07-28
  stringtie: 2.1.6
  tRNAscan-SE: 2.0.9 (July 2021)
  tantan: tantan 26
  tbl2asn: no way to determine, likely 25.X
  tblastn: tblastn 2.11.0+
  trimal: trimAl v1.4.rev15 build[2013-12-17]
  trimmomatic: 0.39
All 36 external dependencies are installed

nextgenusfs commented 3 years ago

Thanks, I'm not sure exactly why this would happen, but it is perhaps related to setting some of the weights to 0: --weights augustus:2 pasa:10 snap:0 genemark:1 glimmerhmm:0.
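To make the zero-weight hypothesis concrete, the sketch below splits the reported --weights value into tool:weight pairs and lists the predictors whose weight is 0. This is only an illustration of the argument format: parse_weights is a hypothetical helper, not funannotate's own parser, and it says nothing about how funannotate writes parameters.json.

```python
def parse_weights(pairs):
    """Turn ['augustus:2', 'snap:0'] into {'augustus': 2, 'snap': 0}."""
    weights = {}
    for pair in pairs:
        tool, _, value = pair.partition(":")
        if not tool or not value:
            raise ValueError(f"expected tool:weight, got {pair!r}")
        weights[tool.lower()] = int(value)
    return weights


# The exact --weights value from the reported command.
reported = "augustus:2 pasa:10 snap:0 genemark:1 glimmerhmm:0".split()
weights = parse_weights(reported)

# Predictors that the run effectively switched off.
disabled = sorted(tool for tool, weight in weights.items() if weight == 0)
print(disabled)  # ['glimmerhmm', 'snap']
```

The two disabled predictors match the "Skipping snap prediction" and "Skipping GlimmerHMM prediction" lines in the log above.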

If you run the test data and add --debug, do the results have the proper parameters.json file?

funannotate test -t predict --cpus X --debug