Closed caonetto closed 1 year ago
This looks like augustus is failing in the training step. Can you check the setup and report which augustus version is installed?
funannotate check --show-versions
The file it's trying to parse is predict_misc/augustus.initial.training.txt, which appears to be corrupt or perhaps empty.
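The UnboundLocalError in the traceback reported in this issue fits that diagnosis: if a variable such as values1 is only assigned when a matching line is seen, an empty or corrupt file leaves it unassigned by the time the return statement runs. Below is a minimal sketch of a more defensive parser. It is not funannotate's actual code: the function name and the line prefixes ("nucleotide level", "exon level", "gene level") are assumptions about the Augustus evaluation report, and only the column indices are taken from the traceback.

```python
import os

# Hypothetical helper, NOT funannotate's implementation.
# Assumed line prefixes mapped to the two column indices read in the traceback
# (values1[1], values1[2], values2[6], values2[7], values3[6], values3[7]).
PREFIXES = {
    "nucleotide level": (1, 2),
    "exon level": (6, 7),
    "gene level": (6, 7),
}


def parse_train_results(path):
    """Return {prefix: (first_value, second_value)} or raise ValueError."""
    # Fail early with a readable message instead of an UnboundLocalError.
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        raise ValueError(f"{path} is missing or empty; Augustus training likely failed")

    found = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            for prefix, (i, j) in PREFIXES.items():
                if line.startswith(prefix):
                    # Drop spaces, then split the pipe-delimited columns.
                    cols = line.replace(" ", "").split("|")
                    found[prefix] = (float(cols[i]), float(cols[j]))

    # Report exactly which expected lines never appeared.
    missing = [p for p in PREFIXES if p not in found]
    if missing:
        raise ValueError(f"{path} has no line(s) for: {', '.join(missing)}")
    return found
```

Calling parse_train_results("predict_misc/augustus.initial.training.txt") on an empty file would then raise a ValueError that names the file, which points at the failed training step rather than at the parser.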
Thanks for your quick response. I managed to fix the issue by doing a fresh conda install of funannotate, removing the bundled augustus, then installing augustus 3.5 from conda and updating the funannotate scripts using git.
Great. Can you confirm that, with this conda setup, all of the tests from funannotate test pass? I'm still trying to get augustus v3.5 working on my Mac (failing so far), so I'm not sure whether everything works on Linux (I don't want to update the Docker image until I know it's safe to do so).
Hi, I just ran funannotate test and it all seems to have completed successfully.
Cheers.
(funannotate) cris@cris-biosciences:~$ funannotate test -t all --cpus 12
#########################################################
Running `funannotate clean` unit testing: minimap2 mediated assembly duplications
Downloading: https://osf.io/8pjbe/download?version=1 Bytes: 252076
CMD: funannotate clean -i test.clean.fa -o test.exhaustive.fa --exhaustive
#########################################################
minimap2 version=2.24-r1122 path=/scratch/anaconda3/envs/funannotate/bin/minimap2
-----------------------------------------------
6 input contigs, 6 larger than 500 bp, N50 is 427,039 bp
Checking duplication of 6 contigs
-----------------------------------------------
scaffold_73 appears duplicated: 100% identity over 100% of the contig. contig length: 15153
scaffold_91 appears duplicated: 100% identity over 100% of the contig. contig length: 8858
scaffold_27 appears duplicated: 100% identity over 100% of the contig. contig length: 427039
-----------------------------------------------
6 input contigs; 6 larger than 500 bp; 3 duplicated; 3 written to file
#########################################################
SUCCESS: `funannotate clean` test complete.
#########################################################
#########################################################
Running `funannotate mask` unit testing: RepeatModeler --> RepeatMasker
Downloading: https://osf.io/hbryz/download?version=1 Bytes: 375687
CMD: funannotate mask -i test.fa -o test.masked.fa --cpus 12
#########################################################
-------------------------------------------------------
[Oct 21 03:09 PM]: OS: Ubuntu 18.04, 16 cores, ~ 66 GB RAM. Python: 3.8.13
[Oct 21 03:09 PM]: Running funanotate v1.8.14
[Oct 21 03:09 PM]: Soft-masking simple repeats with tantan
[Oct 21 03:09 PM]: Repeat soft-masking finished:
Masked genome: /home/cris/test-mask_11cab9f7-523e-43cd-b60d-eb0faa16bc13/test.masked.fa
num scaffolds: 2
assembly size: 1,216,048 bp
masked repeats: 50,965 bp (4.19%)
-------------------------------------------------------
#########################################################
SUCCESS: `funannotate mask` test complete.
#########################################################
#########################################################
Running `funannotate predict` unit testing
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --cpus 12 --species Awesome testicus
#########################################################
-------------------------------------------------------
[Oct 21 03:09 PM]: OS: Ubuntu 18.04, 16 cores, ~ 66 GB RAM. Python: 3.8.13
[Oct 21 03:09 PM]: Running funannotate v1.8.14
[Oct 21 03:09 PM]: Skipping CodingQuarry as no --rna_bam passed
[Oct 21 03:09 PM]: Parsed training data, run ab-initio gene predictors as follows:
Program Training-Method
augustus pretrained
glimmerhmm busco
snap busco
[Oct 21 03:09 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Oct 21 03:09 PM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Oct 21 03:09 PM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Oct 21 03:09 PM]: Found 1,505 preliminary alignments with diamond in 0:00:01 --> generated FASTA files for exonerate in 0:00:00
[Oct 21 03:09 PM]: Exonerate finished in 0:00:12: found 1,270 alignments
[Oct 21 03:09 PM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Oct 21 03:13 PM]: 370 valid BUSCO predictions found, validating protein sequences
[Oct 21 03:14 PM]: 367 BUSCO predictions validated
[Oct 21 03:14 PM]: Running Augustus gene prediction using saccharomyces parameters
[Oct 21 03:15 PM]: 1,485 predictions from Augustus
[Oct 21 03:15 PM]: Pulling out high quality Augustus predictions
[Oct 21 03:15 PM]: Found 371 high quality predictions from Augustus (>90% exon evidence)
[Oct 21 03:15 PM]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Oct 21 03:15 PM]: 1,532 predictions from SNAP
[Oct 21 03:15 PM]: Running GlimmerHMM gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Oct 21 03:16 PM]: 1,777 predictions from GlimmerHMM
[Oct 21 03:16 PM]: Summary of gene models passed to EVM (weights):
Source Weight Count
Augustus 1 1325
Augustus HiQ 2 372
GlimmerHMM 1 1777
snap 1 1532
Total - 5006
[Oct 21 03:16 PM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Oct 21 03:19 PM]: Converting to GFF3 and collecting all EVM results
[Oct 21 03:19 PM]: 1,699 total gene models from EVM
[Oct 21 03:19 PM]: Generating protein fasta files from 1,699 EVM models
[Oct 21 03:19 PM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Oct 21 03:19 PM]: Found 135 gene models to remove: 0 too short; 0 span gaps; 135 transposable elements
[Oct 21 03:19 PM]: 1,564 gene models remaining
[Oct 21 03:19 PM]: Predicting tRNAs
[Oct 21 03:19 PM]: 112 tRNAscan models are valid (non-overlapping)
[Oct 21 03:19 PM]: Generating GenBank tbl annotation file
[Oct 21 03:19 PM]: Collecting final annotation files for 1,676 total gene models
[Oct 21 03:19 PM]: Converting to final Genbank format
[Oct 21 03:19 PM]: Funannotate predict is finished, output files are in the annotate/predict_results folder
[Oct 21 03:19 PM]: Your next step might be functional annotation, suggested commands:
-------------------------------------------------------
Run InterProScan (manual install):
funannotate iprscan -i annotate -c 12
Run antiSMASH (optional):
funannotate remote -i annotate -m antismash -e youremail@server.edu
Annotate Genome:
funannotate annotate -i annotate --cpus 12 --sbt yourSBTfile.txt
-------------------------------------------------------
[Oct 21 03:19 PM]: Training parameters file saved: annotate/predict_results/saccharomyces.parameters.json
[Oct 21 03:19 PM]: Add species parameters to database:
funannotate species -s saccharomyces -a annotate/predict_results/saccharomyces.parameters.json
#########################################################
SUCCESS: `funannotate predict` test complete.
#########################################################
#########################################################
Running `funannotate predict` BUSCO-mediated training unit testing
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --cpus 12 --species Awesome busco
#########################################################
-------------------------------------------------------
[Oct 21 03:19 PM]: OS: Ubuntu 18.04, 16 cores, ~ 66 GB RAM. Python: 3.8.13
[Oct 21 03:19 PM]: Running funannotate v1.8.14
[Oct 21 03:19 PM]: Skipping CodingQuarry as no --rna_bam passed
[Oct 21 03:19 PM]: Parsed training data, run ab-initio gene predictors as follows:
Program Training-Method
augustus busco
glimmerhmm busco
snap busco
[Oct 21 03:19 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Oct 21 03:19 PM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Oct 21 03:19 PM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Oct 21 03:19 PM]: Found 1,505 preliminary alignments with diamond in 0:00:01 --> generated FASTA files for exonerate in 0:00:00
[Oct 21 03:20 PM]: Exonerate finished in 0:00:12: found 1,270 alignments
[Oct 21 03:20 PM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Oct 21 03:24 PM]: 370 valid BUSCO predictions found, validating protein sequences
[Oct 21 03:24 PM]: 367 BUSCO predictions validated
[Oct 21 03:24 PM]: Training Augustus using BUSCO gene models
[Oct 21 03:24 PM]: Augustus initial training results:
Feature Specificity Sensitivity
nucleotides 99.4% 83.8%
exons 63.2% 52.6%
genes 76.7% 51.4%
[Oct 21 03:24 PM]: Running Augustus gene prediction using awesome_busco parameters
[Oct 21 03:25 PM]: 1,284 predictions from Augustus
[Oct 21 03:25 PM]: Pulling out high quality Augustus predictions
[Oct 21 03:25 PM]: Found 306 high quality predictions from Augustus (>90% exon evidence)
[Oct 21 03:25 PM]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Oct 21 03:25 PM]: 1,511 predictions from SNAP
[Oct 21 03:25 PM]: Running GlimmerHMM gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Oct 21 03:26 PM]: 1,777 predictions from GlimmerHMM
[Oct 21 03:26 PM]: Summary of gene models passed to EVM (weights):
Source Weight Count
Augustus 1 978
Augustus HiQ 2 306
GlimmerHMM 1 1777
snap 1 1511
Total - 4572
[Oct 21 03:26 PM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Oct 21 03:28 PM]: Converting to GFF3 and collecting all EVM results
[Oct 21 03:28 PM]: 1,687 total gene models from EVM
[Oct 21 03:28 PM]: Generating protein fasta files from 1,687 EVM models
[Oct 21 03:28 PM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Oct 21 03:28 PM]: Found 139 gene models to remove: 0 too short; 0 span gaps; 139 transposable elements
[Oct 21 03:28 PM]: 1,548 gene models remaining
[Oct 21 03:28 PM]: Predicting tRNAs
[Oct 21 03:28 PM]: 112 tRNAscan models are valid (non-overlapping)
[Oct 21 03:28 PM]: Generating GenBank tbl annotation file
[Oct 21 03:29 PM]: Collecting final annotation files for 1,660 total gene models
[Oct 21 03:29 PM]: Converting to final Genbank format
[Oct 21 03:29 PM]: Funannotate predict is finished, output files are in the annotate/predict_results folder
[Oct 21 03:29 PM]: Your next step might be functional annotation, suggested commands:
-------------------------------------------------------
Run InterProScan (manual install):
funannotate iprscan -i annotate -c 12
Run antiSMASH (optional):
funannotate remote -i annotate -m antismash -e youremail@server.edu
Annotate Genome:
funannotate annotate -i annotate --cpus 12 --sbt yourSBTfile.txt
-------------------------------------------------------
[Oct 21 03:29 PM]: Training parameters file saved: annotate/predict_results/awesome_busco.parameters.json
[Oct 21 03:29 PM]: Add species parameters to database:
funannotate species -s awesome_busco -a annotate/predict_results/awesome_busco.parameters.json
#########################################################
SUCCESS: `funannotate predict` BUSCO-mediated training test complete.
#########################################################
Now running predict using all pre-trained ab-initio predictors
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate2 --cpus 12 --species Awesome busco -p annotate/predict_results/awesome_busco.parameters.json
#########################################################
-------------------------------------------------------
[Oct 21 03:29 PM]: OS: Ubuntu 18.04, 16 cores, ~ 66 GB RAM. Python: 3.8.13
[Oct 21 03:29 PM]: Running funannotate v1.8.14
[Oct 21 03:29 PM]: Ab initio training parameters file passed: annotate/predict_results/awesome_busco.parameters.json
[Oct 21 03:29 PM]: Skipping CodingQuarry as no --rna_bam passed
[Oct 21 03:29 PM]: Parsed training data, run ab-initio gene predictors as follows:
Program Training-Method
augustus pretrained
glimmerhmm pretrained
snap pretrained
[Oct 21 03:29 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Oct 21 03:29 PM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Oct 21 03:29 PM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Oct 21 03:29 PM]: Found 1,505 preliminary alignments with diamond in 0:00:01 --> generated FASTA files for exonerate in 0:00:00
[Oct 21 03:29 PM]: Exonerate finished in 0:00:12: found 1,270 alignments
[Oct 21 03:29 PM]: Running Augustus gene prediction using awesome_busco parameters
[Oct 21 03:29 PM]: 1,284 predictions from Augustus
[Oct 21 03:29 PM]: Pulling out high quality Augustus predictions
[Oct 21 03:29 PM]: Found 306 high quality predictions from Augustus (>90% exon evidence)
[Oct 21 03:29 PM]: Running SNAP gene prediction, using pre-trained HMM profile
[Oct 21 03:30 PM]: 1,511 predictions from SNAP
[Oct 21 03:30 PM]: Running GlimmerHMM gene prediction, using pretrained HMM profile
[Oct 21 03:30 PM]: 1,777 predictions from GlimmerHMM
[Oct 21 03:30 PM]: Summary of gene models passed to EVM (weights):
Source Weight Count
Augustus 1 978
Augustus HiQ 2 306
GlimmerHMM 1 1777
snap 1 1511
Total - 4572
[Oct 21 03:30 PM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Oct 21 03:32 PM]: Converting to GFF3 and collecting all EVM results
[Oct 21 03:32 PM]: 1,687 total gene models from EVM
[Oct 21 03:32 PM]: Generating protein fasta files from 1,687 EVM models
[Oct 21 03:32 PM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Oct 21 03:32 PM]: Found 139 gene models to remove: 0 too short; 0 span gaps; 139 transposable elements
[Oct 21 03:32 PM]: 1,548 gene models remaining
[Oct 21 03:32 PM]: Predicting tRNAs
[Oct 21 03:32 PM]: 112 tRNAscan models are valid (non-overlapping)
[Oct 21 03:32 PM]: Generating GenBank tbl annotation file
[Oct 21 03:32 PM]: Collecting final annotation files for 1,660 total gene models
[Oct 21 03:32 PM]: Converting to final Genbank format
[Oct 21 03:33 PM]: Funannotate predict is finished, output files are in the annotate2/predict_results folder
[Oct 21 03:33 PM]: Your next step might be functional annotation, suggested commands:
-------------------------------------------------------
Run InterProScan (manual install):
funannotate iprscan -i annotate2 -c 12
Run antiSMASH (optional):
funannotate remote -i annotate2 -m antismash -e youremail@server.edu
Annotate Genome:
funannotate annotate -i annotate2 --cpus 12 --sbt yourSBTfile.txt
-------------------------------------------------------
[Oct 21 03:33 PM]: Training parameters file saved: annotate2/predict_results/awesome_busco.parameters.json
[Oct 21 03:33 PM]: Add species parameters to database:
funannotate species -s awesome_busco -a annotate2/predict_results/awesome_busco.parameters.json
#########################################################
SUCCESS: `funannotate predict` using existing parameters test complete.
#########################################################
#########################################################
Running funannotate RNA-seq training/prediction unit testing
Downloading: https://osf.io/t7j83/download?version=1 Bytes: 542753017
CMD: funannotate train -i test.softmasked.fa --single rna-seq.illumina.fastq.gz --nanopore_mrna rna-seq.nanopore.fastq.gz -o rna-seq --cpus 12 --jaccard_clip --species Awesome rna
#########################################################
-------------------------------------------------------
[Oct 21 03:33 PM]: OS: Ubuntu 18.04, 16 cores, ~ 66 GB RAM. Python: 3.8.13
[Oct 21 03:33 PM]: Running 1.8.14
[Oct 21 03:33 PM]: Adapter and Quality trimming SE reads with Trimmomatic
[Oct 21 03:33 PM]: Running read normalization with Trinity
[Oct 21 03:35 PM]: Processing long reads: converting to fasta and running SeqClean
[Oct 21 03:35 PM]: Building Hisat2 genome index
[Oct 21 03:35 PM]: Aligning reads to genome using Hisat2
[Oct 21 03:36 PM]: Running genome-guided Trinity, logfile: rna-seq/training/Trinity-gg.log
[Oct 21 03:36 PM]: Clustering of reads from BAM and preparing assembly commands
[Oct 21 03:37 PM]: Assembling 1,620 Trinity clusters using 11 CPUs
[Oct 21 03:45 PM]: 1,454 transcripts derived from Trinity
[Oct 21 03:45 PM]: Running StringTie on Hisat2 coordsorted BAM
[Oct 21 03:45 PM]: Removing poly-A sequences from trinity transcripts using seqclean
[Oct 21 03:45 PM]: Aligning long reads to genome with minimap2
[Oct 21 03:45 PM]: Adding 4,736 unique long-reads
[Oct 21 03:45 PM]: Merging BAM files: rna-seq/training/nano_mRNA.coordSorted.bam, rna-seq/training/trinity.alignments.bam
[Oct 21 03:45 PM]: Converting transcript alignments to GFF3 format
[Oct 21 03:45 PM]: Converting Trinity transcript alignments to GFF3 format
[Oct 21 03:45 PM]: Running PASA alignment step using 6,190 transcripts
[Oct 21 03:48 PM]: PASA assigned 863 transcripts to 861 loci (genes)
[Oct 21 03:48 PM]: Getting PASA models for training with TransDecoder
[Oct 21 03:49 PM]: PASA finished. PASAweb accessible via: localhost:port/cgi-bin/index.cgi?db=/home/cris/test-rna_seq_11cab9f7-523e-43cd-b60d-eb0faa16bc13/rna-seq/training/pasa/Awesome_rna_pasa
[Oct 21 03:49 PM]: Using Kallisto TPM data to determine which PASA gene models to select at each locus
[Oct 21 03:49 PM]: Building Kallisto index
[Oct 21 03:49 PM]: Mapping reads using pseudoalignment in Kallisto
[Oct 21 03:49 PM]: Parsing expression value results. Keeping best transcript at each locus.
[Oct 21 03:49 PM]: Wrote 628 PASA gene models
[Oct 21 03:49 PM]: PASA database name: Awesome_rna
[Oct 21 03:49 PM]: Trinity/PASA has completed, you are now ready to run funanotate predict, for example:
funannotate predict -i test.softmasked.fa \
-o rna-seq -s "Awesome rna" --cpus 12
-------------------------------------------------------
#########################################################
Now running `funannotate predict` using RNA-seq training data
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o rna-seq --cpus 12 --min_training_models 150 --species Awesome rna
#########################################################
-------------------------------------------------------
[Oct 21 03:49 PM]: OS: Ubuntu 18.04, 16 cores, ~ 66 GB RAM. Python: 3.8.13
[Oct 21 03:49 PM]: Running funannotate v1.8.14
[Oct 21 03:49 PM]: Found training files, will re-use these files:
--rna_bam rna-seq/training/funannotate_train.coordSorted.bam
--pasa_gff rna-seq/training/funannotate_train.pasa.gff3
--stringtie rna-seq/training/funannotate_train.stringtie.gtf
--transcript_alignments rna-seq/training/funannotate_train.transcripts.gff3
[Oct 21 03:49 PM]: Parsed training data, run ab-initio gene predictors as follows:
Program Training-Method
augustus pasa
codingquarry rna-bam
glimmerhmm pasa
snap pasa
[Oct 21 03:49 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Oct 21 03:49 PM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Oct 21 03:49 PM]: Parsed 3,805 transcript alignments from: rna-seq/training/funannotate_train.transcripts.gff3
[Oct 21 03:49 PM]: Creating transcript EVM alignments and Augustus transcripts hintsfile
[Oct 21 03:49 PM]: Extracting hints from RNA-seq BAM file using bam2hints
[Oct 21 03:49 PM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Oct 21 03:49 PM]: Found 1,505 preliminary alignments with diamond in 0:00:01 --> generated FASTA files for exonerate in 0:00:00
[Oct 21 03:50 PM]: Exonerate finished in 0:00:12: found 1,270 alignments
[Oct 21 03:50 PM]: Filtering PASA data for suitable training set
[Oct 21 03:50 PM]: 592 of 628 models pass training parameters
[Oct 21 03:50 PM]: Training Augustus using PASA gene models
[Oct 21 03:50 PM]: Augustus initial training results:
Feature Specificity Sensitivity
nucleotides 97.4% 86.7%
exons 49.5% 40.2%
genes 48.0% 40.0%
[Oct 21 03:50 PM]: Accuracy seems low, you can try to improve by passing the --optimize_augustus option.
[Oct 21 03:50 PM]: Running Augustus gene prediction using awesome_rna parameters
[Oct 21 03:50 PM]: 1,408 predictions from Augustus
[Oct 21 03:50 PM]: Pulling out high quality Augustus predictions
[Oct 21 03:50 PM]: Found 40 high quality predictions from Augustus (>90% exon evidence)
[Oct 21 03:50 PM]: Running CodingQuarry prediction using stringtie alignments
[Oct 21 03:52 PM]: 1,659 predictions from CodingQuarry
[Oct 21 03:52 PM]: Running SNAP gene prediction, using training data: rna-seq/predict_misc/final_training_models.gff3
[Oct 21 03:53 PM]: 1,498 predictions from SNAP
[Oct 21 03:53 PM]: Running GlimmerHMM gene prediction, using training data: rna-seq/predict_misc/final_training_models.gff3
[Oct 21 03:53 PM]: 1,804 predictions from GlimmerHMM
[Oct 21 03:53 PM]: Summary of gene models passed to EVM (weights):
Source Weight Count
Augustus 1 1368
Augustus HiQ 2 40
CodingQuarry 2 1659
GlimmerHMM 1 1804
pasa 6 628
snap 1 1498
Total - 6997
[Oct 21 03:53 PM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Oct 21 03:57 PM]: Converting to GFF3 and collecting all EVM results
[Oct 21 03:57 PM]: 1,776 total gene models from EVM
[Oct 21 03:57 PM]: Generating protein fasta files from 1,776 EVM models
[Oct 21 03:57 PM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Oct 21 03:57 PM]: Found 165 gene models to remove: 0 too short; 0 span gaps; 165 transposable elements
[Oct 21 03:57 PM]: 1,611 gene models remaining
[Oct 21 03:57 PM]: Predicting tRNAs
[Oct 21 03:57 PM]: 112 tRNAscan models are valid (non-overlapping)
[Oct 21 03:57 PM]: Generating GenBank tbl annotation file
[Oct 21 03:57 PM]: Collecting final annotation files for 1,723 total gene models
[Oct 21 03:57 PM]: Converting to final Genbank format
[Oct 21 03:57 PM]: Funannotate predict is finished, output files are in the rna-seq/predict_results folder
[Oct 21 03:57 PM]: Your next step to capture UTRs and update annotation using PASA:
funannotate update -i rna-seq --cpus 12
[Oct 21 03:57 PM]: Training parameters file saved: rna-seq/predict_results/awesome_rna.parameters.json
[Oct 21 03:57 PM]: Add species parameters to database:
funannotate species -s awesome_rna -a rna-seq/predict_results/awesome_rna.parameters.json
#########################################################
Now running `funannotate update` to run PASA-mediated UTR addition and multiple transcripts
CMD: funannotate update -i rna-seq --cpus 12
#########################################################
-------------------------------------------------------
[Oct 21 03:57 PM]: OS: Ubuntu 18.04, 16 cores, ~ 66 GB RAM. Python: 3.8.13
[Oct 21 03:57 PM]: Running 1.8.14
[Oct 21 03:57 PM]: No NCBI SBT file given, will use default, for NCBI submissions pass one here '--sbt'
[Oct 21 03:57 PM]: Found relevant files in rna-seq/training, will re-use them:
GFF3: rna-seq/predict_results/Awesome_rna.gff3
Genome: rna-seq/predict_results/Awesome_rna.scaffolds.fa
Single reads: rna-seq/training/single.fq.gz
Single Q-trimmed reads: rna-seq/training/trimmomatic/trimmed_single.fastq.gz
Single normalized reads: rna-seq/training/normalize/single.norm.fq
Trinity results: rna-seq/training/funannotate_train.trinity-GG.fasta
Long-read results: rna-seq/training/funannotate_long-reads.fasta
PASA config file: rna-seq/training/pasa/alignAssembly.txt
BAM alignments: rna-seq/training/funannotate_train.coordSorted.bam
StringTie GTF: rna-seq/training/funannotate_train.stringtie.gtf
[Oct 21 03:57 PM]: Reannotating Awesome rna, NCBI accession: None
[Oct 21 03:57 PM]: Previous annotation consists of: 1,611 protein coding gene models and 112 non-coding gene models
[Oct 21 03:57 PM]: Existing annotation: locustag=FUN_ genenumber=1723
[Oct 21 03:57 PM]: Aligning long reads to genome with minimap2
[Oct 21 03:57 PM]: Adding 35 unique long-reads to Trinity assemblies
[Oct 21 03:57 PM]: Merging BAM files: rna-seq/update_misc/nano_mRNA.coordSorted.bam, rna-seq/update_misc/trinity.alignments.bam
[Oct 21 03:57 PM]: Converting transcript alignments to GFF3 format
[Oct 21 03:57 PM]: Converting Trinity transcript alignments to GFF3 format
[Oct 21 03:58 PM]: PASA database is SQLite: /home/cris/test-rna_seq_11cab9f7-523e-43cd-b60d-eb0faa16bc13/rna-seq/training/pasa/Awesome_rna_pasa
[Oct 21 03:58 PM]: Running PASA annotation comparison step 1
[Oct 21 03:58 PM]: Running PASA annotation comparison step 2
[Oct 21 03:59 PM]: Using Kallisto TPM data to determine which PASA gene models to select at each locus
[Oct 21 03:59 PM]: Building Kallisto index
[Oct 21 03:59 PM]: Mapping reads using pseudoalignment in Kallisto
[Oct 21 03:59 PM]: Parsing Kallisto results. Keeping alt-splicing transcripts if expressed at least 10.0% of highest transcript per locus.
[Oct 21 03:59 PM]: Wrote 1,620 transcripts derived from 1,619 protein coding loci.
[Oct 21 03:59 PM]: Validating gene models (renaming, checking translations, filtering, etc)
[Oct 21 03:59 PM]: Writing 1,728 loci to TBL format: dropped 0 overlapping, 1 too short, and 0 frameshift gene models
[Oct 21 03:59 PM]: Converting to Genbank format
[Oct 21 04:00 PM]: Collecting final annotation files
[Oct 21 04:00 PM]: Comparing original annotation to updated
original: rna-seq/predict_results/Awesome_rna.gff3
updated: rna-seq/update_results/Awesome_rna.gff3
[Oct 21 04:00 PM]: Updated annotation complete:
-------------------------------------------------------
Total Gene Models: 1,728
Total transcripts: 1,730
New Gene Models: 7
No Change: 1,461
Update UTRs: 260
Exons Changed: 0
Exons/CDS Changed: 0
Dropped Models: 0
CDS AED: 0.001
mRNA AED: 0.013
-------------------------------------------------------
[Oct 21 04:00 PM]: Funannotate update is finished, output files are in the rna-seq/update_results folder
[Oct 21 04:00 PM]: Your next step might be functional annotation, suggested commands:
-------------------------------------------------------
Run InterProScan (Docker required):
funannotate iprscan -i rna-seq -m docker -c 12
Run antiSMASH:
funannotate remote -i rna-seq -m antismash -e youremail@server.edu
Annotate Genome:
funannotate annotate -i rna-seq --cpus 12 --sbt yourSBTfile.txt
-------------------------------------------------------
#########################################################
SUCCESS: funannotate RNA-seq training/prediction test complete.
#########################################################
#########################################################
#########################################################
Running `funannotate annotate` unit testing
Downloading: https://osf.io/97pyn/download?version=1 Bytes: 341476
CMD: funannotate annotate --genbank Genome_one.gbk -o annotate --cpus 12 --iprscan genome_one.iprscan.xml --eggnog genome_one.emapper.annotations
#########################################################
-------------------------------------------------------
[Oct 21 04:00 PM]: OS: Ubuntu 18.04, 16 cores, ~ 66 GB RAM. Python: 3.8.13
[Oct 21 04:00 PM]: Running 1.8.14
[Oct 21 04:00 PM]: No NCBI SBT file given, will use default, however if you plan to submit to NCBI, create one and pass it here '--sbt'
[Oct 21 04:00 PM]: Checking GenBank file for annotation
[Oct 21 04:00 PM]: Adding Functional Annotation to Genome one, NCBI accession: None
[Oct 21 04:00 PM]: Annotation consists of: 125 gene models
[Oct 21 04:00 PM]: 124 protein records loaded
[Oct 21 04:00 PM]: Running HMMer search of PFAM version 34.0
[Oct 21 04:00 PM]: 90 annotations added
[Oct 21 04:00 PM]: Running Diamond blastp search of UniProt DB version 2021_02
[Oct 21 04:00 PM]: 12 valid gene/product annotations from 14 total
[Oct 21 04:00 PM]: Existing Eggnog-mapper results found: annotate/annotate_misc/eggnog.emapper.annotations
[Oct 21 04:00 PM]: Parsing EggNog Annotations
[Oct 21 04:00 PM]: EggNog version parsed as 1.0.3
[Oct 21 04:00 PM]: 132 COG and EggNog annotations added
[Oct 21 04:00 PM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.69
[Oct 21 04:00 PM]: 21 gene name and product description annotations added
[Oct 21 04:00 PM]: Running Diamond blastp search of MEROPS version 12.0
[Oct 21 04:00 PM]: 0 annotations added
[Oct 21 04:00 PM]: Annotating CAZYmes using HMMer search of dbCAN version 9.0
[Oct 21 04:00 PM]: 2 annotations added
[Oct 21 04:00 PM]: Annotating proteins with BUSCO dikarya models
[Oct 21 04:01 PM]: 6 annotations added
[Oct 21 04:01 PM]: Skipping phobius predictions, try funannotate remote -m phobius
[Oct 21 04:01 PM]: Skipping secretome: neither SignalP nor Phobius searches were run
[Oct 21 04:01 PM]: 0 secretome and 0 transmembane annotations added
[Oct 21 04:01 PM]: Parsing InterProScan5 XML file
[Oct 21 04:01 PM]: Found 0 duplicated annotations, adding 628 valid annotations
[Oct 21 04:01 PM]: Converting to final Genbank format, good luck!
[Oct 21 04:01 PM]: Creating AGP file and corresponding contigs file
[Oct 21 04:01 PM]: Writing genome annotation table.
[Oct 21 04:01 PM]: Funannotate annotate has completed successfully!
-------------------------------------------------------
#########################################################
SUCCESS: `funannotate annotate` test complete.
#########################################################
#########################################################
Running `funannotate compare` unit testing
Downloading: https://osf.io/7s9xh/download?version=1 Bytes: 1020999
CMD: funannotate compare -i Genome_one.gbk Genome_two.gbk Genome_three.gbk -o compare --cpus 12 --ml_model LG+G4 --outgroup botrytis_cinerea.dikarya
#########################################################
-------------------------------------------------------
[Oct 21 04:01 PM]: OS: Ubuntu 18.04, 16 cores, ~ 66 GB RAM. Python: 3.8.13
[Oct 21 04:01 PM]: Running 1.8.14
[Oct 21 04:01 PM]: Now parsing 3 genomes
[Oct 21 04:01 PM]: working on Genome one
[Oct 21 04:01 PM]: working on Genome two
[Oct 21 04:01 PM]: working on Genome three
[Oct 21 04:01 PM]: No secondary metabolite annotations found
[Oct 21 04:01 PM]: Summarizing PFAM domain results
[Oct 21 04:01 PM]: Summarizing InterProScan results
[Oct 21 04:01 PM]: Loading InterPro descriptions
[Oct 21 04:01 PM]: Summarizing MEROPS protease results
[Oct 21 04:01 PM]: found 4 MEROPS familes
[Oct 21 04:01 PM]: Summarizing CAZyme results
[Oct 21 04:01 PM]: found 5 CAZy familes
[Oct 21 04:01 PM]: Summarizing COG results
[Oct 21 04:01 PM]: Summarizing secreted protein results
[Oct 21 04:01 PM]: Summarizing fungal transcription factors
[Oct 21 04:01 PM]: Running GO enrichment for each genome
WARNING: skipping Genome_one.txt as no GO terms
[Oct 21 04:03 PM]: Running orthologous clustering tool, ProteinOrtho. This may take awhile...
[Oct 21 04:03 PM]: Compiling all annotations for each genome
[Oct 21 04:03 PM]: Inferring phylogeny using iqtree
[Oct 21 04:03 PM]: Found 1 single copy BUSCO orthologs, will use all to infer phylogeny
[Oct 21 04:03 PM]: Compressing results to output file: compare.tar.gz
[Oct 21 04:03 PM]: Funannotate compare completed successfully!
#########################################################
SUCCESS: `funannotate compare` test complete.
#########################################################
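For anyone repeating this check, scanning a saved copy of the output can be quicker than reading it line by line. The sketch below assumes the log was captured to text and relies only on the "SUCCESS: ... test complete." lines and the Python "Traceback" marker visible in this thread; the function name is hypothetical.

```python
import re

# Matches lines such as: SUCCESS: `funannotate clean` test complete.
SUCCESS_RE = re.compile(r"^SUCCESS: (.+?) test complete\.$")


def summarize_test_log(text):
    """Return (passed_stage_names, saw_traceback) for a captured test log."""
    passed = []
    saw_traceback = False
    for raw in text.splitlines():
        line = raw.strip()
        match = SUCCESS_RE.match(line)
        if match:
            # Remove the backticks the log puts around command names.
            passed.append(match.group(1).replace("`", ""))
        elif line.startswith("Traceback (most recent call last)"):
            saw_traceback = True
    return passed, saw_traceback
```

A run is worth a closer look if saw_traceback is true or if a stage you expected is absent from the returned list.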
Hi, I freshly installed funannotate through conda, then removed augustus using "conda remove --force-remove augustus". After that I installed augustus 3.3 locally through apt-get and pointed the augustus config path at it with "export AUGUSTUS_CONFIG_PATH=/usr/share/augustus/config". Any ideas what could be going on?
(funannotate) cris@cris-biosciences:~$ funannotate test -t busco --cpus 16
#########################################################
Running `funannotate predict` BUSCO-mediated training unit testing
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --cpus 16 --species Awesome busco
#########################################################
[Oct 19 02:50 PM]: OS: Ubuntu 18.04, 16 cores, ~ 66 GB RAM. Python: 3.8.13
[Oct 19 02:50 PM]: Running funannotate v1.8.13
[Oct 19 02:50 PM]: GeneMark not found and $GENEMARK_PATH environmental variable missing. Will skip GeneMark ab-initio prediction.
[Oct 19 02:50 PM]: Skipping CodingQuarry as no --rna_bam passed
[Oct 19 02:50 PM]: Parsed training data, run ab-initio gene predictors as follows:
Program Training-Method
augustus busco
glimmerhmm busco
snap busco
[Oct 19 02:50 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Oct 19 02:50 PM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Oct 19 02:50 PM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Oct 19 02:50 PM]: Found 1,505 preliminary alignments with diamond in 0:00:01 --> generated FASTA files for exonerate in 0:00:00
[Oct 19 02:50 PM]: Exonerate finished in 0:00:10: found 1,270 alignments
[Oct 19 02:50 PM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Oct 19 02:54 PM]: 268 valid BUSCO predictions found, validating protein sequences
[Oct 19 02:55 PM]: 268 BUSCO predictions validated
[Oct 19 02:55 PM]: Training Augustus using BUSCO gene models
Traceback (most recent call last):
File "/scratch/anaconda3/envs/funannotate/bin/funannotate", line 10, in
sys.exit(main())
File "/scratch/anaconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/funannotate.py", line 716, in main
mod.main(arguments)
File "/scratch/anaconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/predict.py", line 1415, in main
lib.trainAugustus(AUGUSTUS_BASE, aug_species, trainingset,
File "/scratch/anaconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/library.py", line 8593, in trainAugustus
train_results = getTrainResults(os.path.join(
File "/scratch/anaconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/library.py", line 8399, in getTrainResults
return (float(values1[1]), float(values1[2]), float(values2[6]), float(values2[7]), float(values3[6]), float(values3[7]))
UnboundLocalError: local variable 'values1' referenced before assignment
#########################################################
Traceback (most recent call last):
File "/scratch/anaconda3/envs/funannotate/bin/funannotate", line 10, in
sys.exit(main())
File "/scratch/anaconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/funannotate.py", line 716, in main
mod.main(arguments)
File "/scratch/anaconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 407, in main
runBuscoTest(args)
File "/scratch/anaconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 200, in runBuscoTest
assert 1500 <= countGFFgenes(os.path.join(
File "/scratch/anaconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 45, in countGFFgenes
with open(input, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'test-busco_22831388-2b47-4b82-a972-56cb4224b6d1/annotate/predict_results/Awesome_busco.gff3'
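The FileNotFoundError looks like a downstream consequence of the UnboundLocalError rather than a separate problem: predict stopped during Augustus training, so predict_results/Awesome_busco.gff3 was never written for the test harness to count. As an illustration of that kind of check, here is a minimal sketch that counts gene features in a GFF3 file and reports a missing file explicitly. It is a hypothetical helper, not the countGFFgenes function in funannotate's test.py, and it assumes the standard tab-separated GFF3 layout with the feature type in the third column.

```python
import os


def count_gff_genes(path):
    """Count lines whose third column is 'gene'; return None if the file is absent."""
    if not os.path.isfile(path):
        # The prediction step never produced output, so there is nothing to count.
        return None
    count = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip GFF3 directives and comments
            cols = line.rstrip("\n").split("\t")
            if len(cols) >= 3 and cols[2] == "gene":
                count += 1
    return count
```

Returning None for a missing file lets the caller distinguish "prediction produced no output" from "prediction produced too few genes".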