nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

UnboundLocalError: local variable 'values1' referenced before assignment #818

Closed: caonetto closed this issue 1 year ago

caonetto commented 1 year ago

Hi, I freshly installed funannotate through conda, then removed the bundled Augustus using "conda remove --force-remove augustus". After that I installed Augustus 3.3 locally through apt-get and pointed the Augustus config path at it with "export AUGUSTUS_CONFIG_PATH=/usr/share/augustus/config". The BUSCO test now fails with the error below. Any ideas what could be going on?

(funannotate) cris@cris-biosciences:~$ funannotate test -t busco --cpus 16
#########################################################
Running `funannotate predict` BUSCO-mediated training unit testing
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --cpus 16 --species Awesome busco
#########################################################
[Oct 19 02:50 PM]: OS: Ubuntu 18.04, 16 cores, ~ 66 GB RAM. Python: 3.8.13
[Oct 19 02:50 PM]: Running funannotate v1.8.13
[Oct 19 02:50 PM]: GeneMark not found and $GENEMARK_PATH environmental variable missing. Will skip GeneMark ab-initio prediction.
[Oct 19 02:50 PM]: Skipping CodingQuarry as no --rna_bam passed
[Oct 19 02:50 PM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     busco
  glimmerhmm   busco
  snap         busco
[Oct 19 02:50 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Oct 19 02:50 PM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Oct 19 02:50 PM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Oct 19 02:50 PM]: Found 1,505 preliminary alignments with diamond in 0:00:01 --> generated FASTA files for exonerate in 0:00:00
[Oct 19 02:50 PM]: Exonerate finished in 0:00:10: found 1,270 alignments
[Oct 19 02:50 PM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Oct 19 02:54 PM]: 268 valid BUSCO predictions found, validating protein sequences
[Oct 19 02:55 PM]: 268 BUSCO predictions validated
[Oct 19 02:55 PM]: Training Augustus using BUSCO gene models
Traceback (most recent call last):
  File "/scratch/anaconda3/envs/funannotate/bin/funannotate", line 10, in <module>
    sys.exit(main())
  File "/scratch/anaconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/funannotate.py", line 716, in main
    mod.main(arguments)
  File "/scratch/anaconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/predict.py", line 1415, in main
    lib.trainAugustus(AUGUSTUS_BASE, aug_species, trainingset,
  File "/scratch/anaconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/library.py", line 8593, in trainAugustus
    train_results = getTrainResults(os.path.join(
  File "/scratch/anaconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/library.py", line 8399, in getTrainResults
    return (float(values1[1]), float(values1[2]), float(values2[6]), float(values2[7]), float(values3[6]), float(values3[7]))
UnboundLocalError: local variable 'values1' referenced before assignment
#########################################################
Traceback (most recent call last):
  File "/scratch/anaconda3/envs/funannotate/bin/funannotate", line 10, in <module>
    sys.exit(main())
  File "/scratch/anaconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/funannotate.py", line 716, in main
    mod.main(arguments)
  File "/scratch/anaconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 407, in main
    runBuscoTest(args)
  File "/scratch/anaconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 200, in runBuscoTest
    assert 1500 <= countGFFgenes(os.path.join(
  File "/scratch/anaconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 45, in countGFFgenes
    with open(input, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'test-busco_22831388-2b47-4b82-a972-56cb4224b6d1/annotate/predict_results/Awesome_busco.gff3'
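
A note for readers following the two tracebacks: the second one (FileNotFoundError) looks like a downstream consequence of the first. The test harness tries to count genes in predict_results/Awesome_busco.gff3, but the predict step had already crashed, so that file was never written. Below is a rough sketch of such a gene count. It is not funannotate's own countGFFgenes; the function names are hypothetical, and it only relies on the GFF3 convention that the feature type is the third tab-separated column.

```python
import os


def count_gff_genes(lines):
    """Count records whose type column (third GFF3 column) is 'gene'."""
    count = 0
    for line in lines:
        # Skip directives/comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) >= 3 and columns[2] == "gene":
            count += 1
    return count


def count_genes_in_file(path):
    """Hypothetical wrapper: fail with a hint when the upstream step wrote nothing."""
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"{path} is missing; did the upstream predict step finish?"
        )
    with open(path) as handle:
        return count_gff_genes(handle)
```

Checking for the file first does not fix anything, but it makes it clearer that the error to chase is the earlier one.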

hyphaltip commented 1 year ago

It looks like Augustus is failing in the training step. Can you check the setup and report which Augustus version is installed?

funannotate check --show-versions
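
Alongside that command, a quick manual look at which Augustus binary and config directory the shell actually resolves can help when a conda install and a system install are mixed, as in this report. The snippet below is only a sketch: `describe_augustus_setup` is a made-up helper, and it assumes the config directory contains a `species` subdirectory, as Augustus installs normally do.

```python
import os
import shutil


def describe_augustus_setup(env=None):
    """Report where the augustus binary and AUGUSTUS_CONFIG_PATH resolve to."""
    env = os.environ if env is None else env
    config_path = env.get("AUGUSTUS_CONFIG_PATH")
    species_dir = os.path.join(config_path, "species") if config_path else None
    return {
        # First 'augustus' executable found on PATH, or None.
        "augustus_binary": shutil.which("augustus", path=env.get("PATH")),
        "config_path": config_path,
        "config_exists": bool(config_path) and os.path.isdir(config_path),
        "species_dir_exists": bool(species_dir) and os.path.isdir(species_dir),
        # Training a new species needs write access here (assumption: the
        # trained parameters are stored under the config directory).
        "species_dir_writable": bool(species_dir) and os.access(species_dir, os.W_OK),
    }


if __name__ == "__main__":
    for key, value in describe_augustus_setup().items():
        print(f"{key}: {value}")
```

If the binary comes from one install and the config path points at another, or the species directory is not writable by the current user, that mismatch would be worth ruling out first.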

nextgenusfs commented 1 year ago

The file it's trying to parse is predict_misc/augustus.initial.training.txt, which appears to be corrupt or perhaps empty.
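
To illustrate why an empty or truncated file would surface as an UnboundLocalError rather than a clearer message, here is a minimal sketch. It assumes the parser binds its result variables only when it meets matching lines, which is what the traceback suggests; it is not the code from library.py, and the function names, the `metric` prefix, and the `|`-separated sample line are all invented for the example.

```python
def parse_unguarded(lines, prefix):
    """Bind `values` only when a line matches (the fragile pattern)."""
    for line in lines:
        if line.startswith(prefix):
            values = line.rstrip().split("|")
    # With no matching line, `values` was never assigned, so this raises
    # UnboundLocalError.
    return values


def parse_guarded(lines, prefix):
    """Same scan, but report missing data explicitly."""
    values = None
    for line in lines:
        if line.startswith(prefix):
            values = line.rstrip().split("|")
    if values is None:
        raise ValueError(
            f"no line starting with {prefix!r}; is the report empty or truncated?"
        )
    return values


if __name__ == "__main__":
    empty_report = []  # stands in for an empty training report
    try:
        parse_unguarded(empty_report, "metric")
    except UnboundLocalError as err:
        print("unguarded:", err)
    try:
        parse_guarded(empty_report, "metric")
    except ValueError as err:
        print("guarded:", err)
```

Either way the underlying problem is that the training step produced no usable output; the guarded variant just says so directly.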

caonetto commented 1 year ago

Thanks for your quick response. I managed to fix the issue by doing a fresh conda install of funannotate, removing the bundled Augustus, installing Augustus 3.5 from conda, and updating the funannotate scripts using git.

nextgenusfs commented 1 year ago

Great. Can you confirm that all of the tests from funannotate test pass with this conda setup? I'm still trying to get Augustus v3.5 working on my Mac (failing so far), so I'm not sure whether everything works on Linux (I don't want to update the Docker image until I know it's safe to do so).

caonetto commented 1 year ago

Hi, I just ran funannotate test and it all seems to have completed successfully.

Cheers.

(funannotate) cris@cris-biosciences:~$ funannotate test -t all --cpus 12
#########################################################
Running `funannotate clean` unit testing: minimap2 mediated assembly duplications
Downloading: https://osf.io/8pjbe/download?version=1 Bytes: 252076
CMD: funannotate clean -i test.clean.fa -o test.exhaustive.fa --exhaustive
#########################################################
minimap2 version=2.24-r1122 path=/scratch/anaconda3/envs/funannotate/bin/minimap2
-----------------------------------------------
6 input contigs, 6 larger than 500 bp, N50 is 427,039 bp
Checking duplication of 6 contigs
-----------------------------------------------
scaffold_73 appears duplicated: 100% identity over 100% of the contig. contig length: 15153
scaffold_91 appears duplicated: 100% identity over 100% of the contig. contig length: 8858
scaffold_27 appears duplicated: 100% identity over 100% of the contig. contig length: 427039
-----------------------------------------------
6 input contigs; 6 larger than 500 bp; 3 duplicated; 3 written to file
#########################################################
SUCCESS: `funannotate clean` test complete.
#########################################################

#########################################################
Running `funannotate mask` unit testing: RepeatModeler --> RepeatMasker
Downloading: https://osf.io/hbryz/download?version=1 Bytes: 375687
CMD: funannotate mask -i test.fa -o test.masked.fa --cpus 12
#########################################################
-------------------------------------------------------
[Oct 21 03:09 PM]: OS: Ubuntu 18.04, 16 cores, ~ 66 GB RAM. Python: 3.8.13
[Oct 21 03:09 PM]: Running funanotate v1.8.14
[Oct 21 03:09 PM]: Soft-masking simple repeats with tantan
[Oct 21 03:09 PM]: Repeat soft-masking finished: 
Masked genome: /home/cris/test-mask_11cab9f7-523e-43cd-b60d-eb0faa16bc13/test.masked.fa
num scaffolds: 2
assembly size: 1,216,048 bp
masked repeats: 50,965 bp (4.19%)
-------------------------------------------------------
#########################################################
SUCCESS: `funannotate mask` test complete.
#########################################################

#########################################################
Running `funannotate predict` unit testing
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --cpus 12 --species Awesome testicus
#########################################################
-------------------------------------------------------
[Oct 21 03:09 PM]: OS: Ubuntu 18.04, 16 cores, ~ 66 GB RAM. Python: 3.8.13
[Oct 21 03:09 PM]: Running funannotate v1.8.14
[Oct 21 03:09 PM]: Skipping CodingQuarry as no --rna_bam passed
[Oct 21 03:09 PM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     pretrained     
  glimmerhmm   busco          
  snap         busco          
[Oct 21 03:09 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Oct 21 03:09 PM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Oct 21 03:09 PM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Oct 21 03:09 PM]: Found 1,505 preliminary alignments with diamond in 0:00:01 --> generated FASTA files for exonerate in 0:00:00
[Oct 21 03:09 PM]: Exonerate finished in 0:00:12: found 1,270 alignments
[Oct 21 03:09 PM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Oct 21 03:13 PM]: 370 valid BUSCO predictions found, validating protein sequences
[Oct 21 03:14 PM]: 367 BUSCO predictions validated
[Oct 21 03:14 PM]: Running Augustus gene prediction using saccharomyces parameters
[Oct 21 03:15 PM]: 1,485 predictions from Augustus
[Oct 21 03:15 PM]: Pulling out high quality Augustus predictions
[Oct 21 03:15 PM]: Found 371 high quality predictions from Augustus (>90% exon evidence)
[Oct 21 03:15 PM]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Oct 21 03:15 PM]: 1,532 predictions from SNAP
[Oct 21 03:15 PM]: Running GlimmerHMM gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Oct 21 03:16 PM]: 1,777 predictions from GlimmerHMM
[Oct 21 03:16 PM]: Summary of gene models passed to EVM (weights):
  Source         Weight   Count
  Augustus       1        1325 
  Augustus HiQ   2        372  
  GlimmerHMM     1        1777 
  snap           1        1532 
  Total          -        5006 
[Oct 21 03:16 PM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Oct 21 03:19 PM]: Converting to GFF3 and collecting all EVM results
[Oct 21 03:19 PM]: 1,699 total gene models from EVM
[Oct 21 03:19 PM]: Generating protein fasta files from 1,699 EVM models
[Oct 21 03:19 PM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Oct 21 03:19 PM]: Found 135 gene models to remove: 0 too short; 0 span gaps; 135 transposable elements
[Oct 21 03:19 PM]: 1,564 gene models remaining
[Oct 21 03:19 PM]: Predicting tRNAs
[Oct 21 03:19 PM]: 112 tRNAscan models are valid (non-overlapping)
[Oct 21 03:19 PM]: Generating GenBank tbl annotation file
[Oct 21 03:19 PM]: Collecting final annotation files for 1,676 total gene models
[Oct 21 03:19 PM]: Converting to final Genbank format
[Oct 21 03:19 PM]: Funannotate predict is finished, output files are in the annotate/predict_results folder
[Oct 21 03:19 PM]: Your next step might be functional annotation, suggested commands:
-------------------------------------------------------
Run InterProScan (manual install): 
funannotate iprscan -i annotate -c 12

Run antiSMASH (optional): 
funannotate remote -i annotate -m antismash -e youremail@server.edu

Annotate Genome: 
funannotate annotate -i annotate --cpus 12 --sbt yourSBTfile.txt
-------------------------------------------------------

[Oct 21 03:19 PM]: Training parameters file saved: annotate/predict_results/saccharomyces.parameters.json
[Oct 21 03:19 PM]: Add species parameters to database:

  funannotate species -s saccharomyces -a annotate/predict_results/saccharomyces.parameters.json

#########################################################
SUCCESS: `funannotate predict` test complete.
#########################################################

#########################################################
Running `funannotate predict` BUSCO-mediated training unit testing
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --cpus 12 --species Awesome busco
#########################################################
-------------------------------------------------------
[Oct 21 03:19 PM]: OS: Ubuntu 18.04, 16 cores, ~ 66 GB RAM. Python: 3.8.13
[Oct 21 03:19 PM]: Running funannotate v1.8.14
[Oct 21 03:19 PM]: Skipping CodingQuarry as no --rna_bam passed
[Oct 21 03:19 PM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     busco          
  glimmerhmm   busco          
  snap         busco          
[Oct 21 03:19 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Oct 21 03:19 PM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Oct 21 03:19 PM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Oct 21 03:19 PM]: Found 1,505 preliminary alignments with diamond in 0:00:01 --> generated FASTA files for exonerate in 0:00:00
[Oct 21 03:20 PM]: Exonerate finished in 0:00:12: found 1,270 alignments
[Oct 21 03:20 PM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Oct 21 03:24 PM]: 370 valid BUSCO predictions found, validating protein sequences
[Oct 21 03:24 PM]: 367 BUSCO predictions validated
[Oct 21 03:24 PM]: Training Augustus using BUSCO gene models
[Oct 21 03:24 PM]: Augustus initial training results:
  Feature       Specificity   Sensitivity
  nucleotides   99.4%         83.8%      
  exons         63.2%         52.6%      
  genes         76.7%         51.4%      
[Oct 21 03:24 PM]: Running Augustus gene prediction using awesome_busco parameters
[Oct 21 03:25 PM]: 1,284 predictions from Augustus
[Oct 21 03:25 PM]: Pulling out high quality Augustus predictions
[Oct 21 03:25 PM]: Found 306 high quality predictions from Augustus (>90% exon evidence)
[Oct 21 03:25 PM]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Oct 21 03:25 PM]: 1,511 predictions from SNAP
[Oct 21 03:25 PM]: Running GlimmerHMM gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Oct 21 03:26 PM]: 1,777 predictions from GlimmerHMM
[Oct 21 03:26 PM]: Summary of gene models passed to EVM (weights):
  Source         Weight   Count
  Augustus       1        978  
  Augustus HiQ   2        306  
  GlimmerHMM     1        1777 
  snap           1        1511 
  Total          -        4572 
[Oct 21 03:26 PM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Oct 21 03:28 PM]: Converting to GFF3 and collecting all EVM results
[Oct 21 03:28 PM]: 1,687 total gene models from EVM
[Oct 21 03:28 PM]: Generating protein fasta files from 1,687 EVM models
[Oct 21 03:28 PM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Oct 21 03:28 PM]: Found 139 gene models to remove: 0 too short; 0 span gaps; 139 transposable elements
[Oct 21 03:28 PM]: 1,548 gene models remaining
[Oct 21 03:28 PM]: Predicting tRNAs
[Oct 21 03:28 PM]: 112 tRNAscan models are valid (non-overlapping)
[Oct 21 03:28 PM]: Generating GenBank tbl annotation file
[Oct 21 03:29 PM]: Collecting final annotation files for 1,660 total gene models
[Oct 21 03:29 PM]: Converting to final Genbank format
[Oct 21 03:29 PM]: Funannotate predict is finished, output files are in the annotate/predict_results folder
[Oct 21 03:29 PM]: Your next step might be functional annotation, suggested commands:
-------------------------------------------------------
Run InterProScan (manual install): 
funannotate iprscan -i annotate -c 12

Run antiSMASH (optional): 
funannotate remote -i annotate -m antismash -e youremail@server.edu

Annotate Genome: 
funannotate annotate -i annotate --cpus 12 --sbt yourSBTfile.txt
-------------------------------------------------------

[Oct 21 03:29 PM]: Training parameters file saved: annotate/predict_results/awesome_busco.parameters.json
[Oct 21 03:29 PM]: Add species parameters to database:

  funannotate species -s awesome_busco -a annotate/predict_results/awesome_busco.parameters.json

#########################################################
SUCCESS: `funannotate predict` BUSCO-mediated training test complete.
#########################################################
Now running predict using all pre-trained ab-initio predictors
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate2 --cpus 12 --species Awesome busco -p annotate/predict_results/awesome_busco.parameters.json
#########################################################
-------------------------------------------------------
[Oct 21 03:29 PM]: OS: Ubuntu 18.04, 16 cores, ~ 66 GB RAM. Python: 3.8.13
[Oct 21 03:29 PM]: Running funannotate v1.8.14
[Oct 21 03:29 PM]: Ab initio training parameters file passed: annotate/predict_results/awesome_busco.parameters.json
[Oct 21 03:29 PM]: Skipping CodingQuarry as no --rna_bam passed
[Oct 21 03:29 PM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     pretrained     
  glimmerhmm   pretrained     
  snap         pretrained     
[Oct 21 03:29 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Oct 21 03:29 PM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Oct 21 03:29 PM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Oct 21 03:29 PM]: Found 1,505 preliminary alignments with diamond in 0:00:01 --> generated FASTA files for exonerate in 0:00:00
[Oct 21 03:29 PM]: Exonerate finished in 0:00:12: found 1,270 alignments
[Oct 21 03:29 PM]: Running Augustus gene prediction using awesome_busco parameters
[Oct 21 03:29 PM]: 1,284 predictions from Augustus
[Oct 21 03:29 PM]: Pulling out high quality Augustus predictions
[Oct 21 03:29 PM]: Found 306 high quality predictions from Augustus (>90% exon evidence)
[Oct 21 03:29 PM]: Running SNAP gene prediction, using pre-trained HMM profile
[Oct 21 03:30 PM]: 1,511 predictions from SNAP
[Oct 21 03:30 PM]: Running GlimmerHMM gene prediction, using pretrained HMM profile
[Oct 21 03:30 PM]: 1,777 predictions from GlimmerHMM
[Oct 21 03:30 PM]: Summary of gene models passed to EVM (weights):
  Source         Weight   Count
  Augustus       1        978  
  Augustus HiQ   2        306  
  GlimmerHMM     1        1777 
  snap           1        1511 
  Total          -        4572 
[Oct 21 03:30 PM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Oct 21 03:32 PM]: Converting to GFF3 and collecting all EVM results
[Oct 21 03:32 PM]: 1,687 total gene models from EVM
[Oct 21 03:32 PM]: Generating protein fasta files from 1,687 EVM models
[Oct 21 03:32 PM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Oct 21 03:32 PM]: Found 139 gene models to remove: 0 too short; 0 span gaps; 139 transposable elements
[Oct 21 03:32 PM]: 1,548 gene models remaining
[Oct 21 03:32 PM]: Predicting tRNAs
[Oct 21 03:32 PM]: 112 tRNAscan models are valid (non-overlapping)
[Oct 21 03:32 PM]: Generating GenBank tbl annotation file
[Oct 21 03:32 PM]: Collecting final annotation files for 1,660 total gene models
[Oct 21 03:32 PM]: Converting to final Genbank format
[Oct 21 03:33 PM]: Funannotate predict is finished, output files are in the annotate2/predict_results folder
[Oct 21 03:33 PM]: Your next step might be functional annotation, suggested commands:
-------------------------------------------------------
Run InterProScan (manual install): 
funannotate iprscan -i annotate2 -c 12

Run antiSMASH (optional): 
funannotate remote -i annotate2 -m antismash -e youremail@server.edu

Annotate Genome: 
funannotate annotate -i annotate2 --cpus 12 --sbt yourSBTfile.txt
-------------------------------------------------------

[Oct 21 03:33 PM]: Training parameters file saved: annotate2/predict_results/awesome_busco.parameters.json
[Oct 21 03:33 PM]: Add species parameters to database:

  funannotate species -s awesome_busco -a annotate2/predict_results/awesome_busco.parameters.json

#########################################################
SUCCESS: `funannotate predict` using existing parameters test complete.
#########################################################

#########################################################
Running funannotate RNA-seq training/prediction unit testing
Downloading: https://osf.io/t7j83/download?version=1 Bytes: 542753017
CMD: funannotate train -i test.softmasked.fa --single rna-seq.illumina.fastq.gz --nanopore_mrna rna-seq.nanopore.fastq.gz -o rna-seq --cpus 12 --jaccard_clip --species Awesome rna
#########################################################
-------------------------------------------------------
[Oct 21 03:33 PM]: OS: Ubuntu 18.04, 16 cores, ~ 66 GB RAM. Python: 3.8.13
[Oct 21 03:33 PM]: Running 1.8.14
[Oct 21 03:33 PM]: Adapter and Quality trimming SE reads with Trimmomatic
[Oct 21 03:33 PM]: Running read normalization with Trinity
[Oct 21 03:35 PM]: Processing long reads: converting to fasta and running SeqClean
[Oct 21 03:35 PM]: Building Hisat2 genome index
[Oct 21 03:35 PM]: Aligning reads to genome using Hisat2
[Oct 21 03:36 PM]: Running genome-guided Trinity, logfile: rna-seq/training/Trinity-gg.log
[Oct 21 03:36 PM]: Clustering of reads from BAM and preparing assembly commands
[Oct 21 03:37 PM]: Assembling 1,620 Trinity clusters using 11 CPUs
[Oct 21 03:45 PM]: 1,454 transcripts derived from Trinity
[Oct 21 03:45 PM]: Running StringTie on Hisat2 coordsorted BAM
[Oct 21 03:45 PM]: Removing poly-A sequences from trinity transcripts using seqclean
[Oct 21 03:45 PM]: Aligning long reads to genome with minimap2
[Oct 21 03:45 PM]: Adding 4,736 unique long-reads
[Oct 21 03:45 PM]: Merging BAM files: rna-seq/training/nano_mRNA.coordSorted.bam, rna-seq/training/trinity.alignments.bam
[Oct 21 03:45 PM]: Converting transcript alignments to GFF3 format
[Oct 21 03:45 PM]: Converting Trinity transcript alignments to GFF3 format
[Oct 21 03:45 PM]: Running PASA alignment step using 6,190 transcripts
[Oct 21 03:48 PM]: PASA assigned 863 transcripts to 861 loci (genes)
[Oct 21 03:48 PM]: Getting PASA models for training with TransDecoder
[Oct 21 03:49 PM]: PASA finished. PASAweb accessible via: localhost:port/cgi-bin/index.cgi?db=/home/cris/test-rna_seq_11cab9f7-523e-43cd-b60d-eb0faa16bc13/rna-seq/training/pasa/Awesome_rna_pasa
[Oct 21 03:49 PM]: Using Kallisto TPM data to determine which PASA gene models to select at each locus
[Oct 21 03:49 PM]: Building Kallisto index
[Oct 21 03:49 PM]: Mapping reads using pseudoalignment in Kallisto
[Oct 21 03:49 PM]: Parsing expression value results. Keeping best transcript at each locus.
[Oct 21 03:49 PM]: Wrote 628 PASA gene models
[Oct 21 03:49 PM]: PASA database name: Awesome_rna
[Oct 21 03:49 PM]: Trinity/PASA has completed, you are now ready to run funanotate predict, for example:

  funannotate predict -i test.softmasked.fa \
            -o rna-seq -s "Awesome rna" --cpus 12

-------------------------------------------------------
#########################################################
Now running `funannotate predict` using RNA-seq training data
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o rna-seq --cpus 12 --min_training_models 150 --species Awesome rna
#########################################################
-------------------------------------------------------
[Oct 21 03:49 PM]: OS: Ubuntu 18.04, 16 cores, ~ 66 GB RAM. Python: 3.8.13
[Oct 21 03:49 PM]: Running funannotate v1.8.14
[Oct 21 03:49 PM]: Found training files, will re-use these files:
  --rna_bam rna-seq/training/funannotate_train.coordSorted.bam
  --pasa_gff rna-seq/training/funannotate_train.pasa.gff3
  --stringtie rna-seq/training/funannotate_train.stringtie.gtf
  --transcript_alignments rna-seq/training/funannotate_train.transcripts.gff3
[Oct 21 03:49 PM]: Parsed training data, run ab-initio gene predictors as follows:
  Program        Training-Method
  augustus       pasa           
  codingquarry   rna-bam        
  glimmerhmm     pasa           
  snap           pasa           
[Oct 21 03:49 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Oct 21 03:49 PM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Oct 21 03:49 PM]: Parsed 3,805 transcript alignments from: rna-seq/training/funannotate_train.transcripts.gff3
[Oct 21 03:49 PM]: Creating transcript EVM alignments and Augustus transcripts hintsfile
[Oct 21 03:49 PM]: Extracting hints from RNA-seq BAM file using bam2hints
[Oct 21 03:49 PM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Oct 21 03:49 PM]: Found 1,505 preliminary alignments with diamond in 0:00:01 --> generated FASTA files for exonerate in 0:00:00
[Oct 21 03:50 PM]: Exonerate finished in 0:00:12: found 1,270 alignments
[Oct 21 03:50 PM]: Filtering PASA data for suitable training set
[Oct 21 03:50 PM]: 592 of 628 models pass training parameters
[Oct 21 03:50 PM]: Training Augustus using PASA gene models
[Oct 21 03:50 PM]: Augustus initial training results:
  Feature       Specificity   Sensitivity
  nucleotides   97.4%         86.7%      
  exons         49.5%         40.2%      
  genes         48.0%         40.0%      
[Oct 21 03:50 PM]: Accuracy seems low, you can try to improve by passing the --optimize_augustus option.
[Oct 21 03:50 PM]: Running Augustus gene prediction using awesome_rna parameters
[Oct 21 03:50 PM]: 1,408 predictions from Augustus
[Oct 21 03:50 PM]: Pulling out high quality Augustus predictions
[Oct 21 03:50 PM]: Found 40 high quality predictions from Augustus (>90% exon evidence)
[Oct 21 03:50 PM]: Running CodingQuarry prediction using stringtie alignments
[Oct 21 03:52 PM]: 1,659 predictions from CodingQuarry
[Oct 21 03:52 PM]: Running SNAP gene prediction, using training data: rna-seq/predict_misc/final_training_models.gff3
[Oct 21 03:53 PM]: 1,498 predictions from SNAP
[Oct 21 03:53 PM]: Running GlimmerHMM gene prediction, using training data: rna-seq/predict_misc/final_training_models.gff3
[Oct 21 03:53 PM]: 1,804 predictions from GlimmerHMM
[Oct 21 03:53 PM]: Summary of gene models passed to EVM (weights):
  Source         Weight   Count
  Augustus       1        1368 
  Augustus HiQ   2        40   
  CodingQuarry   2        1659 
  GlimmerHMM     1        1804 
  pasa           6        628  
  snap           1        1498 
  Total          -        6997 
[Oct 21 03:53 PM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Oct 21 03:57 PM]: Converting to GFF3 and collecting all EVM results
[Oct 21 03:57 PM]: 1,776 total gene models from EVM
[Oct 21 03:57 PM]: Generating protein fasta files from 1,776 EVM models
[Oct 21 03:57 PM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Oct 21 03:57 PM]: Found 165 gene models to remove: 0 too short; 0 span gaps; 165 transposable elements
[Oct 21 03:57 PM]: 1,611 gene models remaining
[Oct 21 03:57 PM]: Predicting tRNAs
[Oct 21 03:57 PM]: 112 tRNAscan models are valid (non-overlapping)
[Oct 21 03:57 PM]: Generating GenBank tbl annotation file
[Oct 21 03:57 PM]: Collecting final annotation files for 1,723 total gene models
[Oct 21 03:57 PM]: Converting to final Genbank format
[Oct 21 03:57 PM]: Funannotate predict is finished, output files are in the rna-seq/predict_results folder
[Oct 21 03:57 PM]: Your next step to capture UTRs and update annotation using PASA:

  funannotate update -i rna-seq --cpus 12

[Oct 21 03:57 PM]: Training parameters file saved: rna-seq/predict_results/awesome_rna.parameters.json
[Oct 21 03:57 PM]: Add species parameters to database:

  funannotate species -s awesome_rna -a rna-seq/predict_results/awesome_rna.parameters.json

#########################################################
Now running `funannotate update` to run PASA-mediated UTR addition and multiple transcripts
CMD: funannotate update -i rna-seq --cpus 12
#########################################################
-------------------------------------------------------
[Oct 21 03:57 PM]: OS: Ubuntu 18.04, 16 cores, ~ 66 GB RAM. Python: 3.8.13
[Oct 21 03:57 PM]: Running 1.8.14
[Oct 21 03:57 PM]: No NCBI SBT file given, will use default, for NCBI submissions pass one here '--sbt'
[Oct 21 03:57 PM]: Found relevant files in rna-seq/training, will re-use them:
    GFF3: rna-seq/predict_results/Awesome_rna.gff3
    Genome: rna-seq/predict_results/Awesome_rna.scaffolds.fa
    Single reads: rna-seq/training/single.fq.gz
    Single Q-trimmed reads: rna-seq/training/trimmomatic/trimmed_single.fastq.gz
    Single normalized reads: rna-seq/training/normalize/single.norm.fq
    Trinity results: rna-seq/training/funannotate_train.trinity-GG.fasta
    Long-read results: rna-seq/training/funannotate_long-reads.fasta
    PASA config file: rna-seq/training/pasa/alignAssembly.txt
    BAM alignments: rna-seq/training/funannotate_train.coordSorted.bam
    StringTie GTF: rna-seq/training/funannotate_train.stringtie.gtf
[Oct 21 03:57 PM]: Reannotating Awesome rna, NCBI accession: None
[Oct 21 03:57 PM]: Previous annotation consists of: 1,611 protein coding gene models and 112 non-coding gene models
[Oct 21 03:57 PM]: Existing annotation: locustag=FUN_ genenumber=1723
[Oct 21 03:57 PM]: Aligning long reads to genome with minimap2
[Oct 21 03:57 PM]: Adding 35 unique long-reads to Trinity assemblies
[Oct 21 03:57 PM]: Merging BAM files: rna-seq/update_misc/nano_mRNA.coordSorted.bam, rna-seq/update_misc/trinity.alignments.bam
[Oct 21 03:57 PM]: Converting transcript alignments to GFF3 format
[Oct 21 03:57 PM]: Converting Trinity transcript alignments to GFF3 format
[Oct 21 03:58 PM]: PASA database is SQLite: /home/cris/test-rna_seq_11cab9f7-523e-43cd-b60d-eb0faa16bc13/rna-seq/training/pasa/Awesome_rna_pasa
[Oct 21 03:58 PM]: Running PASA annotation comparison step 1
[Oct 21 03:58 PM]: Running PASA annotation comparison step 2
[Oct 21 03:59 PM]: Using Kallisto TPM data to determine which PASA gene models to select at each locus
[Oct 21 03:59 PM]: Building Kallisto index
[Oct 21 03:59 PM]: Mapping reads using pseudoalignment in Kallisto
[Oct 21 03:59 PM]: Parsing Kallisto results. Keeping alt-splicing transcripts if expressed at least 10.0% of highest transcript per locus.
[Oct 21 03:59 PM]: Wrote 1,620 transcripts derived from 1,619 protein coding loci.
[Oct 21 03:59 PM]: Validating gene models (renaming, checking translations, filtering, etc)
[Oct 21 03:59 PM]: Writing 1,728 loci to TBL format: dropped 0 overlapping, 1 too short, and 0 frameshift gene models
[Oct 21 03:59 PM]: Converting to Genbank format
[Oct 21 04:00 PM]: Collecting final annotation files
[Oct 21 04:00 PM]: Comparing original annotation to updated
 original: rna-seq/predict_results/Awesome_rna.gff3
 updated: rna-seq/update_results/Awesome_rna.gff3
[Oct 21 04:00 PM]: Updated annotation complete:
-------------------------------------------------------
Total Gene Models:  1,728
Total transcripts:  1,730
New Gene Models:    7
No Change:      1,461
Update UTRs:        260
Exons Changed:      0
Exons/CDS Changed:  0
Dropped Models:     0
CDS AED:        0.001
mRNA AED:       0.013
-------------------------------------------------------
[Oct 21 04:00 PM]: Funannotate update is finished, output files are in the rna-seq/update_results folder
[Oct 21 04:00 PM]: Your next step might be functional annotation, suggested commands:
-------------------------------------------------------
Run InterProScan (Docker required): 
funannotate iprscan -i rna-seq -m docker -c 12

Run antiSMASH: 
funannotate remote -i rna-seq -m antismash -e youremail@server.edu

Annotate Genome: 
funannotate annotate -i rna-seq --cpus 12 --sbt yourSBTfile.txt
-------------------------------------------------------

#########################################################
SUCCESS: funannotate RNA-seq training/prediction test complete.
#########################################################

#########################################################

#########################################################
Running `funannotate annotate` unit testing
Downloading: https://osf.io/97pyn/download?version=1 Bytes: 341476
CMD: funannotate annotate --genbank Genome_one.gbk -o annotate --cpus 12 --iprscan genome_one.iprscan.xml --eggnog genome_one.emapper.annotations
#########################################################
-------------------------------------------------------
[Oct 21 04:00 PM]: OS: Ubuntu 18.04, 16 cores, ~ 66 GB RAM. Python: 3.8.13
[Oct 21 04:00 PM]: Running 1.8.14
[Oct 21 04:00 PM]: No NCBI SBT file given, will use default, however if you plan to submit to NCBI, create one and pass it here '--sbt'
[Oct 21 04:00 PM]: Checking GenBank file for annotation
[Oct 21 04:00 PM]: Adding Functional Annotation to Genome one, NCBI accession: None
[Oct 21 04:00 PM]: Annotation consists of: 125 gene models
[Oct 21 04:00 PM]: 124 protein records loaded
[Oct 21 04:00 PM]: Running HMMer search of PFAM version 34.0
[Oct 21 04:00 PM]: 90 annotations added
[Oct 21 04:00 PM]: Running Diamond blastp search of UniProt DB version 2021_02
[Oct 21 04:00 PM]: 12 valid gene/product annotations from 14 total
[Oct 21 04:00 PM]: Existing Eggnog-mapper results found: annotate/annotate_misc/eggnog.emapper.annotations
[Oct 21 04:00 PM]: Parsing EggNog Annotations
[Oct 21 04:00 PM]: EggNog version parsed as 1.0.3
[Oct 21 04:00 PM]: 132 COG and EggNog annotations added
[Oct 21 04:00 PM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.69
[Oct 21 04:00 PM]: 21 gene name and product description annotations added
[Oct 21 04:00 PM]: Running Diamond blastp search of MEROPS version 12.0
[Oct 21 04:00 PM]: 0 annotations added
[Oct 21 04:00 PM]: Annotating CAZYmes using HMMer search of dbCAN version 9.0
[Oct 21 04:00 PM]: 2 annotations added
[Oct 21 04:00 PM]: Annotating proteins with BUSCO dikarya models
[Oct 21 04:01 PM]: 6 annotations added
[Oct 21 04:01 PM]: Skipping phobius predictions, try funannotate remote -m phobius
[Oct 21 04:01 PM]: Skipping secretome: neither SignalP nor Phobius searches were run
[Oct 21 04:01 PM]: 0 secretome and 0 transmembane annotations added
[Oct 21 04:01 PM]: Parsing InterProScan5 XML file
[Oct 21 04:01 PM]: Found 0 duplicated annotations, adding 628 valid annotations
[Oct 21 04:01 PM]: Converting to final Genbank format, good luck!
[Oct 21 04:01 PM]: Creating AGP file and corresponding contigs file
[Oct 21 04:01 PM]: Writing genome annotation table.
[Oct 21 04:01 PM]: Funannotate annotate has completed successfully!
-------------------------------------------------------
#########################################################
SUCCESS: `funannotate annotate` test complete.
#########################################################

#########################################################
Running `funannotate compare` unit testing
Downloading: https://osf.io/7s9xh/download?version=1 Bytes: 1020999
CMD: funannotate compare -i Genome_one.gbk Genome_two.gbk Genome_three.gbk -o compare --cpus 12 --ml_model LG+G4 --outgroup botrytis_cinerea.dikarya
#########################################################
-------------------------------------------------------
[Oct 21 04:01 PM]: OS: Ubuntu 18.04, 16 cores, ~ 66 GB RAM. Python: 3.8.13
[Oct 21 04:01 PM]: Running 1.8.14
[Oct 21 04:01 PM]: Now parsing 3 genomes
[Oct 21 04:01 PM]: working on Genome one
[Oct 21 04:01 PM]: working on Genome two
[Oct 21 04:01 PM]: working on Genome three
[Oct 21 04:01 PM]: No secondary metabolite annotations found
[Oct 21 04:01 PM]: Summarizing PFAM domain results
[Oct 21 04:01 PM]: Summarizing InterProScan results
[Oct 21 04:01 PM]: Loading InterPro descriptions
[Oct 21 04:01 PM]: Summarizing MEROPS protease results
[Oct 21 04:01 PM]: found 4 MEROPS familes
[Oct 21 04:01 PM]: Summarizing CAZyme results
[Oct 21 04:01 PM]: found 5 CAZy familes
[Oct 21 04:01 PM]: Summarizing COG results
[Oct 21 04:01 PM]: Summarizing secreted protein results
[Oct 21 04:01 PM]: Summarizing fungal transcription factors
[Oct 21 04:01 PM]: Running GO enrichment for each genome
  WARNING: skipping Genome_one.txt as no GO terms
[Oct 21 04:03 PM]: Running orthologous clustering tool, ProteinOrtho.  This may take awhile...
[Oct 21 04:03 PM]: Compiling all annotations for each genome
[Oct 21 04:03 PM]: Inferring phylogeny using iqtree
[Oct 21 04:03 PM]: Found 1 single copy BUSCO orthologs, will use all to infer phylogeny
[Oct 21 04:03 PM]: Compressing results to output file: compare.tar.gz
[Oct 21 04:03 PM]: Funannotate compare completed successfully!
#########################################################
SUCCESS: `funannotate compare` test complete.
#########################################################