nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License
300 stars, 82 forks

funannotate predict works on test data but not on my real data #1045

Closed: Faheema-khan closed this issue 3 weeks ago

Faheema-khan commented 3 weeks ago

Are you using the latest release? Yes

Describe the bug

Funannotate predict works perfectly on the test data (i.e., when I run the command funannotate test -t predict), but when run on my own fungal data it fails with this error:

[May 31 01:01 PM]: OS: CentOS Linux 7, 20 cores, ~ 131 GB RAM. Python: 3.8.15
[May 31 01:01 PM]: Running funannotate v1.8.17
Traceback (most recent call last):
  File "/proj/uppstore2018147/nobackup/private/Faheema/software/funannotate/bin/funannotate", line 8, in <module>
    sys.exit(main())
  File "/proj/uppstore2018147/nobackup/private/Faheema/software/funannotate/lib/python3.8/site-packages/funannotate/funannotate.py", line 717, in main
    mod.main(arguments)
  File "/proj/uppstore2018147/nobackup/private/Faheema/software/funannotate/lib/python3.8/site-packages/funannotate/predict.py", line 378, in main
    BAM2HINTS = os.path.join(AUGUSTUS_BASE, "bin", "bam2hints")
UnboundLocalError: local variable 'AUGUSTUS_BASE' referenced before assignment
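An UnboundLocalError means that a name Python treats as local to main() was read on a code path where it was never assigned. The snippet below is a minimal, hypothetical reduction of that pattern. The real condition guarding the AUGUSTUS_BASE assignment in predict.py is not shown in this issue, so the check on the last path component (== "config"), the example paths, and both function names are assumptions made only for illustration.

```python
import os


def locate_bam2hints_buggy(config_path):
    # Hypothetical stand-in for the real check: AUGUSTUS_BASE is only
    # assigned when the config directory is literally named "config".
    config_path = os.path.normpath(config_path)
    if os.path.basename(config_path) == "config":
        AUGUSTUS_BASE = os.path.dirname(config_path)
    # On any other path the name is still unbound here -> UnboundLocalError.
    return os.path.join(AUGUSTUS_BASE, "bin", "bam2hints")


def locate_bam2hints_fixed(config_path, fallback_base=None):
    # Same hypothetical check, but the name is bound before the branch.
    config_path = os.path.normpath(config_path)
    AUGUSTUS_BASE = fallback_base
    if os.path.basename(config_path) == "config":
        AUGUSTUS_BASE = os.path.dirname(config_path)
    if AUGUSTUS_BASE is None:
        raise ValueError(
            "cannot derive the Augustus install directory from " + config_path
        )
    return os.path.join(AUGUSTUS_BASE, "bin", "bam2hints")


try:
    locate_bam2hints_buggy("/data/my_augustus_config")
except UnboundLocalError as err:
    print("buggy version raised", type(err).__name__)

print(locate_bam2hints_fixed("/opt/augustus/config"))  # -> /opt/augustus/bin/bam2hints
```

Binding the name on every path (here to an optional fallback, then failing with an explicit message) turns an opaque traceback into an error that says which input could not be resolved.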

What command did you issue?

funannotate predict -i "$out".cleaned.sorted.masked.fa -o fun \
    --species "Tricholoma terreum" \
    --busco_seed_species laccaria_bicolor --cpus 12 \
    --buscodb basidiomycota --name TRITER \
    --transcript_evidence $annotation_dir/Trima3_EST_20130329_cluster_consensi.fasta \
    $FUNANNOTATE_DB/uniprot_sprot.fasta
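In this command (and in the logged command line under Logfiles) the UniProt FASTA follows the transcript files directly, with no option of its own. The sketch below shows how Python's argparse handles trailing values for an option declared with nargs="+". It assumes, without checking funannotate's source, that --transcript_evidence is declared that way (the single transcripts.combined.fa in the log is consistent with this). The parser and its prog name are hypothetical; the option names are copied from the command, and --protein_evidence is included as an assumed separate option.

```python
import argparse

# Hypothetical parser mirroring only the options relevant here; it assumes
# both evidence options accept one or more file paths (nargs="+").
parser = argparse.ArgumentParser(prog="predict_sketch")
parser.add_argument("--cpus", type=int)
parser.add_argument("--transcript_evidence", nargs="+")
parser.add_argument("--protein_evidence", nargs="+")

# Same order as the command in this issue: the UniProt FASTA comes last.
argv = [
    "--cpus", "12",
    "--transcript_evidence", "Trima3_EST_20130329_cluster_consensi.fasta",
    "uniprot_sprot.fasta",
]
args = parser.parse_args(argv)
print(args.transcript_evidence)  # both files land in this one list
print(args.protein_evidence)     # None: nothing was given to this option

# Giving the protein file its own option keeps the two inputs separate.
fixed = parser.parse_args([
    "--transcript_evidence", "Trima3_EST_20130329_cluster_consensi.fasta",
    "--protein_evidence", "uniprot_sprot.fasta",
])
print(fixed.transcript_evidence, fixed.protein_evidence)
```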

Logfiles

[05/30/24 17:40:22]: /proj/uppstore2018147/nobackup/private/Faheema/software/funannotate/bin/funannotate predict -i 3.cleaned.sorted.masked.fa -o fun --species Leccinum scabrum --busco_seed_species laccaria_bicolor --cpus 12 --buscodb basidiomycota --name LECSCA --transcript_evidence /proj/snic2022-23-423/nobackup/private/Faheema/annotation/project3/Boled5_EST_20170707_cluster_consensi.fasta /proj/snic2022-23-423/nobackup/private/Faheema/annotation/project3/Xerba1_EST_20150320_cluster_consensi.fasta /home/faheema/funannotate_db/uniprot_sprot.fasta

[05/30/24 17:40:22]: OS: CentOS Linux 7, 20 cores, ~ 131 GB RAM. Python: 3.8.15
[05/30/24 17:40:22]: Running funannotate v1.8.17
[05/30/24 17:40:22]: GeneMark path: /sw/apps/bioinfo/GeneMark/4.57-es/rackham
[05/30/24 17:40:22]: Full path to gmes_petap.pl: /sw/apps/bioinfo/GeneMark/4.57-es/rackham/gmes_petap.pl
[05/30/24 17:40:22]: GeneMark appears to be functional? True
05/30/24 17:40:23: exonerate version=exonerate 2.4.0 path=/sw/bioinfo/exonerate/2.4.0/rackham/bin/exonerate
05/30/24 17:40:23: diamond version=2.0.4 path=/sw/bioinfo/diamond/2.0.4/rackham/bin/diamond
05/30/24 17:40:23: tbl2asn version=25.3 path=/sw/bioinfo/tbl2asn/25.3/rackham/tbl2asn
05/30/24 17:40:23: bedtools version=bedtools v2.29.2 path=/sw/bioinfo/BEDTools/2.29.2/rackham/bin/bedtools
05/30/24 17:40:23: augustus version=3.4.0 path=/sw/bioinfo/augustus/3.4.0/rackham/bin/augustus
05/30/24 17:40:23: etraining version=NA path=/sw/bioinfo/augustus/3.4.0/rackham/bin/etraining
05/30/24 17:40:23: tRNAscan-SE version=1.3.1 (January 2012) path=/sw/bioinfo/tRNAscan-SE/1.3.1/rackham/bin/tRNAscan-SE
05/30/24 17:40:23: bam2hints version=NA path=/sw/bioinfo/augustus/3.4.0/rackham/bin/bam2hints
05/30/24 17:40:23: minimap2 version=2.16-r922 path=/sw/bioinfo/minimap2/2.16/rackham/bin/minimap2

[05/30/24 17:40:25]: {'augustus': 1, 'hiq': 2, 'genemark': 1, 'pasa': 6, 'codingquarry': 0, 'snap': 1, 'glimmerhmm': 1, 'proteins': 1, 'transcripts': 1}
[05/30/24 17:40:25]: Skipping CodingQuarry as no --rna_bam passed
[05/30/24 17:40:25]: {'augustus': 'busco', 'genemark': 'selftraining', 'snap': 'busco', 'glimmerhmm': 'busco'}
[05/30/24 17:40:25]: Parsed training data, run ab-initio gene predictors as follows:
[05/30/24 17:40:25]: augustus --species=anidulans --proteinprofile=/proj/uppstore2018147/nobackup/private/Faheema/software/funannotate/lib/python3.8/site-packages/funannotate/config/EOG092C0B3U.prfl /proj/uppstore2018147/nobackup/private/Faheema/software/funannotate/lib/python3.8/site-packages/funannotate/config/busco_test.fa
[05/30/24 17:40:27]: {'augustus': 1, 'hiq': 2, 'genemark': 1, 'pasa': 6, 'codingquarry': 0, 'snap': 1, 'glimmerhmm': 1, 'proteins': 1, 'transcripts': 1}
[05/30/24 17:40:45]: Loading genome assembly and parsing soft-masked repetitive sequences
[05/30/24 17:42:06]: Genome loaded: 12,522 scaffolds; 49,283,185 bp; 20.76% repeats masked
[05/30/24 17:42:10]: Aligning transcript evidence to genome with minimap2
[05/30/24 17:42:10]: minimap2 -ax splice -t 12 --cs -u b -G 3000 /crex/proj/uppstore2018147/nobackup/private/Faheema/annotation/project3/Sample_UA-2819-3/fun/predict_misc/genome.softmasked.fa fun/predict_misc/transcripts.combined.fa | samtools sort --reference /crex/proj/uppstore2018147/nobackup/private/Faheema/annotation/project3/Sample_UA-2819-3/fun/predict_misc/genome.softmasked.fa -@ 4 -o fun/predict_misc/transcripts.minimap2.bam -
[05/30/24 17:42:57]: Found 13,904 alignments, wrote GFF3 and Augustus hints to file

[05/30/24 18:10:08]: Running GeneMark-ES on assembly
[05/30/24 18:10:08]: /sw/apps/bioinfo/GeneMark/4.57-es/rackham/gmes_petap.pl --ES --max_intron 3000 --soft_mask 2000 --cores 12 --sequence genome.query.fasta --fungus
[05/30/24 18:54:52]: perl /home/faheema/bin/EVidenceModeler-1.1.1/EvmUtils/misc/augustus_GFF3_to_EVM_GFF3.pl fun/predict_misc/genemark.gff
[05/30/24 18:54:55]: 7,344 predictions from GeneMark
[05/30/24 18:54:55]: Running BUSCO to find conserved gene models for training ab-initio predictors
[05/30/24 18:54:55]: /proj/uppstore2018147/nobackup/private/Faheema/software/funannotate/bin/python /proj/uppstore2018147/nobackup/private/Faheema/software/funannotate/lib/python3.8/site-packages/funannotate/aux_scripts/funannotate-BUSCO2.py -i /crex/proj/uppstore2018147/nobackup/private/Faheema/annotation/project3/Sample_UA-2819-3/fun/predict_misc/genome.softmasked.fa -m genome --lineage /home/faheema/funannotate_db/basidiomycota -o leccinum_scabrum -c 12 --species laccaria_bicolor -f --local_augustus /crex/proj/uppstore2018147/nobackup/private/Faheema/annotation/project3/Sample_UA-2819-3/fun/predict_misc/ab_initio_parameters/augustus
[05/30/24 19:13:17]: 1,059 valid BUSCO predictions found, validating protein sequences
[05/30/24 19:14:49]: 1,051 BUSCO predictions validated
[05/30/24 19:14:49]: Training Augustus using BUSCO gene models
[05/30/24 19:14:49]: gff2gbSmallDNA.pl fun/predict_misc/busco.final.gff3 /crex/proj/uppstore2018147/nobackup/private/Faheema/annotation/project3/Sample_UA-2819-3/fun/predict_misc/genome.softmasked.fa 600 fun/predict_misc/augustus.training.busco.gb
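The traceback builds the bam2hints path as AUGUSTUS_BASE/bin/bam2hints, and the dependency lines in the log above already report where bam2hints was found on this system. As a hypothetical helper (not part of funannotate), the sketch below pulls the "name version=... path=..." entries out of such a log and walks two directories up from bin/bam2hints. Assuming that bin/ layout, the result is the directory that would play the role of AUGUSTUS_BASE. The function names and the sample log excerpt are illustrative only.

```python
import os
import re

# One dependency line per tool, e.g.
# "05/30/24 17:40:23: bam2hints version=NA path=/sw/.../bin/bam2hints".
# The version may contain spaces, so everything between "version=" and
# " path=" is captured as the version string.
TOOL_LINE = re.compile(r"(\S+) version=(.*) path=(\S+)\s*$")


def parse_tool_paths(log_text):
    """Return {tool: (version, path)} for every 'version=... path=...' line."""
    tools = {}
    for line in log_text.splitlines():
        match = TOOL_LINE.search(line)
        if match:
            name, version, path = match.groups()
            tools[name] = (version, path)
    return tools


def augustus_base_from(tools):
    """Hypothetical helper: the directory two levels above bin/bam2hints."""
    _version, path = tools["bam2hints"]
    return os.path.dirname(os.path.dirname(path))


LOG = """\
[05/30/24 17:40:22]: GeneMark appears to be functional? True
05/30/24 17:40:23: tRNAscan-SE version=1.3.1 (January 2012) path=/sw/bioinfo/tRNAscan-SE/1.3.1/rackham/bin/tRNAscan-SE
05/30/24 17:40:23: bam2hints version=NA path=/sw/bioinfo/augustus/3.4.0/rackham/bin/bam2hints
"""

tools = parse_tool_paths(LOG)
print(augustus_base_from(tools))  # -> /sw/bioinfo/augustus/3.4.0/rackham
```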

OS/Install Information

You are running Perl v b'5.026002'. Now checking perl modules...
Carp: 1.50
Clone: 0.39
DBD::SQLite: 1.58
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.842
Data::Dumper: 2.167
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.51
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 1.20
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.11
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.26
local::lib: 2.000024
threads: 2.21
threads::shared: 1.56
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/home/faheema/funannotate_db
$PASAHOME=/proj/uppstore2018147/nobackup/private/Faheema/software/funannotate/opt/pasa-2.5.3
$TRINITY_HOME=/sw/bioinfo/trinity/2.9.1/rackham
$EVM_HOME=/home/faheema/bin/EVidenceModeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/proj/uppstore2018147/nobackup/private/Faheema/augustus/augustus_config
$GENEMARK_PATH=/sw/apps/bioinfo/GeneMark/4.57-es/rackham
All 6 environmental variables are set

Checking external dependencies...
PASA: 2.5.3
CodingQuarry: 2.0
Trinity: 2.9.1
augustus: 3.4.0
bamtools: bamtools 2.5.1
bedtools: bedtools v2.29.2
blat: BLAT v36
diamond: 2.0.4
emapper.py: 2.1.12
ete3: 3.1.1
exonerate: exonerate 2.4.0
fasta: 36.3.8g
glimmerhmm: 3.0.4
gmap: 2018-07-04
gmes_petap.pl: 4.57_lic
hisat2: 2.1.0
hmmscan: HMMER 3.2.1 (June 2018)
hmmsearch: HMMER 3.2.1 (June 2018)
java: 1.8.0_151
kallisto: 0.45.1
mafft: v7.407 (2018/Jul/23)
makeblastdb: makeblastdb 2.10.1+
minimap2: 2.16-r922
pigz: 2.3.4
proteinortho: 6.3.1
pslCDnaFilter: no way to determine
salmon: salmon 1.1.0
samtools: samtools 1.10
signalp: 4.1
snap: 2006-07-28
stringtie: 1.3.3
tRNAscan-SE: 1.3.1 (January 2012)
tantan: tantan 49
tbl2asn: 25.3
tblastn: tblastn 2.10.1+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
All 37 external dependencies are installed
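The check output above reports all six environment variables as set, but a variable that is set can still hold a path that is not a directory. A minimal sketch of an extra existence check follows. It is an illustration under that assumption, not funannotate's own check logic, and check_environment is a hypothetical name; only the variable names are taken from the output above.

```python
import os

# The six variables listed in the check output of this issue.
REQUIRED_VARS = [
    "FUNANNOTATE_DB",
    "PASAHOME",
    "TRINITY_HOME",
    "EVM_HOME",
    "AUGUSTUS_CONFIG_PATH",
    "GENEMARK_PATH",
]


def check_environment(env=None):
    """Return (unset variables, variables whose value is not a directory)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    not_dirs = [
        name
        for name in REQUIRED_VARS
        if env.get(name) and not os.path.isdir(env[name])
    ]
    return missing, not_dirs


if __name__ == "__main__":
    unset, bad = check_environment()
    print("unset:", unset or "none")
    print("not a directory:", bad or "none")
```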