nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

EC_number, eggnog and COG terms not in annotate_results/annotations.txt #1026

Open slsy9965 opened 7 months ago

slsy9965 commented 7 months ago

Are you using the latest release? If you are not using the latest release of funannotate, please upgrade, if bug persists then report here.

Yes, installed via conda.

Describe the bug A clear and concise description of what the bug is.

I ran funannotate annotate with the --eggnog flag, providing my own eggnog annotation (gff) file produced by running eggnog independently of funannotate. Checking annotate_misc/annotations.eggnog.txt, the file seems to be parsed fine, but the resulting annotate_results/annotations.txt file shows empty EC number, eggnog, and COG columns.

I was wondering what could have gone wrong such that this information wasn't included in the final annotation output.

What command did you issue? Copy/paste the command used.

For eggnog:
emapper.py -m diamond --cpu 24 --itype genome .../funannotate/predict_results/Pseudogymonascus.scaffolds.fasta --output AKSP4_eggnog --output_dir .../eggnog

For funannotate annotate:
funannotate annotate -i .../AKSP4_predicted -o .../AKSP4_annotated --iprscan .../Pseudogymonascus.proteins.fa.xml --antismash .../AKSP4_antismash.gbk --eggnog .../AKSP4_eggnog.emapper.annotations

Logfiles Please provide relevant log files of the error.

output generated by emapper.py AKSP4_eggnog.emapper.annotations.txt

for annotate_misc/annotations.eggnog.txt annotations.eggnog.txt

for annotate_results/Pseudogymnoascus.annotations.txt (a snippet of it) Pseudogymnoascus.annotations.snippet.txt (shows empty EC number, eggnog, and COG columns)

logfile from running funannotate annotate AKSP4funannotate_log.txt

OS/Install Information

You are running Perl v b'5.032001'. Now checking perl modules...
Carp: 1.38
Clone: 0.46
DBD::SQLite: 1.72
DBD::mysql: 4.046
DBI: 1.643
DB_File: 1.858
Data::Dumper: 2.183
File::Basename: 2.85
File::Which: 1.24
Getopt::Long: 2.54
Hash::Merge: 0.302
JSON: 4.10
LWP::UserAgent: 6.67
Logger::Simple: 2.0
POSIX: 1.94
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.14
Tie::File: 1.06
URI::Escape: 5.12
YAML: 1.30
local::lib: 2.000029
threads: 2.25
threads::shared: 1.61
All 27 Perl modules installed

Checking Environmental Variables...
$PASAHOME=/home/user/slsy9965/funannotate/opt/pasa-2.5.3
$TRINITY_HOME=/home/user/slsy9965/funannotate/opt/trinity-2.8.5
$EVM_HOME=/home/user/slsy9965/funannotate/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/home/user/slsy9965/funannotate/config/
ERROR: FUNANNOTATE_DB not set. export FUNANNOTATE_DB=/path/to/dir
ERROR: GENEMARK_PATH not set. export GENEMARK_PATH=/path/to/dir

Checking external dependencies...
PASA: 2.5.3
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.5.0
bamtools: bamtools 2.5.1
bedtools: bedtools v2.31.0
blat: BLAT v37x1
diamond: 2.1.8
ete3: 3.1.3
exonerate: exonerate 2.4.0
fasta: 36.3.8g
glimmerhmm: 3.0.4
gmap: 2023-07-20
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 17.0.3-internal
kallisto: 0.46.1
mafft: v7.520 (2023/Mar/22)
makeblastdb: makeblastdb 2.14.1+
minimap2: 2.26-r1175
pigz: 2.6
proteinortho: 6.3.0
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.18
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.12 (Nov 2022)
tantan: tantan 40
tbl2asn: 25.8
tblastn: tblastn 2.14.1+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
ERROR: emapper.py not installed
ERROR: gmes_petap.pl not installed
ERROR: signalp not installed

IanDMedeiros commented 6 months ago

@slsy9965 It looks like your eggnog annotations are using a different format for gene identifiers (e.g., AKSP4_contig_1_104). I'm guessing that is why they are getting parsed to annotations.eggnog but not included in the file with all annotations.
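One way to test this guess is to compare the query IDs in the emapper annotations file against the protein IDs that funannotate predict wrote out. The following is a minimal sketch, not part of funannotate: the function names are made up for illustration, and it assumes the emapper output is tab-separated with the query ID in the first column and `#` comment lines. The example protein ID `FUN_000001-T1` is hypothetical.

```python
import io


def emapper_query_ids(handle):
    """Collect the first column (query ID) of an emapper annotations table.

    Assumes tab-separated rows; lines starting with '#' are comments/headers.
    """
    ids = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        ids.add(line.split("\t", 1)[0].strip())
    return ids


def fasta_ids(handle):
    """Collect the first whitespace-delimited token of every FASTA header."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids


def id_overlap(eggnog_ids, protein_ids):
    """Return (number of shared IDs, sorted eggnog IDs with no matching protein)."""
    shared = eggnog_ids & protein_ids
    return len(shared), sorted(eggnog_ids - protein_ids)


# Tiny in-memory demonstration with made-up records.
annotations = io.StringIO("#query\tseed_ortholog\nAKSP4_contig_1_104\tsome_hit\n")
proteins = io.StringIO(">FUN_000001-T1 FUN_000001\nMSTK\n")

n_shared, unmatched = id_overlap(emapper_query_ids(annotations), fasta_ids(proteins))
print(f"shared IDs: {n_shared}; unmatched eggnog IDs: {unmatched}")
```

If the shared count is zero for the real files, the identifiers in the two files do not line up, which would be consistent with the annotations being dropped from the final table.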

GSS-Investigator commented 5 days ago

Ian, I'm having a very similar issue, although I have somewhat different names. Could you go into a bit more detail? What exactly is the expected format for the gene identifiers?

IanDMedeiros commented 5 days ago

The expected format is <LOCUS TAG PREFIX>_<LOCUS NUMBER>.
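As a rough illustration of that format, the check below flags identifiers that do not look like `<LOCUS TAG PREFIX>_<LOCUS NUMBER>`. It is only a sketch under assumptions: that the prefix is alphanumeric, that the locus number is all digits, and that an optional `-T<n>` transcript suffix may follow. None of these details are confirmed in this thread, and `FUN_000001` is a hypothetical example ID.

```python
import re

# Assumed shape: alphanumeric prefix, underscore, digits, optional "-T<n>" suffix.
LOCUS_ID = re.compile(r"[A-Za-z0-9]+_\d+(?:-T\d+)?")


def matches_locus_format(identifier):
    """True if the whole identifier fits the assumed <prefix>_<number> shape."""
    return LOCUS_ID.fullmatch(identifier) is not None


def nonconforming(identifiers):
    """Return the identifiers that do not fit the assumed shape, in input order."""
    return [i for i in identifiers if not matches_locus_format(i)]


examples = ["FUN_000001", "FUN_000001-T1", "AKSP4_contig_1_104"]
print(nonconforming(examples))
```

An ID such as AKSP4_contig_1_104 fails this check because the part after the first underscore is not purely numeric.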