nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

funannotate annotate trouble parsing iprscan xml file #832

Open cdixo opened 1 year ago

cdixo commented 1 year ago

Are you using the latest release?

Using the newest version, to my knowledge.

Describe the bug

When running 'funannotate annotate', the "Parsing InterProScan5 XML file" step returns the error "ValueError: too many values to unpack (expected 2)".

What command did you issue?

[FYI - using 'Genus species' in lieu of true organism name]

funannotate annotate \
--cpus 26 \
--iprscan /fs/project/PAS1444/GeneAnnoWorkDir/funannotate/functionalannotation/FA_interproscan_submission17.2_output/Genus_species.proteins.fa.xml \
--input /fs/project/PAS1444/GeneAnnoWorkDir/funannotate/functionalannotation/practice/9_25_22_ATR_box2 \
--fasta /fs/project/PAS1444/GeneAnnoWorkDir/funannotate/species.V1.masked.fasta \
--species "Genus species" \
--busco_db /fs/project/PAS1444/databases/funannotate/embryophyta \
--out /fs/project/PAS1444/GeneAnnoWorkDir/funannotate/functionalannotation/practice/FA_funannotate_annotate_output_10_4_22

Logfiles

Log File 1:

[Oct 28 12:56 PM]: Running 1.8.14
[Oct 28 12:56 PM]: No NCBI SBT file given, will use default, however if you plan to submit to NCBI, create one and pass it here '--sbt'
[Oct 28 12:56 PM]: Found existing output directory /fs/project/PAS1444/GeneAnnoWorkDir/funannotate/functionalannotation/practice/9_25_22_ATR_box2. Warning, will re-use any intermediate files found.
[Oct 28 12:56 PM]: Parsing input files
[Oct 28 12:56 PM]: Existing tbl found: /fs/project/PAS1444/GeneAnnoWorkDir/funannotate/functionalannotation/practice/9_25_22_ATR_box2/update_results/Genus_species.tbl
[Oct 28 12:58 PM]: Adding Functional Annotation to Genus species, NCBI accession: None
[Oct 28 12:58 PM]: Annotation consists of: 37,444 gene models
[Oct 28 12:58 PM]: 40,277 protein records loaded
[Oct 28 12:58 PM]: Existing Pfam-A results found: /fs/project/PAS1444/GeneAnnoWorkDir/funannotate/functionalannotation/practice/9_25_22_ATR_box2/annotate_misc/annotations.pfam.txt
[Oct 28 12:58 PM]: 4,964 annotations added
[Oct 28 12:58 PM]: Running Diamond blastp search of UniProt DB version 2022_03
[Oct 28 12:58 PM]: 6,788 valid gene/product annotations from 10,176 total
[Oct 28 12:58 PM]: Install eggnog-mapper or use webserver to improve functional annotation: https://github.com/jhcepas/eggnog-mapper
[Oct 28 12:58 PM]: No Eggnog-mapper results found.
[Oct 28 12:58 PM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.81
[Oct 28 12:58 PM]: 6,787 gene name and product description annotations added
[Oct 28 12:58 PM]: Existing MEROPS results found: /fs/project/PAS1444/GeneAnnoWorkDir/funannotate/functionalannotation/practice/9_25_22_ATR_box2/annotate_misc/annotations.merops.txt
[Oct 28 12:58 PM]: 1,445 annotations added
[Oct 28 12:58 PM]: Existing CAZYme results found: /fs/project/PAS1444/GeneAnnoWorkDir/funannotate/functionalannotation/practice/9_25_22_ATR_box2/annotate_misc/annotations.dbCAN.txt
[Oct 28 12:58 PM]: 1,364 annotations added
[Oct 28 12:58 PM]: Existing BUSCO2 results found: /fs/project/PAS1444/GeneAnnoWorkDir/funannotate/functionalannotation/practice/9_25_22_ATR_box2/annotate_misc/annotations.busco.txt
[Oct 28 12:58 PM]: 1,513 annotations added
[Oct 28 12:58 PM]: Skipping phobius predictions, try funannotate remote -m phobius
[Oct 28 12:58 PM]: Skipping secretome: neither SignalP nor Phobius searches were run
[Oct 28 12:58 PM]: 0 secretome and 0 transmembane annotations added
[Oct 28 12:59 PM]: Parsing InterProScan5 XML file

Traceback (most recent call last):
  File "/opt/conda/bin/funannotate", line 688, in <module>
    main()
  File "/opt/conda/bin/funannotate", line 678, in main
    mod.main(arguments)
  File "/users/PAS1444/cullendixon/.local/lib/python3.7/site-packages/funannotate/annotate.py", line 1118, in main
    GeneNames = lib.getGeneBasename(Proteins)
  File "/users/PAS1444/cullendixon/.local/lib/python3.7/site-packages/funannotate/library.py", line 953, in getGeneBasename
    transcript, gene = line.split(' ')
ValueError: too many values to unpack (expected 2)

Log file 2:

[10/28/22 12:56:41]: /opt/conda/bin/funannotate annotate --cpus 26 --iprscan /fs/project/PAS1444/GeneAnnoWorkDir/funannotate/functionalannotation/FA_interproscan_submission17.2_output/Genus_species.proteins.fa.xml --input /fs/project/PAS1444/GeneAnnoWorkDir/funannotate/functionalannotation/practice/9_25_22_ATR_box2 --fasta /fs/project/PAS1444/GeneAnnoWorkDir/funannotate/labrusca.V1.masked.fasta --species Genus species --busco_db /fs/project/PAS1444/databases/funannotate/embryophyta --out /fs/project/PAS1444/GeneAnnoWorkDir/funannotate/functionalannotation/practice/FA_funannotate_annotate_output_10_4_22

[10/28/22 12:56:41]: OS: Debian GNU/Linux 10, 40 cores, ~ 197 GB RAM. Python: 3.7.6
[10/28/22 12:56:41]: Running 1.8.14
[10/28/22 12:56:41]: hmmscan version=HMMER 3.3.1 (Jul 2020) path=/opt/conda/bin/hmmscan
[10/28/22 12:56:41]: hmmsearch version=HMMER 3.3.1 (Jul 2020) path=/opt/conda/bin/hmmsearch
[10/28/22 12:56:41]: diamond version=2.0.4 path=/opt/conda/bin/diamond
[10/28/22 12:56:42]: No NCBI SBT file given, will use default, however if you plan to submit to NCBI, create one and pass it here '--sbt'
[10/28/22 12:56:42]: Found existing output directory /fs/project/PAS1444/GeneAnnoWorkDir/funannotate/functionalannotation/practice/9_25_22_ATR_box2. Warning, will re-use any intermediate files found.
[10/28/22 12:56:42]: Parsing input files
[10/28/22 12:56:42]: Existing tbl found: /fs/project/PAS1444/GeneAnnoWorkDir/funannotate/functionalannotation/practice/9_25_22_ATR_box2/update_results/Genus_species.tbl
[10/28/22 12:57:13]: TBL file: /fs/project/PAS1444/GeneAnnoWorkDir/funannotate/functionalannotation/practice/9_25_22_ATR_box2/annotate_misc/genome.tbl
[10/28/22 12:57:13]: GFF3 file: /fs/project/PAS1444/GeneAnnoWorkDir/funannotate/functionalannotation/practice/9_25_22_ATR_box2/update_results/Genus_species.gff3
[10/28/22 12:57:13]: Proteins file: /fs/project/PAS1444/GeneAnnoWorkDir/funannotate/functionalannotation/practice/9_25_22_ATR_box2/annotate_misc/genome.proteins.fasta
[10/28/22 12:58:45]: Adding Functional Annotation to Genus species, NCBI accession: None
[10/28/22 12:58:45]: Annotation consists of: 37,444 gene models
[10/28/22 12:58:45]: 40,277 protein records loaded
[10/28/22 12:58:47]: Existing Pfam-A results found: /fs/project/PAS1444/GeneAnnoWorkDir/funannotate/functionalannotation/practice/9_25_22_ATR_box2/annotate_misc/annotations.pfam.txt
[10/28/22 12:58:47]: 4,964 annotations added
[10/28/22 12:58:47]: Running Diamond blastp search of UniProt DB version 2022_03
[10/28/22 12:58:53]: 6,788 valid gene/product annotations from 10,176 total
[10/28/22 12:58:53]: Install eggnog-mapper or use webserver to improve functional annotation: https://github.com/jhcepas/eggnog-mapper
[10/28/22 12:58:53]: No Eggnog-mapper results found.
[10/28/22 12:58:53]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.81
[10/28/22 12:58:54]: 6,787 gene name and product description annotations added
[10/28/22 12:58:54]: Existing MEROPS results found: /fs/project/PAS1444/GeneAnnoWorkDir/funannotate/functionalannotation/practice/9_25_22_ATR_box2/annotate_misc/annotations.merops.txt
[10/28/22 12:58:54]: 1,445 annotations added
[10/28/22 12:58:54]: Existing CAZYme results found: /fs/project/PAS1444/GeneAnnoWorkDir/funannotate/functionalannotation/practice/9_25_22_ATR_box2/annotate_misc/annotations.dbCAN.txt
[10/28/22 12:58:54]: 1,364 annotations added
[10/28/22 12:58:54]: Existing BUSCO2 results found: /fs/project/PAS1444/GeneAnnoWorkDir/funannotate/functionalannotation/practice/9_25_22_ATR_box2/annotate_misc/annotations.busco.txt
[10/28/22 12:58:54]: 1,513 annotations added
[10/28/22 12:58:54]: Skipping phobius predictions, try funannotate remote -m phobius
[10/28/22 12:58:54]: Skipping secretome: neither SignalP nor Phobius searches were run
[10/28/22 12:58:54]: 0 secretome and 0 transmembane annotations added
[10/28/22 12:59:00]: Parsing InterProScan5 XML file
[10/28/22 12:59:00]: /opt/conda/bin/python /users/PAS1444/cullendixon/.local/lib/python3.7/site-packages/funannotate/aux_scripts/iprscan2annotations.py /fs/project/PAS1444/GeneAnnoWorkDir/funannotate/functionalannotation/practice/9_25_22_ATR_box2/annotate_misc/iprscan.xml /fs/project/PAS1444/GeneAnnoWorkDir/funannotate/functionalannotation/practice/9_25_22_ATR_box2/annotate_misc/annotations.iprscan.txt

OS/Install Information

Checking dependencies for 1.8.14

You are running Python v 3.7.6. Now checking python packages...
biopython: 1.78
goatools: 1.0.6
matplotlib: 3.2.1
natsort: 7.0.1
numpy: 1.19.1
pandas: 1.1.2
psutil: 5.7.2
requests: 2.22.0
scikit-learn: 0.23.2
scipy: 1.5.2
seaborn: 0.11.0
All 11 python packages installed

You are running Perl v b'5.026002'. Now checking perl modules...
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.852
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
local::lib: 2.000029
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/fs/project/PAS1444/databases/funannotate/
$PASAHOME=/opt/conda/opt/pasa-2.4.1
$TRINITY_HOME=/opt/conda/opt/trinity-2.8.5
$EVM_HOME=/opt/conda/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/fs/project/PAS1444/databases/augustus/config/
$GENEMARK_PATH=/fs/project/PAS1444/software/gmes_linux_64_4/
All 6 environmental variables are set

Checking external dependencies...
Traceback (most recent call last):
  File "/opt/conda/bin/ete3", line 6, in <module>
    from ete3.tools.ete import main
  File "/opt/conda/lib/python3.7/site-packages/ete3/tools/ete.py", line 55, in <module>
    from . import (ete_split, ete_expand, ete_annotate, ete_ncbiquery, ete_view,
  File "/opt/conda/lib/python3.7/site-packages/ete3/tools/ete_view.py", line 48, in <module>
    from .. import (Tree, PhyloTree, TextFace, RectFace, faces, TreeStyle, CircleFace, AttrFace,
ImportError: cannot import name 'TextFace' from 'ete3' (/opt/conda/lib/python3.7/site-packages/ete3/__init__.py)
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.3
bamtools: bamtools 2.5.1
bedtools: bedtools v2.29.2
blat: BLAT v36
diamond: 2.0.4
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2017-11-15
hisat2: 2.2.1
hmmscan: HMMER 3.3.1 (Jul 2020)
hmmsearch: HMMER 3.3.1 (Jul 2020)
java: 11.0.8-internal
kallisto: 0.46.1
mafft: v7.471 (2020/Jul/3)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.17-r941
proteinortho: 6.0.22
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.7
snap: 2006-07-28
stringtie: 2.1.2
tRNAscan-SE: 2.0.6 (May 2020)
tantan: tantan 13
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
ERROR: emapper.py not installed
ERROR: ete3 not installed
ERROR: gmes_petap.pl not installed
ERROR: pigz not installed
ERROR: signalp not installed

Thank you for your help in advance!

hyphaltip commented 1 year ago

Did you already run predict step? Why are you passing in genome again to annotate step? You just need the top folder you used for output in that step. Are there definitely genes predicted in previous step?

hyphaltip commented 1 year ago

And your prefix for the previous run had an underscore - Eg your protein gene names are PREF_1234 format right?

cdixo commented 1 year ago
  1. "Did you already run predict step?" Not exactly - I am attempting to update a pre-existing gene annotation (.gff3 file) with transcriptome information to make the annotation more accurate using actual transcripts, therefore, I did the following - funannotate update > funannotate iprscan (did not work, therefore, I ran interproscan independently and then fed the Genus_species.proteins.fa.xml file into...) > funannotate annotate to add the annotations to the .gff3 file using the option '--iprscan' followed by the location of the .xml file.

  2. "Why are you passing in genome again to annotate step? You just need the top folder you used for output in that step." The funannotate annotate command as described here (https://funannotate.readthedocs.io/en/latest/commands.html) appears to dictate that '--fasta' is a mandatory option and it needs a "Genome in multi-fasta format". As '--input', I am currently dictating the folder that contains all of the output from the 'funannotate update' step (folders of annotate_misc, annotate_results, update_results, update_misc, logfiles)

  3. "Are there definitely genes predicted in previous step?" Yes, I've listed the first few lines from the .tsv output file from the run of interproscan

maker-1-augustus-gene-97.24-T1  4cf656117b49bef5d33507f4b960a191    611 Gene3D  G3DSA:3.80.10.10    Ribonuclease Inhibitor  314 478 1.4E-40 T   07-10-2022  IPR032675   Leucine-rich repeat domain superfamily  -
maker-1-augustus-gene-97.24-T1  4cf656117b49bef5d33507f4b960a191    611 SMART   SM00368 LRR_RI_2    343 370 5.1E-6  T   07-10-2022  -   -
maker-1-augustus-gene-97.24-T1  4cf656117b49bef5d33507f4b960a191    611 SMART   SM00368 LRR_RI_2    259 286 0.0017  T   07-10-2022  -   -
maker-1-augustus-gene-97.24-T1  4cf656117b49bef5d33507f4b960a191    611 SMART   SM00368 LRR_RI_2    512 539 0.88    T   07-10-2022  -   -
maker-1-augustus-gene-97.24-T1  4cf656117b49bef5d33507f4b960a191    611 SMART   SM00368 LRR_RI_2    372 399 25.0    T   07-10-2022  -   -
maker-1-augustus-gene-97.24-T1  4cf656117b49bef5d33507f4b960a191    611 SMART   SM00368 LRR_RI_2    456 483 1.3E-4  T   07-10-2022  -   -
maker-1-augustus-gene-97.24-T1  4cf656117b49bef5d33507f4b960a191    611 SMART   SM00368 LRR_RI_2    428 455 0.35    T   07-10-2022  -   -
maker-1-augustus-gene-97.24-T1  4cf656117b49bef5d33507f4b960a191    611 SMART   SM00368 LRR_RI_2    231 258 0.0017  T   07-10-2022  -   -
maker-1-augustus-gene-97.24-T1  4cf656117b49bef5d33507f4b960a191    611 SMART   SM00368 LRR_RI_2    287 314 76.0    T   07-10-2022  -   -
maker-1-augustus-gene-97.24-T1  4cf656117b49bef5d33507f4b960a191    611 SMART   SM00368 LRR_RI_2    484 511 8.3 T   07-10-2022  -   -
maker-1-augustus-gene-97.24-T1  4cf656117b49bef5d33507f4b960a191    611 SMART   SM00368 LRR_RI_2    541 568 0.11    T   07-10-2022  -   -
maker-1-augustus-gene-97.24-T1  4cf656117b49bef5d33507f4b960a191    611 SMART   SM00368 LRR_RI_2    400 427 3.2E-4  T   07-10-2022  -   -

Here are a few lines from the .xml file too

<?xml version="1.0" encoding="UTF-8"?><protein-matches xmlns="http://www.ebi.ac.uk/interpro/resources/schemas/interproscan5" interproscan-version="5.57-90.0">
  <protein>
    <sequence md5="4cf656117b49bef5d33507f4b960a191">MASTSTSAICLYSHPMISHRARPLNLRSQVLGSSWWGYCGLLPTSVYVSSIRHYRFRTLVTVAASTADGVPRRPVSGRRVFKQSQGQGPLSPVPVREIASFVVPASLFFAVTFVLWRLVEKILLPKSSRSSSLEKKSSSPGVKWSFAPGTNLLAGLTAKFDRESKQKLNEFAKEIRSFGSVDMSGRNFGDEGLFFLAESLAYNQNAEEVSFAANGITAAGLKAFDGVLQSNIVLKTLDLSGNPIGDEGAKCLCDILIDNAGIQKLQLNSADLGDEGAKAIAEMLKKNSSLRIVELNNNMIDYSGFTSLGGALLENNTIRNIHLNGNYGGALGVAALAKGLEANKSLRELHLHGNSIGDEGVRVLMSGLSSHKGKLTLLDIGNNEISSRGAFHVAEYIKKAKSLLWLNLYMNDIGDEGAEKIADALKENRSIATIDLGGNNIHAKGVSKIAGVLKDNTVITTLELGYNPIGPEGAKALSEVLKFHGKIKTLKLGWCQIGAKGAEFIADTLKYNTTISTLDLRANGLRDEGAVCLARSMKVVNEALASLDLGFNEIRDEGAFAIAQALKANEDVAVTSLNLASNFLTKFGQSALTDARDHVYEMSEKEVNIFF</sequence>
    <xref id="maker-1-augustus-gene-97.24-T1" name="maker-1-augustus-gene-97.24-T1 maker-1-augustus-gene-97.24"/>
    <matches>
      <hmmer2-match evalue="1.5E-69" score="247.0">
        <signature ac="SM00368" name="LRR_RI_2">
          <signature-library-release library="SMART" version="7.1"/>
        </signature>
        <model-ac>SM00368</model-ac>
        <locations>
          <hmmer2-location score="36.0" evalue="5.1E-6" hmm-start="1" hmm-end="28" hmm-length="28" hmm-bounds="COMPLETE" start="343" end="370">
            <location-fragments>
              <hmmer2-location-fragment start="343" end="370" dc-status="CONTINUOUS"/>
            </location-fragments>
          </hmmer2-location>
          <hmmer2-location score="27.6" evalue="0.0017" hmm-start="1" hmm-end="28" hmm-length="28" hmm-bounds="COMPLETE" start="259" end="286">
            <location-fragments>
              <hmmer2-location-fragment start="259" end="286" dc-status="CONTINUOUS"/>
            </location-fragments>
          </hmmer2-location>
          <hmmer2-location score="17.5" evalue="0.88" hmm-start="1" hmm-end="28" hmm-length="28" hmm-bounds="COMPLETE" start="512" end="539">
            <location-fragments>
              <hmmer2-location-fragment start="512" end="539" dc-status="CONTINUOUS"/>
            </location-fragments>
          </hmmer2-location>
          <hmmer2-location score="7.2" evalue="25.0" hmm-start="1" hmm-end="28" hmm-length="28" hmm-bounds="COMPLETE" start="372" end="399">
            <location-fragments>
              <hmmer2-location-fragment start="372" end="399" dc-status="CONTINUOUS"/>
            </location-fragments>
          </hmmer2-location>
          <hmmer2-location score="31.3" evalue="1.3E-4" hmm-start="1" hmm-end="28" hmm-length="28" hmm-bounds="COMPLETE" start="456" end="483">
            <location-fragments>
              <hmmer2-location-fragment start="456" end="483" dc-status="CONTINUOUS"/>
            </location-fragments>
          </hmmer2-location>
          <hmmer2-location score="19.9" evalue="0.35" hmm-start="1" hmm-end="28" hmm-length="28" hmm-bounds="COMPLETE" start="428" end="455">
            <location-fragments>
              <hmmer2-location-fragment start="428" end="455" dc-status="CONTINUOUS"/>
            </location-fragments>
          </hmmer2-location>
          <hmmer2-location score="27.6" evalue="0.0017" hmm-start="1" hmm-end="28" hmm-length="28" hmm-bounds="COMPLETE" start="231" end="258">
            <location-fragments>
              <hmmer2-location-fragment start="231" end="258" dc-status="CONTINUOUS"/>
            </location-fragments>
          </hmmer2-location>
          <hmmer2-location score="3.9" evalue="76.0" hmm-start="1" hmm-end="28" hmm-length="28" hmm-bounds="COMPLETE" start="287" end="314">
            <location-fragments>
              <hmmer2-location-fragment start="287" end="314" dc-status="CONTINUOUS"/>
            </location-fragments>

Can you spot any issues here? I can't say that the .xml format is very easily understandable, at least to me not being familiar with it.
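For readers who, like the poster, are not familiar with the InterProScan XML layout, the following is a minimal sketch of how the protein identifiers could be listed from such a file using Python's standard library. The namespace URI is copied from the root element quoted above; the function name `list_xrefs` and the shortened `SAMPLE` document (truncated sequence, empty `<matches/>`) are made up for illustration and are not part of funannotate or InterProScan.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the <protein-matches> root element quoted in the thread.
IPR_NS = {"ipr": "http://www.ebi.ac.uk/interpro/resources/schemas/interproscan5"}


def list_xrefs(xml_text):
    """Return (id, name) pairs for every <xref> found under each <protein>."""
    root = ET.fromstring(xml_text)
    pairs = []
    for protein in root.findall("ipr:protein", IPR_NS):
        for xref in protein.findall("ipr:xref", IPR_NS):
            pairs.append((xref.get("id"), xref.get("name")))
    return pairs


# Shortened, illustrative document: the sequence is truncated and the
# matches are omitted; only the <xref> line is copied from the thread.
SAMPLE = """<protein-matches xmlns="http://www.ebi.ac.uk/interpro/resources/schemas/interproscan5" interproscan-version="5.57-90.0">
  <protein>
    <sequence md5="4cf656117b49bef5d33507f4b960a191">MASTSTSAICLY</sequence>
    <xref id="maker-1-augustus-gene-97.24-T1" name="maker-1-augustus-gene-97.24-T1 maker-1-augustus-gene-97.24"/>
    <matches/>
  </protein>
</protein-matches>"""

for xref_id, xref_name in list_xrefs(SAMPLE):
    print(xref_id, "|", xref_name)
```

Printing the `id` and `name` attributes side by side makes it easier to see which identifiers the XML carries for each protein, and to compare them with the headers in the proteins FASTA file.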

  4. "And your prefix for the previous run had an underscore - Eg your protein gene names are PREF_1234 format right?" I am not sure I quite understand your question. The Genus_species.protein.fa file found in the 'update_results' folder from the funannotate update step has contents that look like the following:
>augustus-1-processed-gene-0.9-T1 augustus-1-processed-gene-0.9
MGRTGARLPSFCLNRIRPLVRVRSPSIQSKPDANSIKTDQKTENSPSVGEENAKAGLIIGRRIMIVVDSSVEAKGALQWA
LSHTVQSQDTLILLYVTKPCKQGEECGKEVAPRVYELLYSMKNVCQLKRPEVEIEVAVVEGKEKGPTIVEEAKKRGVALL
VLGQRKRSMTWRLVMMWAVNRVGGGVVEYCIQNADCMAIAVRRKSKKGGGYLITTKRHKDFWLLA
>augustus-1-processed-gene-1.0-T1 augustus-1-processed-gene-1.0
MGRGRVQLKRIENKINRQVTFSKRRTGLLKKAHEISVLCDAEVALIVFSTKGKLFEYSTDSWYASYVSSSSSILLLLPL
>augustus-1-processed-gene-1.4-T1 augustus-1-processed-gene-1.4
MSVAALSEADKIYKKSFHRRNDSGELDVFEAARYFSGGNEIIGYNGAAFPQRMMMREERQGWRGGRISLDMPMRSSLPTQ
SSHAVEKQMKEKIKYKQPSSPGGRLASFLNSLFNQTNSKKKKSKSTAQSIKDEEESPGGRRKRRSSISHFRSSSTADSKS
VYSSSSSGFRTPPPYANTPTKTYKDLRSYSDHRQVVSLPNYNNGNVKATGLRNEALDEKRIKELVWLDEKFKFSSGFSEK
HKNFSNGLSEKDRIWVDEYPSEEKEFRKLDEIDAGAESDSSSDLFELQNYDLGCYSSGLPVYETTHMDSIKRGAPISNGP
LPL

Did that answer your question for question #4?

Ultimately, I am trying to add the predicted functions, as predicted by interproscan, to the .gff3 gene annotation file so I can see the predicted function of noted genes when identified as DEGs down the line so I have an idea what processes these DEGs are implicated in.

Thank you very much!

nextgenusfs commented 1 year ago

These locus_tag names are unsupported (augustus-1-processed-gene-0.9-T1). You can first run predict and pass your maker_gff as input which will filter those predictions and change the name to something parsable. Then run update and then run annotate.
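The traceback earlier in the thread fails on the statement `transcript, gene = line.split(' ')`, which raises this ValueError whenever a line splits into more than two space-separated pieces. As a rough way to look for such lines, here is a minimal sketch of a header check. The helper name `find_unsplittable_headers` and the second header in `EXAMPLE` are hypothetical, and the sketch assumes the line being split is a FASTA header, which the thread does not confirm. It is not funannotate code.

```python
def find_unsplittable_headers(fasta_lines):
    """Return FASTA headers that do not split into exactly two
    space-separated fields (i.e. would break a two-value unpack)."""
    bad = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        header = line[1:].rstrip("\n")
        if len(header.split(" ")) != 2:
            bad.append(header)
    return bad


# The first header is copied from the thread; the second is a made-up
# example with a third field, added only to show what gets flagged.
EXAMPLE = [
    ">augustus-1-processed-gene-0.9-T1 augustus-1-processed-gene-0.9\n",
    "MGRTGARLPSFCLNRIRPLV\n",
    ">hypothetical-T1 hypothetical extra-field\n",
    "MGRGRVQLKR\n",
]

print(find_unsplittable_headers(EXAMPLE))
```

To run it on a real file, an open file handle can be passed in place of the list, since iterating over a file yields its lines. Note that the headers quoted earlier in the thread each split into exactly two fields, so this check alone may not locate the offending line.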

cdixo commented 1 year ago

Thank you for your response and suspected resolution to the issue. Previously, unfortunately, I was never able to run funannotate predict successfully.

I would provide it this command:

funannotate predict \
-input /fs/project/PAS1444/GeneAnnoWorkDir/funannotate/species.V1.masked.fasta \
-species "Genus_species_9_28_22" \
--cpus 26 \
--organism other \
--max_intronlen 5000 \
--protein_evidence /fs/project/PAS1444/databases/funannotate/uniprot_sprot.fasta \
--busco_seed_species BUSCO_species_prelim_1711742966 --optimize_augustus \
--busco_db /fs/project/PAS1444/GeneAnnoWorkDir/maker/1stRoundQualityScores/embryophyta_odb10 \
-output /fs/scratch/PAS1444/funannotateoutput/9_28_22_step_4_box2

And then receive the following error after roughly 4 hours of running:

Computing alignments... terminate called after throwing an instance of 'std::runtime_error'
  what():  Traceback error.

Traceback (most recent call last):
  File "/opt/conda/bin/funannotate", line 688, in <module>
    main()
  File "/opt/conda/bin/funannotate", line 678, in main
    mod.main(arguments)
  File "/users/PAS1444/cullendixon/.local/lib/python3.7/site-packages/funannotate/predict.py", line 1057, in main
    lib.exonerate2hints(Exonerate, hintsP)
  File "/users/PAS1444/cullendixon/.local/lib/python3.7/site-packages/funannotate/library.py", line 3990, in exonerate2hints
    with open(file, 'r') as input:
FileNotFoundError: [Errno 2] No such file or directory: '/fs/scratch/PAS1444/funannotateoutput/9_16_22_step_4_box2/predict_misc/protein_alignments.gff3'

Indeed, 'protein_alignments.gff3' does not exist there. All that exists there is the following:

proteins.combined.fa
repeatmasker.bed
genome.softmasked.fa
assembly-gaps.bed
scaffold.sort.rename.txt
scaffold.sort.order.txt
ab_initio_parameters (a directory)

The thing is, I am not sure why it is even writing in the 'fs/scratch/...' location because I don't even dictate that location in the command (except for where to put the output, however, you can see the two filepaths are different still).

Why might I be encountering this issue?

Thank you.