nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Funannotate annotate - issue with Interproscan and tbl2asn #968

Closed Laura-CE-Campbell closed 8 months ago

Laura-CE-Campbell commented 8 months ago

Are you using the latest release? If you are not using the latest release of funannotate, please upgrade, if bug persists then report here.
Yes, I am using version 1.8.16

Describe the bug
When using Funannotate annotate I cannot incorporate my results from InterProScan into my final annotation.

When I pass the InterProScan results using --iprscan, Funannotate does not finish, whereas when I call Funannotate in the simpler way (see code below) annotate finishes successfully but does not incorporate the InterProScan results.

What command did you issue? First I tried:

funannotate annotate --gff 41_Philidris_sp_KfmA85-813_Thailand_Yes_AG.fasta.gff3 \
  --fasta /nobackup/lxrj61/lxrj61/Illumina_annotations/41_Philidris_sp_KfmA85-813_Thailand_Yes_AG/41_Philidris_sp_KfmA85-813_Thailand_Yes_AG.fasta \
  --sbt 41_Philidris_sp_KfmA85-813_Thailand_Yes_AG.fasta.tbl \
  --genbank /nobackup/lxrj61/lxrj61/Illumina_annotations/41_Philidris_sp_KfmA85-813_Thailand_Yes_AG/_fun_out/predict_results/41_Philidris_sp_KfmA85-813_Thailand_Yes_AG.fasta.gbk \
  -s Philidris_41 --iprscan /nobackup/lxrj61/lxrj61/Illumina_annotations/41_Philidris_sp_KfmA85-813_Thailand_Yes_AG/_fun_out/predict_results/iprscan.xml \
  --cpus 32 --out funannotate41_annotate

Then I tried:

funannotate annotate -i /nobackup/lxrj61/lxrj61/Illumina_annotations/41_Philidris_sp_KfmA85-813_Thailand_Yes_AG/_fun_out/predict_results --cpus 32

Logfiles
Please provide relevant log files of the error.

Log file 1:

[Oct 10 03:35 PM]: OS: Rocky Linux 8.6, 128 cores, ~ 263 GB RAM. Python: 3.8.15
[Oct 10 03:35 PM]: Running 1.8.15
[Oct 10 03:35 PM]: Checking GenBank file for annotation
[Oct 10 03:36 PM]: Adding Functional Annotation to Philidris_41, NCBI accession: None
[Oct 10 03:36 PM]: Annotation consists of: 25,692 gene models
[Oct 10 03:36 PM]: 25,589 protein records loaded
[Oct 10 03:36 PM]: Running HMMer search of PFAM version 35.0
[Oct 10 03:39 PM]: 14,291 annotations added
[Oct 10 03:39 PM]: Running Diamond blastp search of UniProt DB version 2023_02
[Oct 10 03:39 PM]: 957 valid gene/product annotations from 1,694 total
[Oct 10 03:39 PM]: Install eggnog-mapper or use webserver to improve functional annotation: https://github.com/jhcepas/eggnog-mapper
[Oct 10 03:39 PM]: No Eggnog-mapper results found.
[Oct 10 03:39 PM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.88
[Oct 10 03:39 PM]: 957 gene name and product description annotations added
[Oct 10 03:39 PM]: Running Diamond blastp search of MEROPS version 12.0
[Oct 10 03:39 PM]: 510 annotations added
[Oct 10 03:39 PM]: Annotating CAZYmes using HMMer search of dbCAN version 11.0
[Oct 10 03:41 PM]: 199 annotations added
[Oct 10 03:41 PM]: Annotating proteins with BUSCO dikarya models
[Oct 10 03:43 PM]: 736 annotations added
[Oct 10 03:43 PM]: Skipping phobius predictions, try funannotate remote -m phobius
[Oct 10 03:43 PM]: Skipping secretome: neither SignalP nor Phobius searches were run
[Oct 10 03:43 PM]: 0 secretome and 0 transmembane annotations added
[Oct 10 03:43 PM]: Parsing InterProScan5 XML file
[Oct 10 03:44 PM]: Found 0 duplicated annotations, adding 90,917 valid annotations
[Oct 10 03:44 PM]: Converting to final Genbank format, good luck!
error: [Errno 2] No such file or directory: 'tbl2asn'
run(*(['tbl2asn', '-y', '"Annotated using Funannotate 1.8.15"', '-N', '1', '-t', '41_Philidris_sp_KfmA85-813_Thailand_Yes_AG.fasta.tbl', '-M', 'n', '-j', '"[organism=Philidris_41]"', '-V', 'b', '-c', 'f', '-T', '-a', 'r10u', '-l', 'paired-ends'], 'funannotate41_annotate/annotate_misc/tbl2asn/2'), *{})
error: [Errno 2] No such file or directory: 'tbl2asn'
run((['tbl2asn', '-y', '"Annotated using Funannotate 1.8.15"', '-N', '1', '-t', '41_Philidris_sp_KfmA85-813_Thailand_Yes_AG.fasta.tbl', '-M', 'n', '-j', '"[organism=Philidris_41]"', '-V', 'b', '-c', 'f', '-T', '-a', 'r10u', '-l', 'paired-ends'], 'funannotate41_annotate/annotate_misc/tbl2asn/1'), **{})
[Oct 10 03:44 PM]: ERROR: GBK file conversion failed, tbl2asn parallel script has died
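
The "[Errno 2] No such file or directory: 'tbl2asn'" lines in log file 1 are the kind of message Python produces when it cannot launch an executable by name. The thread does not establish that a missing binary was the cause here (the later dependency check does list tbl2asn), but it is cheap to rule out from the same shell or job script that runs funannotate. A minimal sketch, where `find_missing_tools` is a hypothetical helper name and not part of funannotate:

```python
import shutil


def find_missing_tools(tools):
    """Return the names in `tools` that cannot be resolved on the current PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    # 'tbl2asn' is the executable named in the error message above.
    missing = find_missing_tools(["tbl2asn"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All requested tools were found on PATH")
```

Running this inside the same environment (for example the same conda environment on the same compute node) as the failing job matters, because PATH can differ between a login shell and a batch job.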

Log file 2:

[Oct 17 11:44 AM]: OS: Rocky Linux 8.8, 128 cores, ~ 263 GB RAM. Python: 3.8.15
[Oct 17 11:44 AM]: Running 1.8.16
[Oct 17 11:44 AM]: No NCBI SBT file given, will use default, however if you plan to submit to NCBI, create one and pass it here '--sbt'
[Oct 17 11:44 AM]: Parsing input files
[Oct 17 11:44 AM]: Existing tbl found: /nobackup/lxrj61/lxrj61/Illumina_annotations/41_Philidris_sp_KfmA85-813_Thailand_Yes_AG/_fun_out/predict_results/41_Philidris_sp_KfmA85-813_Thailand_Yes_AG.fasta.tbl
[Oct 17 11:45 AM]: Adding Functional Annotation to 41_Philidris_sp_KfmA85-813_Thailand_Yes_AG.fasta, NCBI accession: None
[Oct 17 11:45 AM]: Annotation consists of: 25,692 gene models
[Oct 17 11:45 AM]: 25,589 protein records loaded
[Oct 17 11:45 AM]: Running HMMer search of PFAM version 35.0
[Oct 17 11:47 AM]: 14,291 annotations added
[Oct 17 11:47 AM]: Running Diamond blastp search of UniProt DB version 2023_02
[Oct 17 11:47 AM]: 957 valid gene/product annotations from 1,694 total
[Oct 17 11:47 AM]: Running Eggnog-mapper
[Oct 17 12:07 PM]: Parsing EggNog Annotations
[Oct 17 12:07 PM]: EggNog version parsed as 2.1.11
[Oct 17 12:07 PM]: 27,149 COG and EggNog annotations added
[Oct 17 12:07 PM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.88
[Oct 17 12:07 PM]: 6,384 gene name and product description annotations added
[Oct 17 12:07 PM]: Running Diamond blastp search of MEROPS version 12.0
[Oct 17 12:07 PM]: 510 annotations added
[Oct 17 12:07 PM]: Annotating CAZYmes using HMMer search of dbCAN version 11.0
[Oct 17 12:08 PM]: 199 annotations added
[Oct 17 12:08 PM]: Annotating proteins with BUSCO dikarya models
[Oct 17 12:08 PM]: 736 annotations added
[Oct 17 12:08 PM]: Skipping phobius predictions, try funannotate remote -m phobius
[Oct 17 12:08 PM]: Skipping secretome: neither SignalP nor Phobius searches were run
[Oct 17 12:08 PM]: 0 secretome and 0 transmembane annotations added
[Oct 17 12:08 PM]: InterProScan error, /nobackup/lxrj61/lxrj61/Illumina_annotations/41_Philidris_sp_KfmA85-813_Thailand_Yes_AG/_fun_out/annotate_misc/iprscan.xml is empty, or no XML file passed via --iprscan. Functional annotation will be lacking.
[Oct 17 12:08 PM]: Found 0 duplicated annotations, adding 55,653 valid annotations
[Oct 17 12:08 PM]: Converting to final Genbank format, good luck!
[Oct 17 12:10 PM]: Creating AGP file and corresponding contigs file
[Oct 17 12:10 PM]: Writing genome annotation table.
[Oct 17 12:11 PM]: Funannotate annotate has completed successfully!

    We need YOUR help to improve gene names/product descriptions:
       0 gene/products names MUST be fixed, see /nobackup/lxrj61/lxrj61/Illumina_annotations/41_Philidris_sp_KfmA85-813_Thailand_Yes_AG/_fun_out/annotate_results/Gene2Products.must-fix.txt
       31 gene/product names need to be curated, see /nobackup/lxrj61/lxrj61/Illumina_annotations/41_Philidris_sp_KfmA85-813_Thailand_Yes_AG/_fun_out/annotate_results/Gene2Products.need-curating.txt
       802 gene/product names passed but are not in Database, see /nobackup/lxrj61/lxrj61/Illumina_annotations/41_Philidris_sp_KfmA85-813_Thailand_Yes_AG/_fun_out/annotate_results/Gene2Products.new-names-passed.txt

    Please consider contributing a PR at https://github.com/nextgenusfs/gene2product
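
Log file 2 reports that annotate_misc/iprscan.xml "is empty, or no XML file passed via --iprscan". Before re-running, it may help to confirm that the XML you intend to use exists, is non-empty, and actually contains protein records. The sketch below is only a rough sanity check: `count_xml_elements` is a hypothetical helper, and the element name `protein` is an assumption about the InterProScan 5 XML layout rather than something stated in this thread.

```python
import os
import xml.etree.ElementTree as ET


def count_xml_elements(path, local_name="protein"):
    """Count elements whose tag (ignoring any XML namespace) equals `local_name`.

    Returns 0 when the file is missing or empty. A malformed file raises
    xml.etree.ElementTree.ParseError, which is left to the caller to handle.
    """
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return 0
    count = 0
    # iterparse streams the file, so large XML outputs are not loaded at once.
    for _event, elem in ET.iterparse(path, events=("end",)):
        # Tags look like "{namespace-uri}protein"; keep only the part after "}".
        if elem.tag.rsplit("}", 1)[-1] == local_name:
            count += 1
        elem.clear()  # free memory held by elements we have already seen
    return count
```

A result of 0 for the file given to --iprscan (or sitting in annotate_misc) would be consistent with the message in the log; a large count would suggest the file itself is fine and the problem lies in how it is being passed.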


OS/Install Information


Checking dependencies for 1.8.16

You are running Python v 3.8.15. Now checking python packages...
biopython: 1.81
goatools: 1.3.9
matplotlib: 3.4.3
natsort: 8.4.0
numpy: 1.24.4
pandas: 1.5.3
psutil: 5.9.5
requests: 2.31.0
scikit-learn: 1.3.1
scipy: 1.10.1
seaborn: 0.13.0
All 11 python packages installed

You are running Perl v b'5.032001'. Now checking perl modules...
Carp: 1.50
Clone: 0.46
DBD::SQLite: 1.72
DBD::mysql: 4.046
DBI: 1.643
DB_File: 1.858
Data::Dumper: 2.183
File::Basename: 2.85
File::Which: 1.24
Getopt::Long: 2.54
Hash::Merge: 0.302
JSON: 4.10
LWP::UserAgent: 6.67
Logger::Simple: 2.0
POSIX: 1.94
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.14
Tie::File: 1.06
URI::Escape: 5.17
YAML: 1.30
local::lib: 2.000029
threads: 2.25
threads::shared: 1.61
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/nobackup/lxrj61/lxrj61/funannotate_db/
$PASAHOME=/nobackup/lxrj61/lxrj61/anaconda3/envs/funannotate/opt/pasa-2.5.3
$TRINITY_HOME=/nobackup/lxrj61/lxrj61/anaconda3/envs/funannotate/opt/trinity-2.8.5
$EVM_HOME=/nobackup/lxrj61/lxrj61/anaconda3/envs/funannotate/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/nobackup/lxrj61/lxrj61/anaconda3/envs/funannotate/config/
ERROR: GENEMARK_PATH not set. export GENEMARK_PATH=/path/to/dir

Checking external dependencies...
PASA: 2.5.3
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.5.0
bamtools: bamtools 2.5.1
bedtools: bedtools v2.31.0
blat: BLAT v37x1
diamond: 2.1.8
emapper.py: 2.1.11
ete3: 3.1.3
exonerate: exonerate 2.4.0
fasta: 36.3.8g
glimmerhmm: 3.0.4
gmap: 2023-10-01
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 17.0.3-internal
kallisto: 0.46.1
mafft: v7.520 (2023/Mar/22)
makeblastdb: makeblastdb 2.13.0+
minimap2: 2.26-r1175
pigz: 2.6
proteinortho: 6.3.0
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.17
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.12 (Nov 2022)
tantan: tantan 40
tbl2asn: 25.8
tblastn: tblastn 2.13.0+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
ERROR: gmes_petap.pl not installed
ERROR: signalp not installed

(I was having issues with GeneMark and my understanding is that you can use Funannotate without it - please let me know if I should have resolved this)

Thank you in advance for any help!

Laura

hyphaltip commented 8 months ago

I have your copy of the XML; I'll see if I can reproduce the parsing error.

Laura-CE-Campbell commented 8 months ago

Thank you for looking into this Jason - did you find the same error?

hyphaltip commented 8 months ago

There's no error on my end. Are you sure this is failing? Is there not an annotations.iprscan.txt file in annotate_misc? You can run things outside of funannotate and just copy the result in, I guess. Without an error I don't know what to point to.

git clone https://github.com/nextgenusfs/funannotate
python funannotate/funannotate/aux_scripts/iprscan2annotations.py interpro.xml interpro.txt
cp interpro.txt YOURFOLDER/annotate_misc/annotations.iprscan.txt

Laura-CE-Campbell commented 8 months ago

Thank you very much for getting back to me.

There is no annotations.iprscan.txt file (see screenshot of what I get): [image]

(This is when I run it the simpler way.)

The InterProScan file is not empty: [image]

I am currently using a Funannotate install that I downloaded with Mamba - would re-downloading it with git clone make a difference?

Thank you and best wishes,

Laura

hyphaltip commented 8 months ago

I'm really confused about your folder; you don't seem to be running this the way I'd expect. There are files in predict_results which seem totally out of place. iprscan.xml should be located in the annotate_misc folder.

Is this how you are running it? This seems wrong, unfortunately.

funannotate annotate --gff 41_Philidris_sp_KfmA85-813_Thailand_Yes_AG.fasta.gff3 \
  --fasta /nobackup/lxrj61/lxrj61/Illumina_annotations/41_Philidris_sp_KfmA85-813_Thailand_Yes_AG/41_Philidris_sp_KfmA85-813_Thailand_Yes_AG.fasta \
  --sbt 41_Philidris_sp_KfmA85-813_Thailand_Yes_AG.fasta.tbl \
  --genbank /nobackup/lxrj61/lxrj61/Illumina_annotations/41_Philidris_sp_KfmA85-813_Thailand_Yes_AG/_fun_out/predict_results/41_Philidris_sp_KfmA85-813_Thailand_Yes_AG.fasta.gbk \
  -s Philidris_41 --iprscan /nobackup/lxrj61/lxrj61/Illumina_annotations/41_Philidris_sp_KfmA85-813_Thailand_Yes_AG/_fun_out/predict_results/iprscan.xml \
  --cpus 32 --out funannotate41_annotate

hyphaltip commented 8 months ago

--sbt: this is the file you get from creating a submission template (SBT) at NCBI: https://www.ncbi.nlm.nih.gov/genbank/table2asn/#Template. Save that file as SBTFILE and use it as below. Here's how we run it:

funannotate predict -i GENOME -o RESULTS --busco_db BUSCONAME -s "GENUS SPECIES" --name LOCUSTAG
funannotate annotate -i RESULTS --sbt SBTFILE --busco_db BUSCONAME --species "GENUS SPECIES" --strain STRAIN --iprscan IPRSCANXML --cpus CPUS

You could also copy iprscan.xml into RESULTS/annotate_misc/ and it would be picked up automatically. See what your all.annotations.txt file looks like in annotate_misc when it is done running, and whether there is an annotations.iprscan.txt file in there.
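
The copy step suggested above can be sketched in Python as follows. This is only an illustration under stated assumptions: `stage_iprscan_xml` is a hypothetical helper, not a funannotate function, and the destination name `annotate_misc/iprscan.xml` is taken from the path printed in log file 2.

```python
import shutil
from pathlib import Path


def stage_iprscan_xml(xml_path, results_dir):
    """Copy an InterProScan XML file to <results_dir>/annotate_misc/iprscan.xml.

    Raises FileNotFoundError if the source is missing or empty, so an empty
    file is never staged silently.
    """
    src = Path(xml_path)
    if not src.is_file() or src.stat().st_size == 0:
        raise FileNotFoundError(f"missing or empty InterProScan XML: {src}")
    misc_dir = Path(results_dir) / "annotate_misc"
    misc_dir.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    dest = misc_dir / "iprscan.xml"
    shutil.copyfile(src, dest)
    return dest
```

Here `results_dir` plays the role of RESULTS in the commands above, i.e. the folder later passed to `funannotate annotate -i`.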

Laura-CE-Campbell commented 8 months ago

Hi Jason,

Ah! Okay - the sbt file seems to have been the problem here - just ran Funannotate annotate successfully there!
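
Since the file passed to --sbt in the first command had a .tbl extension, a quick look at the first line of a file can help catch this kind of mix-up before a long run. The sketch below is a heuristic only: `guess_ncbi_file_kind` is a hypothetical helper, and the marker strings are assumptions about what these files typically start with (a submission template beginning with `Submit-block` and a feature table beginning with `>Feature`), not checks that funannotate itself performs.

```python
def guess_ncbi_file_kind(path):
    """Classify a file as 'sbt', 'tbl', or 'unknown' from its first non-blank line."""
    with open(path, "r", errors="replace") as handle:
        for line in handle:
            stripped = line.strip()
            if not stripped:
                continue  # skip leading blank lines
            if stripped.startswith("Submit-block"):
                return "sbt"  # assumed start of an NCBI submission template
            if stripped.startswith(">Feature"):
                return "tbl"  # assumed start of a feature table
            return "unknown"
    return "unknown"  # file was empty or contained only blank lines
```

If the file intended for --sbt comes back as 'tbl' or 'unknown', it is probably worth regenerating the template at the NCBI link given earlier in the thread.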

Thank you very much for your help with this.

hyphaltip commented 8 months ago

good - glad you solved it.