nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Phobius problems #696

Open Hoberti opened 2 years ago

Hoberti commented 2 years ago

Hi! I am having some problems with the Phobius annotation.

Are you using the latest release? funannotate v1.8.9

Describe the bug

When I try to annotate the proteins obtained after funannotate update, I get this error:

CMD ERROR: phobius.pl -short FUN_010821-T1.fa (None, b'Phobius ver 1.01\n(c) 2004 Lukas Kall, Anders Krogh, Erik Sonnhammer\n\nCould not read provided fasta sequence at /home/phobius/phobius.pl line 408.\n')

What command did you issue?

I tried these two commands:

funannotate annotate -i update_results --cpus 40 --busco_db ascomycota

funannotate remote -m phobius -e XXX@ZZ -i update_results/ -o phobius --force

I get the same error with both.

OS/Install Information

Checking dependencies for 1.8.9

You are running Python v 3.8.12. Now checking python packages...
  biopython: 1.77
  goatools: 1.1.6
  matplotlib: 3.4.3
  natsort: 8.0.1
  numpy: 1.21.4
  pandas: 1.3.4
  psutil: 5.8.0
  requests: 2.26.0
  scikit-learn: 1.0.1
  scipy: 1.7.3
  seaborn: 0.11.2
All 11 python packages installed

You are running Perl v b'5.026002'. Now checking perl modules...
  Bio::Perl: 1.7.4
  Carp: 1.38
  Clone: 0.42
  DBD::SQLite: 1.64
  DBD::mysql: 4.046
  DBI: 1.642
  DB_File: 1.855
  Data::Dumper: 2.173
  File::Basename: 2.85
  File::Which: 1.23
  Getopt::Long: 2.5
  Hash::Merge: 0.300
  JSON: 4.02
  LWP::UserAgent: 6.39
  Logger::Simple: 2.0
  POSIX: 1.76
  Parallel::ForkManager: 2.02
  Pod::Usage: 1.69
  Scalar::Util::Numeric: 0.40
  Storable: 3.15
  Text::Soundex: 3.05
  Thread::Queue: 3.12
  Tie::File: 1.02
  URI::Escape: 3.31
  YAML: 1.29
  threads: 2.15
  threads::shared: 1.56
All 27 Perl modules installed

Checking Environmental Variables...
  $FUNANNOTATE_DB=/home/funannotate_db
  $PASAHOME=/home/pasa-2.4.1
  $TRINITY_HOME=/home/trinity-2.8.5
  $EVM_HOME=/home/evidencemodeler-1.1.1
  $AUGUSTUS_CONFIG_PATH=/home/config/
  $GENEMARK_PATH=/home/gmes_linux_64/
All 6 environmental variables are set

Checking external dependencies...
  PASA: 2.4.1
  CodingQuarry: 2.0
  Trinity: 2.8.5
  augustus: 3.3.3
  bamtools: bamtools 2.5.1
  bedtools: bedtools v2.30.0
  blat: BLAT v36
  diamond: 2.0.13
  emapper.py: 2.1.7
  ete3: 3.1.2
  exonerate: exonerate 2.4.0
  fasta: no way to determine
  glimmerhmm: 3.0.4
  gmap: 2017-11-15
  gmes_petap.pl: 4.68_lic
  hisat2: 2.2.1
  hmmscan: HMMER 3.3.2 (Nov 2020)
  hmmsearch: HMMER 3.3.2 (Nov 2020)
  java: 11.0.9.1-internal
  kallisto: 0.46.1
  mafft: v7.490 (2021/Oct/30)
  makeblastdb: makeblastdb 2.2.31+
  minimap2: 2.23-r1111
  proteinortho: 6.0.31
  pslCDnaFilter: no way to determine
  salmon: salmon 0.14.1
  samtools: samtools 1.12
  signalp: 5.0b
  snap: 2006-07-28
  stringtie: 2.1.7
  tRNAscan-SE: 2.0.9 (July 2021)
  tantan: tantan 26
  tbl2asn: no way to determine, likely 25.X
  tblastn: tblastn 2.2.31+
  trimal: trimAl v1.4.rev15 build[2013-12-17]
  trimmomatic: 0.39
All 36 external dependencies are installed

Could you help me with this?

Thanks

Héctor

fmobegi commented 2 years ago

Check if you have '*' in your fasta file. That could be the issue. Alternatively, just run phobius as a standalone process and feed the results to the annotation command:

cp update_results/genome.proteins.fa . ## make a copy of your proteins file
sed -i 's/[*]//g' genome.proteins.fa ## remove every '*' character from the copy
phobius.pl -short genome.proteins.fa > phobius.results.txt
funannotate annotate -i update_results --cpus 40 --busco_db ascomycota --phobius phobius.results.txt

Good luck
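Note that the `sed` command in the comment above deletes every `*` in the file, including any that happen to appear in header lines. A minimal sketch of a variation that only touches sequence lines is shown here; the function name `strip_stop_symbols` is hypothetical and not part of funannotate or Phobius, and the file names simply follow the thread.

```shell
# Hypothetical helper (not part of funannotate or Phobius):
# remove '*' from sequence lines only, leaving FASTA header lines untouched.
strip_stop_symbols() {
    # $1 = input protein FASTA; the cleaned FASTA is written to stdout
    sed '/^>/!s/[*]//g' "$1"
}

# Example usage, assuming the file layout used in this thread:
# strip_stop_symbols update_results/genome.proteins.fa > genome.proteins.fa
# phobius.pl -short genome.proteins.fa > phobius.results.txt
```

Writing to a new file instead of editing in place keeps the original proteins file in `update_results` unchanged.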

subachess commented 2 years ago

Hello, I have also encountered the same error. I deleted the * in the .proteins.fa file in the update_results folder as you mentioned in your comment and it worked well. Thank you very much.

aberaslop commented 2 years ago

Hi! @fmobegi, thank you so much for this solution. It works for me too!

fmobegi commented 2 years ago

@aberaslop, you're welcome. But you might want to check whether any of your proteins contain in-frame (premature) stop codons. It could be that you've sequenced a pseudogene, or that you have a non-functional copy that can be removed from the annotation. Check your GFF3 file using AGAT.
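Before stripping anything, it can help to tell terminal `*` symbols apart from internal ones. The following is a rough sketch under the assumption that a single `*` at the very end of a protein is just the stop codon, while a `*` anywhere else marks an in-frame stop. The function name `list_internal_stops` is hypothetical, and this check is not a substitute for inspecting the GFF3 with AGAT.

```shell
# Hypothetical helper: print the IDs of proteins that contain a '*'
# anywhere other than the final position of the sequence.
list_internal_stops() {
    # $1 = protein FASTA; multi-line records are joined before checking
    awk '
        function check() {
            sub(/[*]$/, "", seq)   # ignore a single terminal stop symbol
            if (id != "" && index(seq, "*") > 0) print id
        }
        /^>/ { check(); id = substr($1, 2); seq = ""; next }
        { seq = seq $0 }
        END { check() }
    ' "$1"
}

# Example usage, assuming the file layout used in this thread:
# list_internal_stops update_results/genome.proteins.fa
```

If the function prints nothing, every `*` in the file is terminal and removing them should not hide any premature stops.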

aberaslop commented 2 years ago

Hi @fmobegi, thank you for your message and the suggestion! I needed phobius for a completely different pipeline. I have run into the phobius issue only with genomes that were annotated by the JGI; it is not related to funannotate, which works perfectly for the genomes that I have analyzed myself. The asterisks I found were always at the end of the proteins, but your solution really helped, and the pipeline keeps running after the fix. Thanks again!