Rob-murphys closed this issue 3 years ago.
This is potentially an augustus-related error, although I'm not sure why it would yield the unicode decode error if you are using python3. What version of augustus do you have installed and what OS are you on? What it is checking here is whether augustus has a functional proteinprofile mode; that feature/method in the augustus code seems to have various compilation issues on different operating systems. You can test it manually with:
augustus --species=anidulans /path/to/your/env/lib/python3.6/site-packages/funannotate/config/EOG092C0B3U.prfl \
/path/to/your/env/lib/python3.6/site-packages/funannotate/config/busco_test.fa
Augustus version: v3.3.3
OS: Linux, kernel version 3.10.0-957.1.3.el7.x86_64
Python version: Python 3.6.10 :: Anaconda, Inc.
It seems I am missing this test data in my config directory:
ls /services/tools/funannotate/1.8.3/config/
cgp/ extrinsic/ model/ profile/ species/
Those files are in the funannotate python directory not Augustus config.
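Since the test files ship inside the installed funannotate package (…/site-packages/funannotate/config/) rather than the Augustus config directory, here is a minimal sketch for locating a package's bundled subdirectory from the same Python that runs funannotate. `package_subdir` is a hypothetical helper, not part of funannotate; it only assumes the standard library.

```python
import importlib.util
import os
from typing import Optional


def package_subdir(package: str, subdir: str) -> Optional[str]:
    """Return <package install dir>/<subdir>, or None if the package is missing.

    Hypothetical helper: it locates the installed top-level package with
    find_spec, so the package itself is not imported.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    # For a regular package this is the directory holding its __init__.py.
    pkg_dir = list(spec.submodule_search_locations)[0]
    return os.path.join(pkg_dir, subdir)


if __name__ == "__main__":
    cfg = package_subdir("funannotate", "config")
    if cfg is None:
        print("funannotate is not importable from this interpreter")
    else:
        # The two files used in the manual augustus test above.
        for name in ("EOG092C0B3U.prfl", "busco_test.fa"):
            path = os.path.join(cfg, name)
            print(path, "exists" if os.path.isfile(path) else "MISSING")
```

Running it with the interpreter from the funannotate environment matters: a different `python` on PATH would look in a different site-packages.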
Ah okay, it does not like me providing two query files:
augustus --species=anidulans /services/tools/funannotate/1.8.3/lib/python3.6/site-packages/funannotate/config/EOG092C0B3U.prfl /services/tools/funannotate/1.8.3/lib/python3.6/site-packages/funannotate/config/busco_test.fa
augustus: ERROR
Error: 2 query files given: /services/tools/funannotate/1.8.3/lib/python3.6/site-packages/funannotate/config/busco_test.fa and /services/tools/funannotate/1.8.3/lib/python3.6/site-packages/funannotate/config/EOG092C0B3U.prfl.
parameter names must start with '--'
I assume I am missing flags but am not sure which file is what.
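The error message states the rule: arguments that do not start with '--' are taken as query files, and only one is allowed. A toy illustration of that rule (not augustus's real option parser) shows why the .prfl file has to be passed as the value of an option such as --proteinprofile= so that only the FASTA counts as a query. The config path below is a placeholder.

```python
from typing import List


def query_files(argv: List[str]) -> List[str]:
    """Toy model of the rule in the augustus error message: every argument
    after the program name that does not start with '--' is a query file."""
    return [arg for arg in argv[1:] if not arg.startswith("--")]


cfg = "/path/to/funannotate/config"  # placeholder, not a real install path
broken = ["augustus", "--species=anidulans",
          f"{cfg}/EOG092C0B3U.prfl", f"{cfg}/busco_test.fa"]
fixed = ["augustus", "--species=anidulans",
         f"--proteinprofile={cfg}/EOG092C0B3U.prfl", f"{cfg}/busco_test.fa"]

for argv in (broken, fixed):
    files = query_files(argv)
    status = "OK" if len(files) == 1 else f"ERROR: {len(files)} query files given"
    print(status, files)
```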
Oh, I'm sorry, I forgot the --proteinprofile= flag:
augustus --species=anidulans --proteinprofile=/services/tools/funannotate/1.8.3/lib/python3.6/site-packages/funannotate/config/EOG092C0B3U.prfl /services/tools/funannotate/1.8.3/lib/python3.6/site-packages/funannotate/config/busco_test.fa
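The same manual check can be scripted. A minimal sketch, assuming that a `# start gene` comment line in the output (as in the runs pasted in this thread) means the proteinprofile mode produced a prediction. The helper names are hypothetical and this is not funannotate's own check; it avoids `capture_output` so it also runs on Python 3.6.

```python
import shutil
import subprocess
from typing import List


def profile_test_cmd(species: str, prfl: str, fasta: str) -> List[str]:
    """Build the manual proteinprofile test command used in this thread."""
    return ["augustus", f"--species={species}", f"--proteinprofile={prfl}", fasta]


def predicted_a_gene(output: str) -> bool:
    """Assumption: a '# start gene' comment line marks a predicted gene."""
    return any(line.startswith("# start gene") for line in output.splitlines())


def run_profile_test(species: str, prfl: str, fasta: str) -> bool:
    """Run augustus with the protein profile and report whether it predicted a gene."""
    if shutil.which("augustus") is None:
        raise FileNotFoundError("augustus is not on PATH")
    proc = subprocess.run(
        profile_test_cmd(species, prfl, fasta),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # Decode explicitly as UTF-8: the header contains a non-ASCII name
    # ("S. König"), so relying on the locale's default encoding can fail.
    text = proc.stdout.decode("utf-8", errors="replace")
    return proc.returncode == 0 and predicted_a_gene(text)
```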
Here is the output:
(base) [robmur@g-12-l0002 scripts]$ augustus --species=anidulans --proteinprofile=/services/tools/funannotate/1.8.3/lib/python3.6/site-packages/funannotate/config/EOG092C0B3U.prfl /services/tools/funannotate/1.8.3/lib/python3.6/site-packages/funannotate/config/busco_test.fa
# This output was generated with AUGUSTUS (version 3.4.0).
# AUGUSTUS is a gene prediction tool written by M. Stanke (mario.stanke@uni-greifswald.de),
# O. Keller, S. König, L. Gerischer, L. Romoth and Katharina Hoff.
# Please cite: Mario Stanke, Mark Diekhans, Robert Baertsch, David Haussler (2008),
# Using native and syntenically mapped cDNA alignments to improve de novo gene finding
# Bioinformatics 24: 637-644, doi 10.1093/bioinformatics/btn013
# No extrinsic information on sequences given.
# Sources of extrinsic information: M RM
# Initializing the parameters using config directory /services/tools/augustus/3.4.0/config/ ...
Warning: Block no.unknown_E is not significant enough, removed from profile.
Warning: Block no.unknown_F is not significant enough, removed from profile.
Warning: Block no.unknown_H is not significant enough, removed from profile.
Warning: Block no.unknown_AC is not significant enough, removed from profile.
# Using protein profile unknown
# --[0..117]--> unknown_A (9) <--[2..25]--> unknown_B (27) <--[1..16]--> unknown_C (8) <--[0..1]--> unknown_D (15) <--[18..100]--> unknown_G (19) <--[8..25]--> unknown_I (32) <--[0..1]--> unknown_J (33) <--[1..16]--> unknown_K (38) <--[1..3]--> unknown_L (14) <--[0..5]--> unknown_M (59) <--[0..19]--> unknown_N (23) <--[0..145]--> unknown_O (23) <--[3..18]--> unknown_P (27) <--[1..44]--> unknown_Q (12) <--[10..82]--> unknown_R (13) <--[10..106]--> unknown_S (18) <--[1..11]--> unknown_T (32) <--[2..5]--> unknown_U (12) <--[0..1]--> unknown_V (32) <--[7..18]--> unknown_W (13) <--[3..8]--> unknown_X (87) <--[0..1]--> unknown_Y (12) <--[2..33]--> unknown_Z (40) <--[0..11]--> unknown_AA (16) <--[3..30]--> unknown_AB (19) <--[8..47]--> unknown_AD (23) <--[0..1]--> unknown_AE (13) <--[0..38]--
# anidulans version. Using default transition matrix.
# Looks like /services/tools/funannotate/1.8.3/lib/python3.6/site-packages/funannotate/config/busco_test.fa is in fasta format.
# We have hints for 0 sequences and for 0 of the sequences in the input set.
#
# ----- prediction on sequence number 1 (length = 3801, name = example) -----
#
# Predicted genes for sequence number 1 on both strands
# start gene g1
example AUGUSTUS gene 788 3077 0.96 + . g1
example AUGUSTUS transcript 788 3077 0.96 + . g1.t1
example AUGUSTUS start_codon 788 790 . + 0 transcript_id "g1.t1"; gene_id "g1";
example AUGUSTUS CDS 788 996 1 + 0 transcript_id "g1.t1"; gene_id "g1";
example AUGUSTUS CDS 1049 3077 0.96 + 1 transcript_id "g1.t1"; gene_id "g1";
example AUGUSTUS stop_codon 3075 3077 . + 0 transcript_id "g1.t1"; gene_id "g1";
# protein sequence = [MDISDLIEPPQKRLKTEDISSADEVVLPAGGITPQTDNEIDEQLSKEIEVGITEFVSADNEGFAGILKKRYTDFLVNE
# ILPSGKVLHLTNTTAPNTNDEATPVQADKKPAEDKPKEPETPAEKLPAPVEFQLAEEDEALLDTLFGTQNTKKIVALHKKALANPKTKPSDLGRLNTV
# VVNDRDQRIKMHQAIRRIFNSQIESSTDSEGMMVISVAANRNKKNPQGGGGGRERPRVNWDELGGQYLHFTIYKENKDTMEVISFIARQLKMNPKSFQ
# FAGTKDRRGVTVQRACAYRLQADRLAKLNRTLRNAVVGDFEYQPHGLELGDLYGNEFVVTLRECEVPGINIQDPASAVAKTKELVNTSLKNLYQRGYF
# NYYGLQRFGSFATRTDTVGVKILQDDFKGACDAILDYSPHILAAAQAELGQGEGEGATPTNISSEDKARALAIHIFRTTDRVTDALEKMPRKFSAESN
# IIRHLGRSKNDYLGALQTIPRNLRLMYVHAYQSLVWNLAVGERWRLYGDRVVEGDLVLIHEHRDKDGNSSYTTPAPGAGASGETTTIDADGEIIIVPQ
# EHDSAFAVEDTFTRARALTAAEANSGLYSIFDIVLPLPGFDVLYPPNKMTDFYKEFMGSSRGGGLDPFNMRRKWKDASLSGSYRKVLSRMGRDYSVDV
# VLYSRDEEQFVRTDLENLTLKTRDGGDVDLEKKEGKSEGDKLAVVLKFQLGSSQYATMALRELMRGKVKAYKPDFGGGR]
# Evidence for and against this transcript:
# % of transcript supported by hints (any source): 0
# CDS exons: 0/2
# CDS introns: 0/1
# 5'UTR exons and introns: 0/0
# 3'UTR exons and introns: 0/0
# hint groups fully obeyed: 0
# incompatible hint groups: 0
# end gene g1
###
# command line:
# augustus --species=anidulans --proteinprofile=/services/tools/funannotate/1.8.3/lib/python3.6/site-packages/funannotate/config/EOG092C0B3U.prfl /services/tools/funannotate/1.8.3/lib/python3.6/site-packages/funannotate/config/busco_test.fa
Okay, it looks like it is working, but v3.4 won't currently work with funannotate and the BUSCO-mediated generation of training models.
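The first header line tells you which augustus binary actually ran (here 3.4.0, loaded from /services/tools/augustus/3.4.0, not 3.3.3). A minimal sketch for reading that version and flagging the 3.4 series; the helper names and the `< 3.4.0` cutoff are assumptions based on this comment, not funannotate code.

```python
import re
from typing import Optional, Tuple

# Matches the first header line, e.g.
# "# This output was generated with AUGUSTUS (version 3.4.0)."
VERSION_RE = re.compile(r"AUGUSTUS \(version (\d+)\.(\d+)\.(\d+)\)")


def augustus_version(output: str) -> Optional[Tuple[int, int, int]]:
    """Parse the version from augustus output, or None if no header is found."""
    match = VERSION_RE.search(output)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def busco_training_supported(version: Tuple[int, int, int]) -> bool:
    # Assumption from this thread: 3.4.x did not work with funannotate's
    # BUSCO-mediated training at the time, while 3.3.3 did.
    return version < (3, 4, 0)
```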
# This output was generated with AUGUSTUS (version 3.4.0).
Ah, apart from what I just showed you, all runs have been on Augustus v3.3.3.
Here is the output on v3.3.3:
(base) [robmur@g-12-l0002 ~]$ augustus --species=anidulans --proteinprofile=/services/tools/funannotate/1.8.3/lib/python3.6/site-packages/funannotate/config/EOG092C0B3U.prfl /services/tools/funannotate/1.8.3/lib/python3.6/site-packages/funannotate/config/busco_test.fa
# This output was generated with AUGUSTUS (version 3.3.3).
# AUGUSTUS is a gene prediction tool written by M. Stanke (mario.stanke@uni-greifswald.de),
# O. Keller, S. König, L. Gerischer, L. Romoth and Katharina Hoff.
# Please cite: Mario Stanke, Mark Diekhans, Robert Baertsch, David Haussler (2008),
# Using native and syntenically mapped cDNA alignments to improve de novo gene finding
# Bioinformatics 24: 637-644, doi 10.1093/bioinformatics/btn013
# No extrinsic information on sequences given.
# Initializing the parameters using config directory /services/tools/funannotate/1.8.3/config/ ...
Warning: Block unknown_E is not significant enough, removed from profile.
Warning: Block unknown_F is not significant enough, removed from profile.
Warning: Block unknown_H is not significant enough, removed from profile.
Warning: Block unknown_AC is not significant enough, removed from profile.
# Using protein profile unknown
# --[0..117]--> unknown_A (9) <--[2..25]--> unknown_B (27) <--[1..16]--> unknown_C (8) <--[0..1]--> unknown_D (15) <--[18..100]--> unknown_G (19) <--[8..25]--> unknown_I (32) <--[0..1]--> unknown_J (33) <--[1..16]--> unknown_K (38) <--[1..3]--> unknown_L (14) <--[0..5]--> unknown_M (59) <--[0..19]--> unknown_N (23) <--[0..145]--> unknown_O (23) <--[3..18]--> unknown_P (27) <--[1..44]--> unknown_Q (12) <--[10..82]--> unknown_R (13) <--[10..106]--> unknown_S (18) <--[1..11]--> unknown_T (32) <--[2..5]--> unknown_U (12) <--[0..1]--> unknown_V (32) <--[7..18]--> unknown_W (13) <--[3..8]--> unknown_X (87) <--[0..1]--> unknown_Y (12) <--[2..33]--> unknown_Z (40) <--[0..11]--> unknown_AA (16) <--[3..30]--> unknown_AB (19) <--[8..47]--> unknown_AD (23) <--[0..1]--> unknown_AE (13) <--[0..38]--
# anidulans version. Using default transition matrix.
# Looks like /services/tools/funannotate/1.8.3/lib/python3.6/site-packages/funannotate/config/busco_test.fa is in fasta format.
# We have hints for 0 sequences and for 0 of the sequences in the input set.
#
# ----- prediction on sequence number 1 (length = 3801, name = example) -----
#
# Predicted genes for sequence number 1 on both strands
# start gene g1
example AUGUSTUS gene 788 3077 0.88 + . g1
example AUGUSTUS transcript 788 3077 0.88 + . g1.t1
example AUGUSTUS start_codon 788 790 . + 0 transcript_id "g1.t1"; gene_id "g1";
example AUGUSTUS CDS 788 996 1 + 0 transcript_id "g1.t1"; gene_id "g1";
example AUGUSTUS CDS 1049 3077 0.88 + 1 transcript_id "g1.t1"; gene_id "g1";
example AUGUSTUS stop_codon 3075 3077 . + 0 transcript_id "g1.t1"; gene_id "g1";
# protein sequence = [MDISDLIEPPQKRLKTEDISSADEVVLPAGGITPQTDNEIDEQLSKEIEVGITEFVSADNEGFAGILKKRYTDFLVNE
# ILPSGKVLHLTNTTAPNTNDEATPVQADKKPAEDKPKEPETPAEKLPAPVEFQLAEEDEALLDTLFGTQNTKKIVALHKKALANPKTKPSDLGRLNTV
# VVNDRDQRIKMHQAIRRIFNSQIESSTDSEGMMVISVAANRNKKNPQGGGGGRERPRVNWDELGGQYLHFTIYKENKDTMEVISFIARQLKMNPKSFQ
# FAGTKDRRGVTVQRACAYRLQADRLAKLNRTLRNAVVGDFEYQPHGLELGDLYGNEFVVTLRECEVPGINIQDPASAVAKTKELVNTSLKNLYQRGYF
# NYYGLQRFGSFATRTDTVGVKILQDDFKGACDAILDYSPHILAAAQAELGQGEGEGATPTNISSEDKARALAIHIFRTTDRVTDALEKMPRKFSAESN
# IIRHLGRSKNDYLGALQTIPRNLRLMYVHAYQSLVWNLAVGERWRLYGDRVVEGDLVLIHEHRDKDGNSSYTTPAPGAGASGETTTIDADGEIIIVPQ
# EHDSAFAVEDTFTRARALTAAEANSGLYSIFDIVLPLPGFDVLYPPNKMTDFYKEFMGSSRGGGLDPFNMRRKWKDASLSGSYRKVLSRMGRDYSVDV
# VLYSRDEEQFVRTDLENLTLKTRDGGDVDLEKKEGKSEGDKLAVVLKFQLGSSQYATMALRELMRGKVKAYKPDFGGGR]
# end gene g1
###
# command line:
# augustus --species=anidulans --proteinprofile=/services/tools/funannotate/1.8.3/lib/python3.6/site-packages/funannotate/config/EOG092C0B3U.prfl /services/tools/funannotate/1.8.3/lib/python3.6/site-packages/funannotate/config/busco_test.fa
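Comparing the two runs, both versions predict the same gene g1 at 788..3077 on the + strand; only the score differs (0.96 with 3.4.0, 0.88 with 3.3.3). A minimal sketch for pulling the `gene` lines out of such output to compare runs; it splits on any whitespace because the pasted output lost its tabs (on raw output, splitting on tabs would be stricter), and `gene_records` is a hypothetical helper.

```python
from typing import List, NamedTuple


class GeneRecord(NamedTuple):
    seqid: str
    start: int
    end: int
    score: float
    strand: str
    gene_id: str


def gene_records(output: str) -> List[GeneRecord]:
    """Collect 'gene' feature lines from augustus GTF-style output."""
    genes = []
    for line in output.splitlines():
        if line.startswith("#") or not line.strip():
            continue  # skip comments, protein sequence and blank lines
        fields = line.split()
        if len(fields) >= 9 and fields[2] == "gene":
            genes.append(GeneRecord(fields[0], int(fields[3]), int(fields[4]),
                                    float(fields[5]), fields[6], fields[8]))
    return genes
```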
The unicode error is seriously this one line, "S. König": the umlaut in the name was a problem with py2.7. But I can't reproduce the unicode error locally with any version of python 3. I'm installing on CentOS at the moment; let me see if I can get similar behavior.
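To see how that one header line can break decoding, here is a minimal sketch. One possible, unconfirmed explanation for seeing it under Python 3 is a non-UTF-8 locale: when no encoding is given, subprocess text mode decodes with `locale.getpreferredencoding(False)`, which is ASCII under a C/POSIX locale on Python 3.6 (Python 3.7 added C-locale coercion via PEP 538). `try_decode` is a hypothetical helper.

```python
import locale

# Second header line of the augustus output in this thread, as UTF-8 bytes
# (the form a subprocess pipe delivers before any decoding).
HEADER = "# O. Keller, S. König, L. Gerischer, L. Romoth and Katharina Hoff.\n".encode("utf-8")


def try_decode(raw: bytes, encoding: str) -> str:
    """Decode raw bytes, reporting failures instead of raising."""
    try:
        return raw.decode(encoding).rstrip()
    except UnicodeDecodeError as err:
        return f"{encoding} failed at byte {err.start}: {err.reason}"


if __name__ == "__main__":
    print(try_decode(HEADER, "ascii"))
    print(try_decode(HEADER, "utf-8"))
    # Encoding subprocess uses for text-mode pipes when none is passed.
    print("locale default:", locale.getpreferredencoding(False))
```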
Awesome thanks for helping me out :)
I'm unable to reproduce this on CentOS 7 with python 3.7.8 and Augustus 3.3.3:
$ lsb_release -d
Description: CentOS Linux release 7.9.2009 (Core)
Created a conda environment with mamba (a faster solver than conda):
mamba create -n funannotate funannotate
$ conda activate funannotate
$ which python
/apps/miniconda3/envs/funannotate/bin/python
$ python
Python 3.7.8 | packaged by conda-forge | (default, Nov 17 2020, 23:42:15)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import funannotate.library as lib
>>> lib.checkAugustusFunc()
('AUGUSTUS (3.3.3)', True)
>>>
Could the different python version be the issue? v3.6.10 vs v3.7.8
I'm not seeing that either; this was a quick way to check:
mamba create -n py3610 "python==3.6.10" "augustus==3.3.3"
$ conda activate py3610
$ python -m pip install funannotate
$ which python
/apps/miniconda3/envs/py3610/bin/python
(py3610) $ python
Python 3.6.10 | packaged by conda-forge | (default, Apr 24 2020, 16:42:08)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import funannotate.library as lib
>>> lib.checkAugustusFunc()
('AUGUSTUS (3.3.3)', True)
>>>
How are you loading your environment? Could be something in the order/method that you are activating the environment.
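When comparing a module-based setup with a plain conda environment, it helps to print what the interpreter actually sees. A minimal sketch of such a report; `environment_report` is a hypothetical helper, and the variables chosen (locale settings, the augustus binary on PATH, `AUGUSTUS_CONFIG_PATH`) are assumptions about what is most likely to differ between the two setups.

```python
import locale
import os
import shutil
import sys


def environment_report() -> dict:
    """Collect interpreter, locale and augustus details for side-by-side comparison."""
    return {
        "python_executable": sys.executable,
        "python_version": sys.version.split()[0],
        "preferred_encoding": locale.getpreferredencoding(False),
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "augustus_on_path": shutil.which("augustus"),
        "AUGUSTUS_CONFIG_PATH": os.environ.get("AUGUSTUS_CONFIG_PATH"),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running it once after the `module load` line and once inside the conda environment should make any difference in Python, locale or augustus path visible.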
The cluster uses Environment Modules. I load the following modules when using funannotate:
module load tools perl genemark-es/4.62 signalp/4.1c funannotate/1.8.3
I don't bother loading Augustus, as we point funannotate to its directory on the command line.
I'm afraid I'm not going to be much help with that setup, as I haven't used it. @hyphaltip, any experience with this setup?
As far as I understand, it mostly just sets environment variables when you load a specific module. I will share as detailed a report of my environment as I can when I have access to the cluster again tomorrow.
The order is: I usually load the modules and then load the conda env in our module system. This is our module, where we do the module loads and then the conda env load: https://github.com/ucr-hpcc/hpcc_modules/blob/master/funannotate/1.8.4
I am just going to try with your conda distribution and see if that solves the issues.
How do we stop funannotate from saving what I assume are temporary files to $HOME?
E.g.:
p2g_1ae9bfde-5527-4b05-8fc2-ce0c723546d3
It assumes you have read/write privileges in the directory in which you launched the script. That is a temporary folder for processing the protein2genome alignments.
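A minimal sketch of why such a folder shows up wherever the process happens to be running: a scratch directory created relative to the current working directory lands in the process's start directory, which is not always where you typed the command (for example, some batch schedulers start jobs in $HOME unless told otherwise). `make_scratch_dir` is a hypothetical helper; its `p2g_<uuid4>` naming only mirrors the folder name quoted above.

```python
import os
import tempfile
import uuid
from typing import Optional


def make_scratch_dir(prefix: str = "p2g_", base: Optional[str] = None) -> str:
    """Create '<base>/<prefix><uuid4>' and return its absolute path.

    With base=None the directory is created in the current working
    directory, i.e. wherever the process was started.
    """
    base = os.getcwd() if base is None else base
    path = os.path.abspath(os.path.join(base, f"{prefix}{uuid.uuid4()}"))
    os.makedirs(path)
    return path


if __name__ == "__main__":
    # Demonstrate with an explicit base instead of the working directory.
    with tempfile.TemporaryDirectory() as scratch_root:
        print(make_scratch_dir(base=scratch_root))
```

Changing into the intended output directory before launching (or setting the job's working directory) is the general way to control where such relative scratch folders end up.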
These are being generated in a directory different to the one I am launching the scripts from I believe.
I am trying to do an annotation without RNA-seq evidence and am running into some issues.
Version
funannotate v1.8.3
Input
funannotate predict -i $input -o $outdir --species $species --busco_seed_species $buscoSpecies -d $database --cpus 5
The error