Closed AnotherSimon closed 4 years ago
What version of diamond? https://github.com/nextgenusfs/funannotate/issues/135
I think that UniProt changed their format for FASTA deflines, which offset the name/description parser. Working on a fix now.
Think the fix works. However I got 0 UniProt hits, is that suspicious?
[09:59:05 AM]: Running Diamond blastp search of UniProt DB version 2018_02
[09:59:09 AM]: 0 valid gene/product annotations from 882 total
[09:59:11 AM]: Running Eggnog-mapper ...
Yeah that doesn’t seem right. There should be fewer than the total but not zero.
I seem to remember UniProt being slightly larger than 882 curated genes. Or is this a particular subset defined by some filtering criteria?
Yeah, so 882 hits that are > 60% identical and cover over 60% of the length of the protein, and then they are further filtered for which hits have "proper" gene names and descriptions. Some are not curated very well, don't have a gene name, and aren't useful. But more stuff should be passing here. Any way to send me the uniprot.xml in your annotate_misc folder?
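To make the two-stage filtering described above concrete, here is a minimal sketch. It is not funannotate's actual code: the `Hit` record, its field names, and the choice to measure coverage against the query protein are all assumptions made for illustration, and "proper" is reduced to "gene name and description are both present".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Hit:
    """Hypothetical record for one Diamond hit (field names are illustrative)."""
    query: str
    identity_pct: float          # percent identity of the alignment
    align_len: int               # aligned length in residues
    query_len: int               # full length of the query protein (assumed reference)
    gene_name: Optional[str]     # e.g. the GN= value, None if absent
    description: Optional[str]   # product description, None if absent


def passes_alignment_cutoffs(hit: Hit, min_identity: float = 60.0,
                             min_coverage: float = 60.0) -> bool:
    """Stage 1: keep hits above 60% identity and above 60% coverage."""
    coverage = 100.0 * hit.align_len / hit.query_len
    return hit.identity_pct > min_identity and coverage > min_coverage


def has_usable_annotation(hit: Hit) -> bool:
    """Stage 2: keep only hits that carry both a gene name and a description."""
    return bool(hit.gene_name) and bool(hit.description)


def filter_hits(hits: List[Hit]) -> Tuple[List[Hit], List[Hit]]:
    """Return (hits passing stage 1, hits passing both stages)."""
    aligned = [h for h in hits if passes_alignment_cutoffs(h)]
    valid = [h for h in aligned if has_usable_annotation(h)]
    return aligned, valid
```

In the log above, the first list would correspond to the "882 total" and the second to the "0 valid gene/product annotations".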
Our IT is pretty strict on outward facing sites so I'll send to you by your gmail address.
Hmmm, this is the default database installed by funannotate, correct? I guess then it must mean that older versions of diamond are doing something different with the deflines and it isn't being parsed correctly.
Here is what your data looks like for a hit:
<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd">
<BlastOutput>
<BlastOutput_program>blastp</BlastOutput_program>
<BlastOutput_version>diamond 0.8.22</BlastOutput_version>
<BlastOutput_reference>Benjamin Buchfink, Xie Chao, and Daniel Huson (2015), "Fast and sensitive protein alignment using DIAMOND", Nature Methods 12:59-60.</BlastOutput_reference>
<BlastOutput_db></BlastOutput_db>
...
<Hit>
<Hit_num>1</Hit_num>
<Hit_id>gnl|BL_ORD_ID|421135</Hit_id>
<Hit_def>sp|O13882|RT18_SCHPO</Hit_def>
<Hit_accession>421135</Hit_accession>
<Hit_len>223</Hit_len>
This is what the script is expecting.
<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd">
<BlastOutput>
<BlastOutput_program>blastp</BlastOutput_program>
<BlastOutput_version>diamond 0.9.14</BlastOutput_version>
<BlastOutput_reference>Benjamin Buchfink, Xie Chao, and Daniel Huson (2015), "Fast and sensitive protein alignment using DIAMOND", Nature Methods 12:59-60.</BlastOutput_reference>
<BlastOutput_db>/usr/local/share/funannotate/uniprot.dmnd</BlastOutput_db>
...
<Hit>
<Hit_num>1</Hit_num>
<Hit_id>sp|Q2GWZ4|CFD1_CHAGB</Hit_id>
<Hit_def>Cytosolic Fe-S cluster assembly factor CFD1 OS=Chaetomium globosum (strain ATCC 6205 / CBS 148.51 / DSM 1962 / NBRC 6347 / NRRL 1970) OX=306901 GN=CFD1 PE=3 SV=1</Hit_def>
<Hit_accession>Q2GWZ4</Hit_accession>
<Hit_len>303</Hit_len>
I think that diamond databases < v0.8 are not compatible with v0.9 and greater. So upgrading diamond to a newer version may fix it (although there are several recent versions where the XML format is broken, see #135), but you will also have to re-run funannotate setup to generate the updated diamond databases. Otherwise I will have to install an older version here locally and see if that yields the same results.
I'm a little bit surprised that the hit-ids say gnl instead of the sp prefix --> must be something hard coded in older versions of diamond.
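The difference between the two hits quoted above can be illustrated with a small sketch. This is not the parser funannotate uses; `parse_hit` and the returned keys are hypothetical names, and the only inputs assumed are the Hit_id and Hit_def strings exactly as they appear in the two XML snippets.

```python
import re

# Matches the " KEY=" markers that follow the protein name in a UniProt defline.
_FIELD = re.compile(r"\s(OS|OX|GN|PE|SV)=")


def parse_hit(hit_id: str, hit_def: str) -> dict:
    """Split a Hit_def into description and GN= gene name (illustrative only)."""
    # re.split with a capture group yields [description, key1, value1, key2, value2, ...]
    parts = _FIELD.split(hit_def)
    fields = dict(zip(parts[1::2], parts[2::2]))
    return {
        # Older diamond output reports a generic ordinal ID instead of sp|...|...
        "old_style_id": hit_id.startswith("gnl|BL_ORD_ID|"),
        "description": parts[0].strip(),
        "gene": fields.get("GN"),  # None when the defline has no GN= field
    }
```

With the diamond 0.9.14 hit, this returns the gene name CFD1 and the product description. With the diamond 0.8.22 hit, the Hit_def is only `sp|O13882|RT18_SCHPO`, so no gene name can be recovered, which would be consistent with every hit being rejected at the gene/product step.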
Updated diamond to 0.9.18, manually deleted all files in $FUNANNOTATE_DB, and ran funannotate setup again (funannotate v1.1.1). Also had to download the eggnog databases again with the -f option because they were incompatible with the newer diamond version.
UniProt seems to pass muster now. However, I observed some strange behavior where the results file from remote Phobius seems to disappear when calling funannotate annotate. Trying to reproduce this behavior now.
My phobius results are definitely getting deleted by funannotate annotate. I keep a renamed copy as a workaround in the meantime. Here's the log:
2018-03-05 10:51:34,208: Running Eggnog-mapper
2018-03-05 10:51:34,208: emapper.py -m diamond -i .../My_bug/annotate_misc/genome.proteins.fasta -o eggnog --cpu 24
2018-03-05 10:51:35,982: # emapper-1.0.3 ./emapper.py -m diamond -i .../My_bug/annotate_misc/genome.proteins.fasta -o eggnog --cpu 24
/home/simon/bin/diamond blastp -d /home/simon/software/eggnog-mapper/data/eggnog_proteins.dmnd -q .../My_bug/annotate_misc/genome.proteins.fasta --more-sensitive --threads 24 -e 0.001000 -o .../My_bug/annotate_misc/emappertmp_dmdn_d2lDJp/36dd9d8b68244edb9a53c02bca1b740e --top 3
2018-03-05 10:51:35,983: Error: Database was built with a different version of Diamond as is incompatible.
Traceback (most recent call last):
  File "/home/simon/software/eggnog-mapper/emapper.py", line 1001, in <module>
    main(args)
  File "/home/simon/software/eggnog-mapper/emapper.py", line 216, in main
    dump_diamond_matches(args.input, seed_orthologs_file, args)
  File "/home/simon/software/eggnog-mapper/emapper.py", line 353, in dump_diamond_matches
    raise e
subprocess.CalledProcessError: Command '/home/simon/bin/diamond blastp -d /home/simon/software/eggnog-mapper/data/eggnog_proteins.dmnd -q .../My_bug/annotate_misc/genome.proteins.fasta --more-sensitive --threads 24 -e 0.001000 -o .../My_bug/annotate_misc/emappertmp_dmdn_d2lDJp/36dd9d8b68244edb9a53c02bca1b740e --top 3' returned non-zero exit status 1
2018-03-05 10:51:35,984: No Eggnog-mapper results found.
2018-03-05 10:51:35,984: Combining UniProt/EggNog gene and product names using Gene2Product version 1.4
2018-03-05 10:51:36,298: 653 gene name and product description annotations added
2018-03-05 10:51:36,298: Running Diamond blastp search of MEROPS version 12.0
2018-03-05 10:51:36,322: 282 annotations added
2018-03-05 10:51:36,323: Annotating CAZYmes using HMMer search of dbCAN version 6.0
2018-03-05 10:51:36,325: 206 annotations added
2018-03-05 10:51:36,325: Annotating proteins with BUSCO dikarya models
2018-03-05 10:51:36,345: 1,841 annotations added
And the StdOut:
[10:51:36 AM]: Running Diamond blastp search of MEROPS version 12.0
[10:51:36 AM]: 282 annotations added
[10:51:36 AM]: Annotating CAZYmes using HMMer search of dbCAN version 6.0
[10:51:36 AM]: 206 annotations added
[10:51:36 AM]: Annotating proteins with BUSCO dikarya models
[10:51:36 AM]: 1,841 annotations added
Traceback (most recent call last):
  File "/home/simon/software/funannotate/bin/funannotate-functional.py", line 767, in <module>
    shutil.copyfile(args.phobius, phobius_out)
  File "/home/ppa/software/lib/python2.7/shutil.py", line 82, in copyfile
    with open(src, 'rb') as fsrc:
IOError: [Errno 2] No such file or directory: './My_bug/annotate_misc/phobius.results.txt'
So it's partially my fault for not properly reinstalling the eggnog databases, but I don't see how that should be related to the phobius results getting deleted.
Small update: to get eggnog working, I had to extract all the fasta sequences from ~/software/eggnog-mapper/data/eggnog_proteins.dmnd with the diamond distribution bundled with eggnog-mapper and then turn it back into a dmnd file with the shiny new diamond v0.9.18 in my PATH. Might be worth mentioning in the install guide that eggnog-diamond versioning can be an issue.
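For anyone hitting the same thing, the extract-and-rebuild step above might look roughly like the sketch below. The exact commands used were not posted, so this is an assumption: it relies on diamond's `getseq` and `makedb` subcommands, assumes the bundled diamond supports `getseq` and writes FASTA to stdout, and every path passed in is a placeholder.

```python
import subprocess
from typing import List


def getseq_cmd(old_diamond: str, dmnd: str) -> List[str]:
    """Command to dump sequences from a database using the OLD diamond binary."""
    return [old_diamond, "getseq", "--db", dmnd]


def makedb_cmd(new_diamond: str, fasta: str, dmnd: str) -> List[str]:
    """Command to rebuild the database from FASTA using the NEW diamond binary."""
    return [new_diamond, "makedb", "--in", fasta, "--db", dmnd]


def rebuild(old_diamond: str, new_diamond: str,
            old_dmnd: str, fasta: str, new_dmnd: str) -> None:
    """Dump the old database to FASTA, then build a new .dmnd from it."""
    # Assumption: getseq prints FASTA to stdout, so redirect it into a file.
    with open(fasta, "w") as handle:
        subprocess.run(getseq_cmd(old_diamond, old_dmnd), stdout=handle, check=True)
    subprocess.run(makedb_cmd(new_diamond, fasta, new_dmnd), check=True)
```

Keeping the command builders separate from `rebuild` makes it easy to print and check the commands before running anything against the real eggnog database.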
Small update 2: It appears that this error is not unique to phobius but affects all 3 of the remote search results files. The error seems to stem from storing them in the ./My_bug/annotate_misc folder, where they are overwritten by the funannotate annotate command. So either the results need to be moved out of this subfolder by the user after funannotate remote, or an update of the script is in order.
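Until the script is changed, the user-side workaround could be sketched as below. This is only an illustration of copying the remote results out of annotate_misc before calling funannotate annotate; `stash_remote_results` and the destination folder are hypothetical, not part of funannotate.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def stash_remote_results(results: Iterable[str], dest_dir: str) -> List[Path]:
    """Copy remote-search result files into dest_dir and return the new paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in results:
        src = Path(name)
        if not src.is_file():
            # Fail early instead of letting the annotate step crash later.
            raise FileNotFoundError(f"missing remote result: {src}")
        copied.append(Path(shutil.copy2(src, dest / src.name)))
    return copied
```

The returned paths would then be the ones passed to --antismash, --iprscan and --phobius, so the copies inside annotate_misc are no longer the only ones.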
When running the command:
funannotate annotate --input My_bug --sbt template.sbt \
--antismash ./My_bug/annotate_misc/antiSMASH.results.gbk \
--iprscan ./My_bug/annotate_misc/iprscan.xml \
--phobius ./My_bug/annotate_misc/phobius.results.txt \
--cpus 24
The following error occurs:
For completeness I should mention that the remote annotation finished fine in all three cases, but there was an issue in writing the log files for antiSMASH and IPRscan respectively: