Closed by PlantDr430 5 years ago
Yippeee! Always fun when formats change... Is there a tag saying which version of antiSMASH the result is from?
Edit: it looks like the version is in the comment section, so an updated parser will need to be added to the code.
yea, in the .gbk they have this:
Version :: 5.0.0rc1
Run date :: 2019-05-10 16:52:23
but there isn't a tag such as this in the v4 .gbk's
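Since the version tag is present in v5 output and absent in v4, a parser could branch on it. The following is only a minimal sketch of that idea, not funannotate's actual code: the function names are hypothetical, and falling back to "4" when no tag is found is an assumption based solely on the observation above.

```python
import re

# Matches the "Version :: 5.0.0rc1" style line shown above.
VERSION_RE = re.compile(r"Version\s*::\s*(\S+)")


def detect_antismash_version(gbk_text, default="4"):
    """Return the antiSMASH version found in GenBank text.

    Assumption: files without the tag are treated as v4 output.
    """
    match = VERSION_RE.search(gbk_text)
    return match.group(1) if match else default


def major_version(version):
    """Return the leading integer of a version string such as '5.0.0rc1'."""
    return int(version.split(".")[0])


v5_comment = "Version :: 5.0.0rc1\nRun date :: 2019-05-10 16:52:23\n"
print(detect_antismash_version(v5_comment))
print(major_version(detect_antismash_version("no tag here")))
```

A caller could then pick the v4 or v5 parsing path from `major_version(...)`.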
In the example you posted above, it seems that the annotation is not incrementing numerically as expected, i.e. there are two 'protocluster' features, yet both say they are from the same "number". Is this the case throughout the .gbk output? Here are the two "protocluster" features:
protocluster 31439..78329
/aStool="rule-based-clusters"
/contig_edge="False"
/core_location="join{[51438:51715](+), [51814:52199](+),
[52265:52794](+), [52859:57416](+), [57480:58329](+)}"
/cutoff="20000"
/detection_rule="cds(PKS_AT and (PKS_KS or ene_KS or mod_KS
or hyb_KS or itr_KS or tra_KS))"
/neighbourhood="20000"
/product="T1PKS"
/protocluster_number="1"
/tool="antismash"
proto_core join(51439..51715,51815..52199,52266..52794,52860..57416,
57481..58329)
/aStool="rule-based-clusters"
/cutoff="20000"
/detection_rule="cds(PKS_AT and (PKS_KS or ene_KS or mod_KS
or hyb_KS or itr_KS or tra_KS))"
/neighbourhood="20000"
/product="T1PKS"
/protocluster_number="1"
And then there is another one, which looks like this:
protocluster 64344..107816
/aStool="rule-based-clusters"
/contig_edge="True"
/core_location="join{[91647:92574](-), [91554:91580](-),
[91368:91464](-), [91070:91264](-), [85323:90989](-),
[85064:85241](-), [84343:84982](-)}"
/cutoff="20000"
/detection_rule="cds(PKS_AT and (PKS_KS or ene_KS or mod_KS
or hyb_KS or itr_KS or tra_KS))"
/neighbourhood="20000"
/product="T1PKS"
/protocluster_number="1"
/tool="antismash"
proto_core complement(join(84344..84982,85065..85241,85324..90989,
91071..91264,91369..91464,91555..91580,91648..92574))
/aStool="rule-based-clusters"
/cutoff="20000"
/detection_rule="cds(PKS_AT and (PKS_KS or ene_KS or mod_KS
or hyb_KS or itr_KS or tra_KS))"
/neighbourhood="20000"
/product="T1PKS"
/protocluster_number="1"
So I'm wondering if this is correct. Are these two "protocluster" features part of the same cluster, or is this a mistake? Does the HTML output match this? They appear to overlap, so perhaps the underlying code is correct. Does that mean that all "clusters" have this protocluster annotation, or is this a subset of the cluster annotation?
Update: I think I figured out what is happening. It seems that the numbering is contig specific, i.e. it starts over from 1 for each GenBank record (contig). It also looks like they are now using contig.num naming in the HTML.
That would make sense as my results usually only have one cluster per contig. Although, interesting that your run appears to indicate more clusters than mine.
I also noticed that in some other contigs /protocluster_number="1" appeared with different products, such as NRPS-like or terpenes, which would indicate that the number isn't product related and does appear to be contig related.
protocluster 9399..53295
/aStool="rule-based-clusters"
/contig_edge="False"
/core_location="[29398:33295](-)"
/cutoff="0"
/detection_rule="cds((PP-binding or NAD_binding_4) and
(AMP-binding or A-OX))"
/neighbourhood="20000"
/product="NRPS-like"
/protocluster_number="1"
/tool="antismash"
proto_core complement(29399..33295)
/aStool="rule-based-clusters"
/cutoff="0"
/detection_rule="cds((PP-binding or NAD_binding_4) and
(AMP-binding or A-OX))"
/neighbourhood="20000"
/product="NRPS-like"
/protocluster_number="1"
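If the protocluster number restarts at 1 on every contig, the number alone cannot identify a cluster across the whole file. One way around that is to key on the (contig, number) pair, similar to the contig.num naming mentioned above. This is a rough sketch only: the function name and the contig names are made up for illustration.

```python
def label_protoclusters(features):
    """Build unique labels for protoclusters numbered per contig.

    features: iterable of (contig_id, protocluster_number) pairs in file order.
    Returns a list of (label, global_index) tuples, where label is
    "<contig_id>.<protocluster_number>" and global_index counts across
    the whole file starting at 1.
    """
    labels = []
    for contig_id, number in features:
        label = "{}.{}".format(contig_id, number)
        labels.append((label, len(labels) + 1))
    return labels


# Hypothetical contig names; every contig starts again at number 1.
example = [("scaffold_1", 1), ("scaffold_2", 1), ("scaffold_2", 2)]
for label, index in label_protoclusters(example):
    print(label, index)
```

The global index gives a file-wide counter, while the label stays traceable to the per-contig number in the .gbk.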
I didn't use the same genome ;)
The goal is to get this updated today; I'll post here when it's working.
Okay, I think I have it fixed, if you wouldn't mind testing the latest commit that would be helpful. Version should be:
$ funannotate version
funannotate v1.6.0-046e957
The parser picked up the clusters and smCOGs, but reported that I don't have any backbone biosynthetic enzymes. I believe I do have some, since antiSMASH is flagging some genes as "core biosynthetic genes".
[03:48 PM]: Now parsing antiSMASH v5 results, finding SM clusters
[03:48 PM]: Found 32 clusters, 0 backbone biosynthetic enyzmes, and 77 smCOGs predicted by antiSMASH
[03:48 PM]: Found 0 duplicated annotations, adding 52,327 valid annotations
[03:48 PM]: Converting to final Genbank format, good luck!
[03:50 PM]: Creating AGP file and corresponding contigs file
[03:50 PM]: Cross referencing SM cluster hits with MIBiG database version 1.3
[03:50 PM]: Creating tab-delimited SM cluster output
[03:50 PM]: Writing genome annotation table.
[03:50 PM]: Funannotate annotate has completed successfully!
We need YOUR help to improve gene names/product descriptions:
0 gene/products names MUST be fixed, see LM461_fun_output/annotate_results/Gene2Products.must-fix.txt
1 gene/product names need to be curated, see LM461_fun_output/annotate_results/Gene2Products.need-curating.txt
60 gene/product names passed but are not in Database, see LM461_fun_output/annotate_results/Gene2Products.new-names-passed.txt
Please consider contributing a PR at https://github.com/nextgenusfs/gene2product
-------------------------------------------------------
stephenwyka@bspmgenomics:/data/wyka$
Ok, thanks. It's not really a big deal/change, I don't think, as it is simply a counter. Do the results in annotate_result/*.cluster.txt make sense?
I wonder if this is a difference between 5.0.0 [what I ran on the web server] and 5.0.0rc1 [which seems to be what you have].
Yes, the results in annotate_result/*.cluster.txt make sense.
Thanks, I'll see if I can fix the counter.
Okay, it should now be counting the biosynthetic enzymes based on 'gene_kind' = 'biosynthetic' in the CDS metadata.
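To illustrate the counting rule, here is a small sketch under some assumptions: each CDS is represented as a plain dict of qualifiers whose values are lists (the way GenBank parsers commonly expose them), and the function name is hypothetical. It is not the code from the commit, and the non-'biosynthetic' values in the example data are illustrative only.

```python
def count_backbone_enzymes(cds_qualifiers):
    """Count CDS features whose 'gene_kind' qualifier is exactly 'biosynthetic'.

    cds_qualifiers: iterable of dicts mapping qualifier name -> list of values.
    CDS features without a 'gene_kind' qualifier are ignored.
    """
    count = 0
    for qualifiers in cds_qualifiers:
        if "biosynthetic" in qualifiers.get("gene_kind", []):
            count += 1
    return count


# Illustrative data only; locus tags and non-matching kinds are made up.
example_cds = [
    {"locus_tag": ["GENE_0001"], "gene_kind": ["biosynthetic"]},
    {"locus_tag": ["GENE_0002"], "gene_kind": ["transport"]},
    {"locus_tag": ["GENE_0003"]},
]
print(count_backbone_enzymes(example_cds))
```

Because the check is list membership rather than a substring match, a value that merely starts with "biosynthetic" would not be counted.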
Thank you
So this fix worked on all my genomes except one, where I got this error:
[03:12 PM]: Now parsing antiSMASH v5 results, finding SM clusters
Traceback (most recent call last):
File "/data/wyka/funannotate-master/bin/funannotate-functional.py", line 878, in <module>
lib.ParseAntiSmash(antismash_input, AntiSmashFolder, AntiSmashBed, AntiSmash_annotations) #results in several global dictionaries
File "/data/wyka/funannotate-master/lib/library.py", line 5320, in ParseAntiSmash
numericalContig = int(''.join(filter(str.isdigit, chr)))
UnboundLocalError: local variable 'chr' referenced before assignment
stephenwyka@bspmgenomics:/data/wyka/final_funannotate/Cpur20_1$
Thanks, that one was a typo: https://github.com/nextgenusfs/funannotate/commit/0c6732d0f408e66822cc3eea1c159aa6d74ceb9c. A git pull should fix it.
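For readers hitting a similar UnboundLocalError: the failing expression pulls the digits out of a contig name, and the error means the name variable had not been assigned on that code path. The sketch below is not the fix from the linked commit. It shows the same digit extraction wrapped in a hypothetical helper that takes the name as an explicit argument, which also avoids shadowing the built-in `chr`.

```python
def numerical_contig(contig_name):
    """Return all digits in a contig name joined into one integer.

    Raises ValueError when the name contains no digits, instead of
    letting int('') fail with a less descriptive message.
    """
    digits = "".join(filter(str.isdigit, contig_name))
    if not digits:
        raise ValueError("no digits in contig name: {!r}".format(contig_name))
    return int(digits)


# Hypothetical contig names for illustration.
print(numerical_contig("scaffold_12"))
print(numerical_contig("contig_007"))
```

Note that every digit in the name is concatenated, so a name such as `ctg1_v2` would yield 12 rather than 1.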
Got it to work
I am currently using the latest version which I pulled off of github today (v1.5.3-21ad095).
I am also using the newest version of antiSMASH (v5); however, I noticed that the qualifiers in the .gbk output are different from those in the .gbk files I had from antiSMASH v4. Perhaps this is why no clusters or smCOGs are being parsed.
I have attached my log file and a version of the .gbk results showing a portion of the output of antiSMASH v5. funannotate-annotate.log antiSMASH.results.txt