Open lukaskon opened 10 months ago
Hi, this is really an antiSMASH question; this tool is just parsing the results. I would encourage you to look at the HTML report in the antiSMASH result folder that you provided to funannotate.
I believe the answer, going by your report, is that there are 54 SM clusters instead of the 45 that were perhaps found in a previous version of the genome. I know the Botrytis genome has been improved too, but it may also be that some of the 54 are not among the SM types that they decided to report. antiSMASH has also been updated a lot between v4 and v6.
From the antiSMASH paper, "Secondary Metabolite specific Clusters of Orthologous Groups (smCoGs)": https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3125804/
Finally, from the genes within this database of gene clusters, we constructed secondary metabolism Clusters of Orthologous Groups (smCOGs). These are used in yet another module to predict and categorize the functions of accessory genes, and to calculate phylogenetic trees for each gene with a seed alignment of its smCOG protein family. Our benchmark results show that our method reliably detects gene clusters of a wide variety of biosynthetic types, and that it is able to significantly enhance manual genome annotations of secondary metabolite biosynthesis.
@nextgenusfs will perhaps have additional answers.
Thank you for the quick response. I looked at the HTML report from the antiSMASH v6 results, and it also reports 44 regions (aka clusters, per their definition). I am confused about why this differs from what funannotate reports (54). Is funannotate using different criteria to define an SM cluster? Thanks again.
I believe it is indeed parsing the protoclusters, which was what the clusters were called in v4. But I think this is now outdated given the "regions" concept that v5 and later use. I've recently run some genomes through v7, and my personal feeling is that antiSMASH is being overly conservative with what it calls a "cluster region", and there are some new fungal-RiPP-like models (I'm not sure these are very accurate at all...). The antiSMASH parsing code should likely be fixed for v6 and v7 to identify "regions" instead of protoclusters. I don't know when I'll have time to look at this...
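To make the region versus protocluster distinction concrete, here is a minimal Python sketch that tallies feature keys in GenBank-formatted text. It is not funannotate's parsing code. It assumes the antiSMASH v5+ GenBank output annotates `region` and `protocluster` as feature keys; the function name `count_feature_keys` and the trimmed sample record are hypothetical and only for illustration.

```python
import re
from collections import Counter

# In a GenBank feature table, the feature key starts at column 6
# (five leading spaces); qualifier lines are indented further.
FEATURE_KEY = re.compile(r"^ {5}(\S+)\s+\S")


def count_feature_keys(genbank_text):
    """Count feature keys found in the FEATURES tables of GenBank text."""
    counts = Counter()
    in_features = False
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
        elif line.startswith(("ORIGIN", "//")):
            in_features = False
        elif in_features:
            match = FEATURE_KEY.match(line)
            if match:
                counts[match.group(1)] += 1
    return counts


# Hypothetical, heavily trimmed record: one region holding two protoclusters.
SAMPLE = """\
LOCUS       scaffold_1   60000 bp    DNA     linear   UNK
FEATURES             Location/Qualifiers
     protocluster    1000..21000
                     /product="T1PKS"
     protocluster    15000..40000
                     /product="NRPS"
     region          1000..40000
                     /product="T1PKS"
                     /product="NRPS"
ORIGIN
//
"""

counts = count_feature_keys(SAMPLE)
print(counts["region"], counts["protocluster"])
```

In this made-up record, two overlapping protoclusters fall inside a single region, which is one way a protocluster count can end up higher than a region count for the same genome.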
Great, I think we are on the same page. I also noticed the same thing when trying antismash v7. If this is of any use, here is the range reported for clusters identified in the same 7 isolates of Botrytis cinerea for each method.
Funannotate-annotate log (using antismash v6 input): 52-61
Command line antismash v6 (see command used above): 42-52
Browser antismash v7: 20-24 (including the B05.10 reference genome)
Are you using the latest release? 1.8.15
Describe the bug
Can you please explain the difference between "clusters" and "smCOGs" parsed by antismash? Are "clusters" actually protoclusters? I could not find any details in the documentation, but maybe I missed it.
All the numbers I found in the log file are significantly higher than what I expected based on my organism's reference genome (Botrytis cinerea has approx. 45 key genes encoding biosynthetic enzymes for SM synthesis). I am wondering whether this is because I am interpreting them wrong, or because of database updates since 2012, differences in repeat masking, etc. Any explanation or advice is appreciated!
What command did you issue?
Antismash direct results
When I looked at the number of .gbk files output by antismash, it was closer to what I would have expected. According to antismash, "A region in antiSMASH 5 and above corresponds to the gene cluster annotation in antiSMASH 4 and earlier.", so I would have thought this number would be the cluster number?
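For anyone who wants to repeat that file count, here is a small Python sketch. It assumes the per-region files follow a `<record>.region<NNN>.gbk` naming pattern and that the whole-genome .gbk in the same folder should be skipped; the function name `count_region_files` and the file names in the demo are hypothetical.

```python
import tempfile
from pathlib import Path


def count_region_files(result_dir):
    """Count per-region GenBank files in an antiSMASH result folder.

    Assumes region files are named <record>.region<NNN>.gbk, so the
    whole-genome .gbk in the same folder does not match the pattern.
    """
    return sum(1 for _ in Path(result_dir).glob("*.region*.gbk"))


# Demo on a throwaway folder with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("scaffold_1.region001.gbk",
                 "scaffold_2.region001.gbk",
                 "genome.gbk"):
        (Path(tmp) / name).touch()
    demo_count = count_region_files(tmp)

print(demo_count)
```

Pointing `count_region_files` at the real antiSMASH output folder should give a number to compare against the regions listed in the HTML report.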
Logfiles funannotate-annotate.log
OS/Install Information
funannotate versions.txt