eggrandio opened this issue 2 years ago
Sorry I missed this issue!
Using representative terms for each cluster is of course one way to summarize the "common functions" in a cluster, but it is always questionable whether the selected terms can represent the whole cluster.
The following Google doc describes four ways to summarize the general functions in GO clusters:
https://docs.google.com/document/d/1xCKE2rtGHgRH6yp3JuiRrlgY0GpNKcqdiTFnHFlG1JE/edit?usp=sharing
At least for me, using keywords seems to give more general summaries.
But anyway, once you have clustered the GO terms, you can always manually create such a "text box" annotation and put your "representative terms" there.
Hello,
I am also interested in assigning a representative GO term to each cluster. I found the document you shared very informative; however, how do I choose a method for selecting the representative term? (The document does not include any code or a guide.)
Thanks in advance
@SenselessN, we think selecting one or a few representative GO terms is not a proper method because it is not enough to summarize the general functions in a GO cluster, and this is the motivation for developing the "word cloud" representation.
Hi jokergoo, thank you so much for developing such useful R packages!
I have been using semantic similarity to cluster enriched GO terms before, but this package makes it much easier. I usually try to assign a "representative GO term" to each cluster, and I think in some cases it could be more helpful than a word cloud. Let me know what you think and whether you have any suggestions on how to improve the assignment (and feel free to include it in the package if you want).
My idea is first to cluster GO terms by semantic similarity, then find the common ancestral GO terms for each cluster, and retrieve a common ancestral term with high Information Content for each cluster (otherwise, very generic terms are returned). The tricky part is how to select a term that is both informative and shared by the majority of the GO terms in the cluster. I have calculated a very simple "importance" score:
importance = n * IC^2
where n is the number of times the ancestral GO term appears in the GO cluster and IC is the information content of that term. The "importance" calculation could probably be improved!

Best,
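To make the scoring concrete, here is a minimal Python sketch of picking the ancestor with the highest n * IC^2 per cluster, assuming the ancestor sets and IC values have already been computed. All GO IDs, ancestor relationships, and IC values below are hypothetical placeholders, and `importance_scores` / `representative_term` are illustrative names, not functions from any package.

```python
from collections import Counter

# Hypothetical toy inputs: these GO IDs, ancestor sets, and IC values are
# placeholders, not real GO annotations.
clusters = {
    1: ["GO:T1", "GO:T2", "GO:T3"],
    2: ["GO:T4", "GO:T5"],
}

# Ancestors of each clustered term (hypothetical; in practice these would be
# taken from the GO DAG).
ancestors = {
    "GO:T1": {"GO:A1", "GO:A2", "GO:ROOT"},
    "GO:T2": {"GO:A1", "GO:A2", "GO:ROOT"},
    "GO:T3": {"GO:A2", "GO:ROOT"},
    "GO:T4": {"GO:A3", "GO:ROOT"},
    "GO:T5": {"GO:A3", "GO:ROOT"},
}

# Information content of each ancestral term (hypothetical; the root has IC 0).
ic = {"GO:ROOT": 0.0, "GO:A1": 3.0, "GO:A2": 1.5, "GO:A3": 2.5}


def importance_scores(terms):
    """Return {ancestor: n * IC^2} for the GO terms of one cluster."""
    # n = number of cluster members that have this term among their ancestors
    counts = Counter(a for t in terms for a in ancestors.get(t, ()))
    return {a: n * ic.get(a, 0.0) ** 2 for a, n in counts.items()}


def representative_term(terms):
    """Return the ancestor with the highest importance score."""
    scores = importance_scores(terms)
    # Iterate in sorted order so ties are broken deterministically by GO ID
    return max(sorted(scores), key=lambda a: scores[a])


if __name__ == "__main__":
    for cluster_id, terms in clusters.items():
        print(cluster_id, representative_term(terms))
```

With these toy values, cluster 1 picks GO:A1 (2 * 3.0^2 = 18) over GO:A2 (3 * 1.5^2 = 6.75), even though GO:A2 is shared by all three terms. That illustrates the trade-off mentioned above: squaring IC weights specificity more heavily than coverage, so the exponent (or a minimum-coverage cutoff) is one place where the score could be tuned.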