krassowski closed this issue 2 years ago
By the way, a concatenation of DEFINITION and TERM columns might be interesting too. I would expect the most important terms to be repeated in both the term name and definition which could give much better results.
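A minimal sketch of that concatenation, in base R only. The data frame `go_annot` is a hypothetical stand-in for what a query such as `AnnotationDbi::select(GO.db::GO.db, keys = x, columns = c("TERM", "DEFINITION"))` might return; the IDs and texts are toy values, not real GO entries.

```r
# Toy annotation table (hypothetical values, not real GO records).
go_annot <- data.frame(
  GOID = c("GO:toy1", "GO:toy2"),
  TERM = c("toy kinase activity", "toy DNA repair"),
  DEFINITION = c(
    "Catalysis of a toy kinase reaction.",
    "The process of restoring toy DNA after damage."
  ),
  stringsAsFactors = FALSE
)

# One text per GO ID: the term name followed by its definition, so that
# words occurring in both are counted twice by a word cloud.
term_text <- paste(go_annot$TERM, go_annot$DEFINITION)
names(term_text) <- go_annot$GOID
```

Words such as "kinase" or "DNA" then appear twice in the corresponding text, which is the weighting effect described above.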
Sorry about the deluge of issues @jokergoo. This is the last one for today, I will keep myself busy with other things now. Please let me know if you would like me to help address the suggestions I added, or whether you prefer to make decisions and code on your own. In either case - thanks for your awesome work!
That is totally fine, and thank you for all your comments and suggestions! I will look into them in the next few days!
Using gene descriptions/summaries to construct the word cloud is a great idea! I would like to support it in the package. It seems that currently no annotation package provides gene description information (only gene names). Also, it seems only the RefSeq database provides such information, so I will collect it manually.
It seems we need to perform some word analysis if we use RefSeq gene descriptions for the word cloud. Some words need to be put on a blacklist (e.g. gene, encode, family, ...)
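As a rough illustration of the blacklist idea, here is a base-R sketch. The descriptions, the blacklist contents, and the helper `tokenize()` are all assumptions made up for this example; they are not part of simplifyEnrichment.

```r
# Hypothetical gene descriptions (toy text, not taken from RefSeq).
descriptions <- c(
  geneA = "This gene encodes a member of the kinase family.",
  geneB = "The encoded protein is a kinase involved in DNA repair."
)

# Words considered uninformative for gene descriptions (illustrative list).
blacklist <- c("gene", "encode", "encodes", "encoded", "family", "member",
               "protein", "this", "the", "a", "of", "is", "in", "involved")

# Lower-case the text, split on anything that is not a letter or digit,
# and drop empty tokens.
tokenize <- function(x) {
  words <- unlist(strsplit(tolower(x), "[^a-z0-9]+"), use.names = FALSE)
  words[nzchar(words)]
}

words <- tokenize(descriptions)
kept <- words[!words %in% blacklist]

# Word frequencies after filtering, most frequent first.
word_freq <- sort(table(kept), decreasing = TRUE)
```

A real implementation would probably also need stemming (so that "encode", "encodes" and "encoded" collapse into one entry) rather than listing every inflected form by hand.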
This is just an idea for an example I guess. Would it be possible to facilitate creating the word cloud based on the descriptions of pathway members (e.g. genes) rather than on the descriptions of the pathway itself?
Currently one can manually adjust the terms (which is great!) and for example use descriptions instead of terms. This is already an interesting alternative which might be worth documenting, as the example heatmap:
Becomes:
when using
AnnotationDbi::select(GO.db::GO.db, keys = x, columns = "DEFINITION")$DEFINITION
(which is not trivial to change - would it be a good idea to make the TERM/DEFINITION choice a parameter?)
I would like to go a step further and, for each pathway, concatenate the descriptions of all genes that were included; the gene/protein descriptions could come from RefSeq, UniProt, or any of the ontology databases. My expectation would be that those are provided by an advanced user in the form of a named character vector, e.g.:
And simplifyEnrichment would take responsibility for concatenating them, creating one document per pathway. This could be just a helper function exposed to the users, and the user would need to pass the result as the term argument to anno_word_cloud. A special case could be made for anno_word_cloud_from_GO(), where this would be handled for the user if they ask for it.
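To make the proposal more concrete, here is a minimal base-R sketch of such a helper. Everything in it is hypothetical: the function name `concat_gene_descriptions()`, the gene and pathway names, and the description texts are invented for illustration and do not exist in simplifyEnrichment.

```r
# Hypothetical user input: gene descriptions as a named character vector.
gene_desc <- c(
  geneA = "kinase involved in DNA repair",
  geneB = "transcription factor regulating cell cycle",
  geneC = "kinase regulating apoptosis"
)

# Hypothetical pathway membership: pathway name -> member gene IDs.
pathway_genes <- list(
  pathway1 = c("geneA", "geneB"),
  pathway2 = c("geneB", "geneC", "geneX")  # geneX has no description
)

# Build one "document" per pathway by concatenating the descriptions of its
# member genes; genes without a description are silently skipped.
concat_gene_descriptions <- function(pathway_genes, gene_desc, sep = " ") {
  vapply(pathway_genes, function(genes) {
    known <- intersect(genes, names(gene_desc))
    paste(gene_desc[known], collapse = sep)
  }, character(1))
}

pathway_docs <- concat_gene_descriptions(pathway_genes, gene_desc)
```

The result is a named character vector with one entry per pathway, which is the kind of object that would then be handed over as the term argument. Whether anno_word_cloud accepts exactly this shape, or needs the documents regrouped per cluster first, is not something this sketch settles.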