Closed cthoyt closed 3 years ago
Thanks for your comments. The new version of the code provides three alternative pathway data sources: GO, WikiPathways, and Reactome. The hierarchical clustering of drugs based on each pathway data source is provided in the manuscript submission as Figure S2.
In addition to including the missing requirement in #8, the Perl script should give the option to pick the GO data, KEGG data, or user-provided data.
Most papers only present analysis on one of GO, KEGG, WikiPathways, Reactome, MetaCyc, etc. This could be because it's quite difficult to handle these data sources, it could be scientific negligence, or in the worst-case scenario, it could be cherry picking. Either way, there are non-intuitive, non-trivial discrepancies between these databases that cause varying results, so it's difficult to assess the scientific validity of results from only a single one.
I personally care about this very much following my involvement in this paper, in which we provided a framework for comparison across multiple databases and showed that there are indeed many non-trivial places where results differ. Therefore, the present paper and code should include analysis not only on KEGG, but also on the other resources.
It appears that the code is only using gene sets, which are indeed widely available for all of the different pathway databases (as well as integrative gene set databases like MSigDB).
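Since only gene sets are needed, supporting several databases plus user-provided data could come down to reading one common file format. Below is a minimal sketch in Python (not the Perl used by this repository), assuming the gene sets are stored in the tab-separated GMT layout distributed by MSigDB (name, description, then gene symbols). The function names `parse_gmt` and `load_gene_sets`, the `paths` mapping, and the toy pathways are hypothetical and only illustrate the idea.

```python
import io


def parse_gmt(handle):
    """Parse GMT-formatted lines into {gene_set_name: set_of_genes}.

    Each line is: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    """
    gene_sets = {}
    for line in handle:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue  # skip blank lines and entries without any genes
        name, _description, *genes = parts
        gene_sets[name] = {gene for gene in genes if gene}
    return gene_sets


def load_gene_sets(source, paths):
    """Load the GMT file registered under `source`.

    `paths` is a hypothetical mapping such as
    {"go": "go.gmt", "kegg": "kegg.gmt", "user": "my_sets.gmt"}.
    """
    if source not in paths:
        raise KeyError(f"unknown gene set source: {source!r}")
    with open(paths[source]) as handle:
        return parse_gmt(handle)


# Toy in-memory example (made-up pathways, not real database content).
EXAMPLE_GMT = (
    "PATHWAY_A\tdescription\tTP53\tBRCA1\n"
    "PATHWAY_B\tdescription\tTP53\tEGFR\tKRAS\n"
)
example_sets = parse_gmt(io.StringIO(EXAMPLE_GMT))
```

With this kind of loader, the same downstream analysis can be repeated for each source by changing only the `source` argument, which makes cross-database comparison straightforward.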
Please refer to this paper about the possible statistical consequences of performing many related statistical tests on overlapping gene sets.
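To get a rough sense of how related the tests might be, one could measure how much the gene sets overlap before running them. The sketch below is not the method from the referenced paper; it simply computes the pairwise Jaccard index over a hypothetical dictionary of gene sets, and the names `jaccard`, `pairwise_overlap`, and the toy data are assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(gene_sets):
    """Return {(name_1, name_2): jaccard_index} for every pair of gene sets."""
    return {
        (x, y): jaccard(gene_sets[x], gene_sets[y])
        for x, y in combinations(sorted(gene_sets), 2)
    }


# Toy gene sets (made up for the example).
toy = {
    "A": {"TP53", "BRCA1"},
    "B": {"TP53", "EGFR", "KRAS"},
    "C": {"MYC"},
}
overlaps = pairwise_overlap(toy)
```

Pairs with a high index share many genes, so their test results are unlikely to be independent, which is worth keeping in mind when interpreting corrected p-values.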