How do you validate your results?

Abelsufu commented 4 years ago

Hi, thank you for the ReactomeGSA package, it looks pretty impressive.

I am using it, and although the results seem to make sense, I would like to ask how you validate them. I checked just one pathway, and the expression levels of some genes in that pathway seem to go in the opposite direction. So I would appreciate some more explanation of how the method works and how you validated your results.

Thank you in advance

jgriss commented 4 years ago

Hi @Abelsufu,

Thanks for your interest in ReactomeGSA.

This is expected and a very common case. Essentially, this is the big advantage of gene set enrichment analyses compared to a simple overrepresentation analysis (ORA): here you see these "issues", whereas in a simple ORA, where you only submit the identifiers of the significant genes, this information is lost.
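To make the point concrete, here is a minimal Python sketch. It is not ReactomeGSA code: the gene names, the fold changes, the cutoff of 1.0, and the helper `direction_summary` are all invented for illustration. It contrasts what an ORA receives (a bare list of identifiers) with the signed per-gene values, where genes of one pathway can point in opposite directions.

```python
# Illustrative only: hypothetical genes and log2 fold changes, not real data.
log_fold_changes = {"GENE_A": 1.8, "GENE_B": 1.2, "GENE_C": -0.9, "GENE_D": 0.1}
pathway_genes = ["GENE_A", "GENE_B", "GENE_C"]


def direction_summary(genes, fold_changes):
    """Split a pathway's genes by the sign of their fold change."""
    up = [g for g in genes if fold_changes.get(g, 0.0) > 0]
    down = [g for g in genes if fold_changes.get(g, 0.0) < 0]
    return {"up": up, "down": down}


# ORA-style input: identifiers only (assumed cutoff |log2FC| >= 1.0),
# so the direction of each gene is discarded.
ora_input = sorted(g for g, lfc in log_fold_changes.items() if abs(lfc) >= 1.0)

# Signed values keep the direction, so mixed regulation stays visible.
summary = direction_summary(pathway_genes, log_fold_changes)

print(ora_input)
print(summary)
```

In this toy case the pathway contains two genes going up and one going down, which is only visible because the signed values were kept.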

We did not develop the algorithms ourselves; we simply offer a unified interface to existing algorithms plus the connection to Reactome.

Depending on which method you used:

PADOG: Tarca, A. L., Draghici, S., Bhatti, G. & Romero, R. Down-weighting overlapping genes improves gene set analysis. BMC Bioinformatics 13, 136 (2012).
Camera: Wu, D. & Smyth, G. K. Camera: a competitive gene set test accounting for inter-gene correlation. Nucleic Acids Research 40, e133 (2012). http://nar.oxfordjournals.org/content/40/17/e133

There, you can find the respective validation experiments. Similar to the findings of Tarca et al., we also observed that PADOG is more likely to rank relevant pathways at the top.

I hope this answers your question.

Kind regards, Johannes

Abelsufu commented 4 years ago

Thank you so much for your quick answer.

I apologize in advance for my ignorance, but do you think it is possible to swap the ssGSEA method for PADOG in the analyse_sc_clusters function?

Thank you again

Best regards, Abel

jgriss commented 4 years ago

Hi @Abelsufu,

PADOG and ssGSEA belong to different families of pathway algorithms.

PADOG compares two groups of samples on the pathway level. Essentially, it's a combination of pathway analysis with differential expression analysis.

ssGSEA is a so-called gene set variation analysis. This type of algorithm merges the gene-level expression values on the pathway level. You can then use these pathway-level expression values to perform a qualitative analysis of your cell clusters.

In order to use PADOG for single-cell data, you therefore need two groups of clusters (at least three clusters per group) to compare against each other. For example, you could compare three clusters of T cells with three clusters of NK cells.
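As a rough sketch of that requirement, the following Python snippet groups clusters by cell population and checks that there are exactly two groups with at least three clusters each. The function `group_clusters`, the cluster names, and the annotation are hypothetical and not part of ReactomeGSA.

```python
def group_clusters(cluster_to_group, min_per_group=3):
    """Group cluster names by cell population and check the two-group design."""
    groups = {}
    for cluster, group in cluster_to_group.items():
        groups.setdefault(group, []).append(cluster)

    if len(groups) != 2:
        raise ValueError(f"expected exactly 2 groups, found {len(groups)}")
    for group, clusters in groups.items():
        if len(clusters) < min_per_group:
            raise ValueError(
                f"group {group!r} has {len(clusters)} clusters, "
                f"need at least {min_per_group}"
            )
    return groups


# Hypothetical annotation: three T cell clusters and three NK cell clusters.
annotation = {
    "cluster_0": "T cell",
    "cluster_1": "T cell",
    "cluster_2": "T cell",
    "cluster_3": "NK cell",
    "cluster_4": "NK cell",
    "cluster_5": "NK cell",
}
design = group_clusters(annotation)
print(design)
```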

Is this the type of analysis you are looking for?

Abelsufu commented 4 years ago

Hi Johannes,

Thank you so much for the explanation. As you can see, I am not an expert at all in this topic.

The answer to your question is yes: I would like to compare different clusters involving different populations.

Thank you again,

Abel

jgriss commented 4 years ago

Hi @Abelsufu,

In that case, I recommend that you create average gene counts per cluster, i.e. pseudo-bulk datasets. Once you have this data, you can use the standard ReactomeGSA method to compare different sample groups:

https://bioconductor.org/packages/release/bioc/vignettes/ReactomeGSA/inst/doc/using-reactomegsa.html
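The averaging step itself can be sketched in a few lines of Python. This only illustrates the pseudo-bulk idea and is not the ReactomeGSA interface: the function `pseudo_bulk`, the data layout (gene mapped to a list of per-cell counts), and the toy counts are assumptions made for the example.

```python
from collections import defaultdict


def pseudo_bulk(counts, cell_clusters):
    """Average per-gene counts over the cells of each cluster.

    counts: dict mapping gene -> list of per-cell counts (assumed layout)
    cell_clusters: list with one cluster label per cell, in the same order
    """
    # Collect the cell indices that belong to each cluster.
    cells_by_cluster = defaultdict(list)
    for idx, cluster in enumerate(cell_clusters):
        cells_by_cluster[cluster].append(idx)

    averaged = {}
    for gene, values in counts.items():
        if len(values) != len(cell_clusters):
            raise ValueError(f"{gene}: expected one count per cell")
        # One average per cluster: each cluster becomes a pseudo-bulk "sample".
        averaged[gene] = {
            cluster: sum(values[i] for i in idxs) / len(idxs)
            for cluster, idxs in cells_by_cluster.items()
        }
    return averaged


# Toy input: four cells, two clusters (values are made up).
counts = {"CD3E": [4, 6, 0, 2], "NKG7": [0, 2, 8, 10]}
cell_clusters = ["T_1", "T_1", "NK_1", "NK_1"]
bulk = pseudo_bulk(counts, cell_clusters)
print(bulk)
```

Each cluster then plays the role of one sample, and the cluster's cell population (for example T cell or NK cell) is the factor to compare in the group-wise analysis described in the vignette.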

Kind regards, Johannes

reactome / ReactomeGSA

How do you validate your results? #11