saezlab / decoupleR

R package to infer biological activities from omics data using a collection of methods.
https://saezlab.github.io/decoupleR/
GNU General Public License v3.0

TF analysis between conditions scRNA #118

Closed kiwipeel closed 2 months ago

kiwipeel commented 4 months ago

How can I determine whether the difference in TF activity between two conditions in my single-cell dataset is significant for a gene?

PauBadiaM commented 4 months ago

Hi @kiwipeel ,

You can perform differential expression analysis at the pseudobulk level between conditions and then use the resulting contrast-level gene statistics as input for decoupler. There is an example of this workflow in this vignette. It is in Python but should be relatively easy to reproduce in R if that is a limitation. Hope this is helpful!
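To make the pseudobulk step concrete, here is a minimal sketch in plain Python of the aggregation itself: raw counts from all cells of the same sample are summed into one profile per sample. The function name `pseudobulk`, the data layout, and the toy values are assumptions made for illustration, not part of decoupler or the vignette.

```python
from collections import defaultdict

def pseudobulk(counts, cell_to_sample):
    """Sum per-cell raw counts into one profile per sample (hypothetical helper).

    counts: dict mapping cell id -> dict of gene -> raw count
    cell_to_sample: dict mapping cell id -> sample (replicate) label
    """
    profiles = defaultdict(lambda: defaultdict(int))
    for cell, genes in counts.items():
        sample = cell_to_sample[cell]
        for gene, n in genes.items():
            profiles[sample][gene] += n
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {sample: dict(genes) for sample, genes in profiles.items()}

# Toy data (made up): four cells coming from two samples.
counts = {
    "c1": {"GATA1": 3, "SPI1": 0},
    "c2": {"GATA1": 1, "SPI1": 2},
    "c3": {"GATA1": 0, "SPI1": 5},
    "c4": {"GATA1": 2, "SPI1": 4},
}
cell_to_sample = {"c1": "ctrl_1", "c2": "ctrl_1", "c3": "treat_1", "c4": "treat_1"}

bulk = pseudobulk(counts, cell_to_sample)
```

The per-sample profiles would then go into a standard bulk differential expression tool, and the resulting per-gene statistics for the contrast become the input for TF activity inference.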

kiwipeel commented 4 months ago

Why can't I just apply statistical tests to the score values generated from the ULM model? Thank you in advance

PauBadiaM commented 4 months ago

Hi @kiwipeel ,

You could also do that, but if the objective is to compare conditions I would recommend going the pseudobulk route. With it you avoid artificially inflated significance (p-values that are too small) caused by treating single cells as true replicates, which they are not.
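The replicate issue can be illustrated with a small sketch in plain Python. The scores below are made up, and Welch's t-statistic is used only as a simple stand-in for whatever test one might apply; the point is how the statistic changes when cells rather than samples are counted as replicates.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two independent groups (b minus a)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / se

def cells(sample_mean):
    """Four made-up cell scores that vary only slightly around their sample mean."""
    return [sample_mean - 0.1, sample_mean - 0.1, sample_mean + 0.1, sample_mean + 0.1]

# Hypothetical TF activity scores: 3 samples per condition, 4 cells per sample.
control = [cells(m) for m in (1.0, 2.0, 3.0)]
treated = [cells(m) for m in (2.0, 3.0, 4.0)]

# Cells treated as replicates: n = 12 per condition.
t_cells = welch_t(
    [x for sample in control for x in sample],
    [x for sample in treated for x in sample],
)

# Samples treated as replicates: n = 3 per condition.
t_samples = welch_t(
    [mean(sample) for sample in control],
    [mean(sample) for sample in treated],
)
```

The difference in means is identical in both cases, yet the cell-level statistic is more than twice as large as the sample-level one, simply because the cell count is used as the sample size.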

kiwipeel commented 4 months ago

Thank you. The p-values in the run_ulm results represent the significance of the score for each cell and transcription factor, am I right? Why do we create a new assay from all of these scores when some of them do not have significant p-values?

PauBadiaM commented 4 months ago

Hi @kiwipeel ,

Indeed! We keep all of them since p-value thresholding is arbitrary: depending on the application, you might want to use a stricter or more relaxed threshold.
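For context on what these scores are: in ULM, the activity of a TF in one cell or contrast is the t-value of the slope from a univariate linear model that regresses the gene-level values on the TF's target weights, with non-targets given weight zero. Below is a minimal sketch of that calculation in plain Python; the function name `ulm_score` and the toy numbers are assumptions for illustration, not the package's implementation.

```python
import math

def ulm_score(stats, weights):
    """t-value of the slope when regressing stats on weights (hypothetical helper).

    stats: one value per gene (e.g. expression or a contrast statistic)
    weights: the TF's weight for each gene, 0 for non-targets
    """
    n = len(stats)
    mean_x = sum(weights) / n
    mean_y = sum(stats) / n
    sxx = sum((x - mean_x) ** 2 for x in weights)
    syy = sum((y - mean_y) ** 2 for y in stats)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weights, stats))
    r = sxy / math.sqrt(sxx * syy)
    # For simple linear regression the slope t-value depends only on r and n.
    return r * math.sqrt((n - 2) / (1 - r * r))

# Toy example (made up): six genes, the TF activates two and represses one.
weights = [1, 1, -1, 0, 0, 0]
stats = [2.0, 1.5, -1.0, 0.1, -0.2, 0.3]

score = ulm_score(stats, weights)
flipped = ulm_score(stats, [-w for w in weights])
```

A p-value follows from comparing this t-value against a t-distribution with n - 2 degrees of freedom, which is why every score comes with its own p-value and why any cutoff is a choice left to the user.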

kiwipeel commented 4 months ago

Thank you. However, if I put a threshold on the p-value, there will be missing values in the new assay we generate from the TF scores. What is the correct way to handle this?

PauBadiaM commented 4 months ago

It really depends on the downstream task you want to use them for. In your case, since you are interested in contrasting conditions, I would again recommend doing it at the pseudobulk level. There, filtering by p-value is easier to handle, since you obtain a single vector of activity changes that may or may not be significant.

kiwipeel commented 4 months ago

Thank you again. One last question: after getting the TF assay from the model, should I run the ScaleData() function on the TF assay with the split.by argument set to my conditions?

PauBadiaM commented 4 months ago

Hi @kiwipeel, if it is just for plotting, yes. I am not sure about the split.by argument though: wouldn't you want to see the differences between your conditions instead?