run_ulm on TF with only positively regulated target genes

ken-chen-18 commented 2 months ago

Hi,

While using decoupler, I noticed that the weights in the collecTRI dataset are either 1 or -1. If a TF only has positively regulated target genes, I'm a little confused as to how the activation scores I'm getting are calculated. run_ulm requires fitting a linear model on the relationship between the interaction weights and the gene expression values, but if the interaction weights are all 1, then all points share the same x-value and we'd just have a vertical line. How would calculating the t-value of an infinite slope be meaningful? Is there something I'm missing?

Thanks for your help!

PauBadiaM commented 2 months ago

Hi @ken-chen-18,

Whenever a TF only has weights of 1, the model still works: all genes that do not belong to the TF's gene set but are included in your mat get a weight of 0, so you have two clouds of points (one at 0, one at 1) to fit the regression line through. Basically, like in any other gene set enrichment method, you need a background distribution of genes to compute your enrichment score. Does this make it clearer? BTW thanks for the question, I think I'll update the docs to give a better description.
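To make the "two clouds of points" idea concrete, here is a minimal sketch in plain NumPy. It is not decoupler's actual implementation: the helper `ulm_t_value`, the simulated expression values, and the gene counts are all made up for illustration. It regresses expression on a 0/1 weight vector and computes the t-value of the slope, then shows that with only 0/1 weights this equals a pooled-variance two-sample t-statistic (targets vs. background).

```python
import numpy as np

def ulm_t_value(expr, weights):
    """t-value of the slope from a simple linear regression of expr on weights.

    Hypothetical helper for illustration; uses the identity
    t = r * sqrt((n - 2) / (1 - r^2)), where r is the Pearson correlation.
    """
    x = np.asarray(weights, dtype=float)
    y = np.asarray(expr, dtype=float)
    n = x.size
    r = np.corrcoef(x, y)[0, 1]
    return r * np.sqrt((n - 2) / (1.0 - r**2))

# Simulated data (assumption): 1000 genes, the first 50 are targets of the TF.
rng = np.random.default_rng(0)
n_genes, n_targets = 1000, 50

# Targets get weight 1; every other gene in the matrix gets weight 0.
weights = np.zeros(n_genes)
weights[:n_targets] = 1.0

# Background expression is noise; targets are shifted upwards.
expr = rng.normal(0.0, 1.0, n_genes)
expr[:n_targets] += 1.0

t_slope = ulm_t_value(expr, weights)

# Same statistic computed as a pooled-variance two-sample t-test.
targets, background = expr[weights == 1], expr[weights == 0]
n1, n0 = targets.size, background.size
pooled_var = (
    ((targets - targets.mean()) ** 2).sum()
    + ((background - background.mean()) ** 2).sum()
) / (n1 + n0 - 2)
t_two_sample = (targets.mean() - background.mean()) / np.sqrt(
    pooled_var * (1.0 / n1 + 1.0 / n0)
)

print(t_slope, t_two_sample)
```

The x-values are not all 1 once the background genes are included, so the slope is finite and its t-value simply measures how far the targets sit from the background.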

ken-chen-18 commented 2 months ago

That makes so much more sense, thank you!

saezlab / decoupler-py

run_ulm on TF with only positively regulated target genes #117