Closed andreyurch closed 10 months ago
Hi @andreyurch,
Note that in the linear model, genes that do not belong to TF X (they have 0s as weights) are also considered. So, if the background is on average still less than 10 after normalization, you will still get a high activity. Alternatively, you could also perform TF activity inference at the contrast level if you have well-defined conditions.
Regarding your second question: in our benchmarks, the linear models (ulm and mlm) outperform other classic methods such as viper or gsea:
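To illustrate the point about background genes entering the model, here is a minimal sketch of a univariate linear model (ulm), where the activity is the t-value of the slope when regressing expression on the TF's weights. This is toy data and a toy implementation, not the decoupler API:

```python
import numpy as np
from scipy import stats

def ulm_activity(expr, weights):
    """Activity = t-value of the slope of expression regressed on TF weights."""
    res = stats.linregress(weights, expr)
    return res.slope / res.stderr

# 4 targets of TF X (weight 1) plus 6 background genes (weight 0).
weights = np.array([1., 1., 1., 1., 0., 0., 0., 0., 0., 0.])

# Background expressed well below the targets: clearly positive activity.
low_bg = np.array([9., 10., 11., 10., 1., 3., 2., 1., 2., 3.])
print(ulm_activity(low_bg, weights))

# Background at the same level as the targets: activity ~ 0.
flat_bg = np.array([9., 10., 11., 10., 9., 10., 11., 10., 9., 11.])
print(ulm_activity(flat_bg, weights))
```

Because the zero-weight background genes anchor the regression, the same target expression yields a high or null activity depending on where the background sits.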
This was done using our previous benchmark dataset; we recently re-ran the pipeline with the KnockTF2 database, which contains more perturbation experiments, and saw the same pattern:
If you are interested in the benchmarking pipeline, you can find more information here.
To sum up, my recommendation would be to use ulm, since with mlm you can sometimes run into collinearity issues.
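The collinearity issue can be sketched numerically: mlm fits one multivariate model with all TFs as covariates, so two TFs with nearly identical target sets produce an ill-conditioned design matrix. Toy regulons below, not the decoupler API:

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes = 100

# TF A regulates ~20% of genes; TF B shares almost all of A's targets
# (only 2 entries differ); TF C has an independent target set.
tf_a = (rng.random(n_genes) < 0.2).astype(float)
tf_b = tf_a.copy()
tf_b[:2] = 1 - tf_b[:2]
tf_c = (rng.random(n_genes) < 0.2).astype(float)

X_shared = np.column_stack([np.ones(n_genes), tf_a, tf_b])  # near-collinear
X_indep  = np.column_stack([np.ones(n_genes), tf_a, tf_c])  # well separated

# The shared-target design has a much larger condition number,
# which makes the fitted per-TF coefficients unstable.
print(np.linalg.cond(X_shared), np.linalg.cond(X_indep))
```

In ulm this cannot happen, because each TF is fitted in its own univariate model.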
Hope this is helpful!
Dear developers, first I would like to thank you for the wonderful set of tools! Second, I have two questions:
I am interested in bulk RNA TF activity inference. You suggest using normalised counts for the linear model fitting. Let's imagine a situation where we have 4 genes that are targets of TF X: A, B, C, D, with the corresponding counts 10, 10, 20, 20 and modes of activation -1, -1, 1, 1. In the linear model with such parameters, we will have a strong activation signature (10, -1)(10, -1)(20, 1)(20, 1) for TF X. But what if the lengths of genes A, B, C, D are 1000, 1000, 2000, 2000 respectively? If we convert counts to TPMs/FPKMs, then the expression of the genes will be equal in the cell (e.g. 10, 10, 10, 10), and the linear model will not indicate any activation of TF X. There is a possibility that I do not understand something, but it seems that normalising expression by gene length should be really important for TF activity inference within a sample...
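The scenario above can be checked with a quick sketch, using the numbers from the example (toy code, not any particular package's API):

```python
import numpy as np

weights = np.array([-1., -1., 1., 1.])          # modes of activation for A-D
counts  = np.array([10., 10., 20., 20.])        # raw counts
lengths = np.array([1000., 1000., 2000., 2000.])  # gene lengths in bp

def tpm(counts, lengths):
    """Counts normalised by gene length, scaled to transcripts per million."""
    rate = counts / lengths
    return rate / rate.sum() * 1e6

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    x = x - x.mean()
    return (x * (y - y.mean())).sum() / (x * x).sum()

print(slope(weights, counts))                 # 5.0: apparent activation
print(slope(weights, tpm(counts, lengths)))   # 0.0: signal gone after length normalization
```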
Did you compare the performance of VIPER with the linear model fitting? Which method is better from your point of view?
Best regards, Andrey