AmelZulji closed this issue 5 days ago.
Hi @AmelZulji,
Here I assume estimates are equivalent to weights in CollecTRI. However, should I min-max scale them so that the estimates fall in the [-1, 1] range?
Indeed, you can use these estimates as weights. However, across the different benchmarks we have run, we have observed that the magnitude of the weights does not provide much information; their direction (sign) does. For this reason, I'd recommend using just the sign of Pando's estimates as weights. You can do:
import numpy as np
modules['weight'] = np.sign(modules['estimate'])
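For context, here is a self-contained version of the snippet above on a small made-up table. The column names 'tf', 'target' and 'estimate' are assumptions about how the Pando modules were exported, and the final rename to 'source', 'target', 'weight' assumes decoupler's default network column names. One detail worth noting: np.sign(0) returns 0, so an edge whose estimate is exactly zero would end up with zero weight; this sketch simply drops such rows.

```python
import numpy as np
import pandas as pd

# Hypothetical Pando module table (values invented for illustration);
# the column names 'tf', 'target', 'estimate' are assumptions.
modules = pd.DataFrame({
    'tf': ['TF_A', 'TF_A', 'TF_A', 'TF_B'],
    'target': ['g1', 'g2', 'g3', 'g1'],
    'estimate': [0.42, -0.03, 1.7, 0.0],
})

# Keep only the direction of regulation: +1 activation, -1 repression.
modules['weight'] = np.sign(modules['estimate'])

# np.sign(0) is 0, so drop zero-weight edges to keep the network purely signed,
# then rename to the (assumed) default decoupler column names.
net = (
    modules.loc[modules['weight'] != 0, ['tf', 'target', 'weight']]
    .rename(columns={'tf': 'source'})
    .reset_index(drop=True)
)
print(net)
```

The resulting net is meant to be used as the network in place of CollecTRI; if your columns are named differently, decoupler's source/target/weight arguments can point to them instead of renaming.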
Here decoupler runs with 291 TFs, but the models provided by Pando have 660 TFs. What causes that? Is it possible that (all) targets of some TFs are dropped while filtering empty features from mat?
It can be two things. First, many target genes may be empty (zero across all observations); these are automatically removed by decoupler. Second, after filtering empty features, many TFs are left with fewer than 5 target genes, which decoupler ignores. This is controlled by the parameter min_n=5, which you could set to a lower value, but I would not go lower than 3 (if it is only 2 genes, is it really a gene set?). This min_n is important to make sure that the obtained activities are robust and not driven by just 1 or 2 genes.
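To check how many TFs survive both filters before running decoupler, a rough diagnostic can be written in pandas. count_usable_sources is a hypothetical helper that mirrors the two steps described above; it is not decoupler's internal code and may differ in edge cases. It assumes mat is an observations x genes DataFrame and net has 'source' and 'target' columns; the toy data are invented.

```python
import numpy as np
import pandas as pd


def count_usable_sources(mat, net, min_n=5):
    """Hypothetical helper: targets left per TF after both filters."""
    # 1) Drop genes that are zero in every observation ("empty" features).
    nonempty = mat.columns[(mat != 0).any(axis=0).to_numpy()]
    # 2) Keep only edges whose target is still present in the matrix.
    kept = net[net['target'].isin(nonempty)]
    # 3) Count remaining targets per TF and apply the min_n cutoff.
    sizes = kept.groupby('source')['target'].nunique()
    return sizes[sizes >= min_n]


# Toy data: gene g6 is empty in every cell.
mat = pd.DataFrame(
    [[1, 0, 3, 1, 2, 0],
     [0, 2, 1, 4, 1, 0],
     [5, 1, 0, 2, 3, 0]],
    index=['cell1', 'cell2', 'cell3'],
    columns=['g1', 'g2', 'g3', 'g4', 'g5', 'g6'],
)
net = pd.DataFrame({
    'source': ['TF_A'] * 5 + ['TF_B'] * 3 + ['TF_C'],
    'target': ['g1', 'g2', 'g3', 'g4', 'g5', 'g1', 'g2', 'g6', 'g6'],
})

print(count_usable_sources(mat, net, min_n=5))  # TF_B and TF_C are lost
print(count_usable_sources(mat, net, min_n=2))  # TF_B comes back with 2 targets
```

Comparing the length of the returned Series with the number of TFs in the Pando models shows how much of the gap comes from empty genes and how much from min_n.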
Hope this is helpful! Let me know if you have more questions ;)
Thank you for the quick reply, @PauBadiaM!
For this reason, I'd recommend using just the sign of Pando's estimates as weights.
Will I lose information/interpretability that way, since the estimates are dataset-specific?
In addition, once the scores are computed, would you greenlight using them as another assay/layer to compute differences in TF/regulon activity between, let's say, 2 conditions? If yes, do you have suggestions on which way to go (you mentioned in previous questions that you had to reimplement scanpy's functionality for DEA)?
Will I lose information/interpretability that way, since the estimates are dataset-specific?
Not really, since the most valuable dataset-specific information is the TF-gene interaction itself and its mode of regulation (the sign).
In addition, once the scores are computed, would you greenlight using them as another assay/layer to compute differences in TF/regulon activity between, let's say, 2 conditions? If yes, do you have suggestions on which way to go (you mentioned in previous questions that you had to reimplement scanpy's functionality for DEA)?
You can compute activities at the observation level (for each single cell) and then run statistics on them, for example with the decoupler function rank_sources_groups.
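As we understand its documentation, decoupler's ulm method fits, for each observation and each TF, a univariate linear model: the expression of all genes is the response, the TF's weights (0 for genes that are not targets) are the predictor, and the t-value of the slope is the activity score. The hypothetical ulm_scores function below re-implements that idea in plain numpy to show what "activities at the observation level" means; it is a sketch, not decoupler's code, and it skips p-values and the min_n filter.

```python
import numpy as np


def ulm_scores(mat, weights):
    """Slope t-values of expression ~ weights, per observation and per TF.

    mat:     (n_obs, n_genes) expression matrix
    weights: (n_genes, n_sources) signed weights, 0 for non-target genes
    """
    mat = np.asarray(mat, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n_genes = mat.shape[1]
    x = weights - weights.mean(axis=0)          # centred predictor per TF
    y = mat - mat.mean(axis=1, keepdims=True)   # centred response per cell
    sxx = (x ** 2).sum(axis=0)                  # (n_sources,)
    sxy = y @ x                                 # (n_obs, n_sources)
    syy = (y ** 2).sum(axis=1, keepdims=True)   # (n_obs, 1)
    beta = sxy / sxx                            # fitted slopes
    rss = syy - beta * sxy                      # residual sum of squares
    se = np.sqrt(rss / (n_genes - 2) / sxx)     # standard error of each slope
    return beta / se


# One hypothetical TF: activates g1-g3, represses g4, no edge to g5-g8.
W = np.array([[1], [1], [1], [-1], [0], [0], [0], [0]], dtype=float)
M = np.array([
    [3.0, 2.5, 3.2, 0.1, 1.0, 1.2, 0.9, 1.1],  # pattern consistent with activity
    [0.2, 0.1, 0.3, 3.0, 1.0, 1.1, 0.9, 1.2],  # the opposite pattern
])
scores = ulm_scores(M, W)
print(scores)  # first cell positive, second cell negative
```

The resulting (cells x TFs) matrix is what would be stored as an extra layer and then compared between groups.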
This works well for "marker" extraction (one cell type vs the rest), but not so much if you want to compare conditions, since the p-values will be artificially small. This is because each cell is treated as a sample. We know that single cells within a sample are not independent of each other, since they were isolated from the same environment. If we treat cells as samples, we are not testing the variation across a population of samples, but rather the variation inside an individual one. Moreover, if one sample has more cells than another, it can bias the results.
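The pseudoreplication problem can be seen in a small simulation. All numbers are invented for illustration: two conditions with two samples each, where each sample has its own offset (sample-to-sample variability, not a condition effect) and every cell adds independent noise. With only two samples per condition, the condition means differ by chance (0.2 vs -0.2 here). Treating the 1,000 cells per condition as replicates gives a large t statistic, while using one mean per sample, a crude stand-in for a pseudobulk profile, gives a small one for the same data. welch_t is a hypothetical numpy-only helper for Welch's t statistic.

```python
import numpy as np


def welch_t(a, b):
    """Welch's t statistic for two independent groups."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return (a.mean() - b.mean()) / se


rng = np.random.default_rng(0)
n_cells = 500
# Hypothetical per-sample offsets: samples differ from each other,
# but the offsets are not meant to represent a condition effect.
offsets = {'A': [0.8, -0.4], 'B': [0.2, -0.6]}

# Simulated per-cell activity scores: sample offset + cell-level noise.
cells = {cond: [rng.normal(off, 1.0, n_cells) for off in offs]
         for cond, offs in offsets.items()}

# Cell-level test: every cell is treated as an independent replicate.
t_cells = welch_t(np.concatenate(cells['A']), np.concatenate(cells['B']))

# Sample-level test: one value per sample, i.e. n = 2 per condition.
t_samples = welch_t([c.mean() for c in cells['A']],
                    [c.mean() for c in cells['B']])

print(f"cell-level t = {t_cells:.1f}, sample-level t = {t_samples:.1f}")
```

The data are identical in both tests; only the unit of replication changes, and with it the apparent significance.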
For this reason, I would recommend generating pseudobulk profiles for each sample and cell type, running a differential expression analysis, and inferring TF activities from the obtained contrast statistics. There is a detailed vignette showcasing how to do this here.
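The vignette uses decoupler's own tooling; purely to illustrate the aggregation step, here is a plain pandas sketch that sums raw counts over all cells sharing a (sample, cell type) pair. The toy counts and the metadata column names 'sample' and 'cell_type' are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data: 6 cells x 3 genes of raw counts plus per-cell metadata.
counts = pd.DataFrame(
    [[1, 0, 2], [0, 1, 1], [3, 0, 0], [2, 2, 1], [0, 0, 4], [1, 3, 0]],
    columns=['g1', 'g2', 'g3'],
)
obs = pd.DataFrame({
    'sample': ['s1', 's1', 's2', 's2', 's2', 's1'],
    'cell_type': ['T', 'T', 'T', 'B', 'B', 'B'],
})

# One pseudobulk profile per (sample, cell_type): sum of raw counts.
pseudobulk = counts.groupby([obs['sample'], obs['cell_type']]).sum()

# Number of cells behind each profile, useful for dropping tiny groups.
n_cells = obs.groupby(['sample', 'cell_type']).size()

print(pseudobulk)
print(n_cells)
```

Each row then acts as one replicate in the differential expression analysis, so the test runs across samples rather than across cells.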
Let me know how it goes!
Thank you for the detailed clarification, Pau!
It works as expected.
Regards
Hi,
I have a few questions related to using a GRN inferred by Pando on 10x Multiome data.
Thank you and kind regards, Amel