sam-israel opened this issue 3 years ago.
Hi sam-israel, thanks for the questions. 1. The csDE function in TOAST uses a linear model without additional constraints on the parameters, so, unfortunately, negative estimates of μ are unavoidable. Currently I don't have a procedure for correcting these negative baseline expressions. Sorry about that.
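To see why an unconstrained linear model can return a negative μ, here is a minimal toy sketch in base R. It is not TOAST's code: it is a simplified two-cell-type version of a proportion-weighted regression, and the proportions `pA`, `pB` and the expression values `y` are made up for illustration.

```r
# Toy illustration (not TOAST's implementation): bulk expression of one gene
# in four samples, modelled as a proportion-weighted sum of two cell-type baselines.
pA <- c(0.1, 0.2, 0.3, 0.4)   # hypothetical proportions of cell type A
pB <- 1 - pA                  # cell type B makes up the rest
y  <- c(9.2, 7.9, 7.1, 5.8)   # hypothetical observed bulk expression

# Ordinary least squares with no intercept and no sign constraints:
# y = muA * pA + muB * pB + error
fit <- lm(y ~ 0 + pA + pB)
coef(fit)
# muA comes out at about -0.75 and muB at about 10.25; nothing in the
# model prevents cell type A from receiving a negative baseline.
```

Because the expression drops steeply as the share of cell type A grows, the least-squares line extrapolates below zero at pA = 1, which is exactly where the baseline for cell type A is read off.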
Hi, thank you for the answers.
One possible way to make the multiple-comparisons correction (FDR) less strict is to pre-filter the genes TOAST is applied to. If I pre-filter the genes (based on their average TPM), there will be fewer genes in total, and the FDR correction will adjust the p-values less strictly. My question is whether this is advisable from the deconvolution point of view. Will TOAST perform best if it receives all (human) genes as input? Is pre-filtering recommended? Is there a minimal recommended number of genes?
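The effect of testing fewer genes on Benjamini-Hochberg adjusted p-values can be checked directly with base R's `p.adjust`. In this sketch the four p-values are hypothetical, and the 16 "filtered-out" genes are assumed to carry no signal (p = 1); in real data the removed genes have their own p-values, so the gain will differ. Such filtering is generally only considered safe when the filter (e.g. average expression across all samples) is chosen without looking at the DE results.

```r
# Hypothetical raw p-values for the four genes that survive filtering
p <- c(0.001, 0.01, 0.02, 0.04)

# BH adjustment when only these four genes are tested
fdr_filtered <- p.adjust(p, method = "BH")

# Same four p-values, adjusted as if 20 genes had been tested
# (the other 16 assumed to have p = 1, i.e. no signal)
fdr_all <- p.adjust(c(p, rep(1, 16)), method = "BH")[1:4]

fdr_filtered  # 0.004 0.020 0.0267 0.040
fdr_all       # 0.020 0.100 0.1333 0.200
```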
Is calculating the fold change as (μ+β)/μ correct? For what purposes should effect_size be used rather than the fold change?
What would be a good way to calculate some measure of fold change when μ is negative?
I want to test the results for enrichment by entering the genes into software such as GSEA. For that I need to rank them by p-value and by a measure of change.
Would (|μ|+β)/|μ| work?
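For a preranked enrichment analysis, one common ranking statistic is the signed -log10 p-value. Below is a minimal base R sketch, assuming a results table with a `beta` column (as used in this thread); the `p_value` column name and all numbers are hypothetical. Taking the sign from β rather than from a ratio avoids dividing by a possibly negative μ. For example, with μ = -5 and β = 20 the two condition means are -5 and 15, yet (|μ|+β)/|μ| returns 5, which is not a ratio of those two means.

```r
# Hypothetical csTest-like results; "p_value" is an assumed column name
res <- data.frame(
  beta      = c(120, -35, 8, -2),
  p_value   = c(1e-6, 1e-3, 0.2, 0.9),
  row.names = c("g1", "g2", "g3", "g4")
)

# Signed -log10(p): direction from beta, strength from the p-value,
# so the ranking never divides by mu
res$rank_metric <- sign(res$beta) * -log10(res$p_value)

# Strongly "up" genes first, strongly "down" genes last
ranked <- res[order(res$rank_metric, decreasing = TRUE), ]
rownames(ranked)
```

The resulting gene order (g1, g3, g4, g2) could then be written out as a two-column gene/metric list for a preranked tool.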
In my dataset:
```r
> summary(myres$MAIT$mu)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-12505.83      0.04      5.05     65.54     24.45 131194.07
```
As a result, both the manually calculated fold change and effect_size can be negative:
```r
> summary(myres$MAIT$effect_size)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-4454.514     0.042     1.031     0.698     1.788  2235.051
> myres$MAIT$foldchange <- (myres$MAIT$mu + myres$MAIT$beta) / myres$MAIT$mu
> summary(myres$MAIT$foldchange)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-99479.19     -0.26      1.30     -5.00      3.81   9113.20
```
Hi sam-israel,
Regarding your questions on whether pre-filtering is advisable from the deconvolution point of view, whether TOAST works best with all (human) genes as input, and whether there is a minimal recommended number of genes: yes, we recommend pre-filtering the data to remove genes with low expression. We haven't explored how filtering affects the DE results, so we have no minimal recommended number. Previously we have used ad hoc approaches, such as filtering out genes with mean expression < 2. You can explore different filtering thresholds.
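The ad hoc filter mentioned above (drop genes with mean expression < 2) can be sketched in base R before the matrix is handed to TOAST. The matrix `Y`, its gene and sample names, and the values are hypothetical; genes are assumed to be in rows and samples in columns.

```r
# Hypothetical expression matrix: genes in rows, samples in columns
Y <- matrix(c( 0,  1, 0.5,  1,
              10, 12, 9,   11,
               3,  1, 2,    2),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("geneA", "geneB", "geneC"), paste0("s", 1:4)))

min_mean <- 2                        # threshold from the reply; worth varying
keep <- rowMeans(Y) >= min_mean      # TRUE for genes whose mean is at least 2
Y_filt <- Y[keep, , drop = FALSE]    # drop = FALSE keeps a matrix even if one gene remains

rownames(Y_filt)
```

Here geneA (mean 0.625) is removed, while geneB (mean 10.5) and geneC (mean exactly 2) are kept. Re-running with a few values of `min_mean` is one simple way to follow the suggestion of exploring different thresholds.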
For the question about fold change: the purpose of our effect size calculation is to provide a measure similar to fold change. Let me explain why we chose β/(μ + β/2). Think of two genes, A and B. In the two conditions (non-diseased and diseased), their expressions in the same cell type are 10 and 110 (gene A), and 10000 and 10100 (gene B). Then β is 100 for both genes, but μ is 10 for A and 10000 for B. (μ+β)/μ gives 11 and 1.01, and β/(μ + β/2) gives 1.67 and 0.00995. Both rank gene A higher than gene B. However, when μ is very small, e.g. μ = 0.001, (μ+β)/μ is not as stable as β/(μ + β/2). I hesitate to interpret the negative values, as they may result from an improper model fit... Maybe that is something we could work on in the future.
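The arithmetic in the reply can be reproduced in a few lines of base R. Note that μ + β/2 = (μ + (μ+β))/2 is the average of the two condition means, so β/(μ + β/2) is the difference divided by that average; when both condition means are non-negative it stays between -2 and 2, which is why it does not blow up as μ approaches 0. When μ is negative, the denominator can approach zero, which is consistent with the extreme effect_size values in the summaries above. The function names below are hypothetical helpers, not TOAST functions.

```r
# Hypothetical helpers reproducing the two formulas from the reply
effect_size_fn <- function(mu, beta) beta / (mu + beta / 2)  # difference / mean of conditions
fold_change_fn <- function(mu, beta) (mu + beta) / mu        # ratio of conditions

mu   <- c(A = 10,  B = 10000, tiny = 0.001)
beta <- c(A = 100, B = 100,   tiny = 100)

fc <- fold_change_fn(mu, beta)
es <- effect_size_fn(mu, beta)

round(fc, 3)  # A: 11, B: 1.01, tiny: about 100001
round(es, 5)  # A: 1.66667, B: 0.00995, tiny: 1.99996 (bounded by 2)
```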
Hope this helps.
Thank you.
Hi sam-israel,
These are great questions. Honestly, research and evaluation are needed in these directions. Currently we don't use QC approaches; instead, we consult our biological collaborators to judge whether the results make sense. We have also found that the quality of the findings is highly correlated with cell type abundance. The cell-type DEs identified for cell types with proportions ~ 0.4 are much more reliable than those for a rare cell type with proportion ~ 0.05. For very rare cell types with proportions < 0.05, it is likely that the majority of csDE findings are false positives. However, we have not evaluated this quantitatively so far. Hope this helps.
Can you confirm that? If that is correct, how can μ be negative? It happens even in the vignette:
What procedure would you suggest for genes with a negative baseline expression?
When the contrast is not specified for csTest, in which direction is the comparison made? It is crucial not to get this wrong. For example, in
Is the comparison Yes - No, or No - Yes?
If I want to keep only genes with a decent effect size, what filtering would you suggest? Is it recommended to calculate the fold change as (μ+β)/μ and keep genes with fold change > 2?
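If μ > 0 and μ + β > 0, the fold change FC = (μ+β)/μ and the effect size β/(μ + β/2) are linked by effect = 2(FC − 1)/(FC + 1), since β = μ(FC − 1) and μ + β/2 = μ(FC + 1)/2. A fold-change cutoff can therefore be translated into an effect-size cutoff: FC > 2 corresponds to effect > 2/3, and FC < 1/2 to effect < −2/3. The sketch below applies this to a hypothetical results table; the `fdr` column name, the 0.05 cutoff and all values are assumptions, and genes with μ ≤ 0 fall outside this relationship and would need separate handling.

```r
# Valid only for mu > 0 and mu + beta > 0
fc_to_effect <- function(fc) 2 * (fc - 1) / (fc + 1)

fc_to_effect(2)    # 0.6666667
fc_to_effect(0.5)  # -0.6666667: halving mirrors doubling

# Hypothetical results table; "fdr" is an assumed column name
res <- data.frame(
  mu        = c(10, 10, 10, 10),
  beta      = c(15, 5, -6, 1),
  fdr       = c(0.01, 0.01, 0.03, 0.2),
  row.names = c("g1", "g2", "g3", "g4")
)
res$effect_size <- res$beta / (res$mu + res$beta / 2)

# Keep genes that are significant AND change at least two-fold in either direction
keep <- res$fdr < 0.05 & abs(res$effect_size) > fc_to_effect(2)
rownames(res)[keep]
```

Here g1 (FC 2.5) and g3 (FC 0.4) pass, while g2 (FC 1.5) and g4 (not significant) are dropped. Using |effect_size| treats up- and down-regulation symmetrically, which a one-sided "fold change > 2" filter would not.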
Could you explain what "testing the joint effect in all cell types" means? For example, in:
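In general regression terms, a joint test asks whether the condition effect is zero in every cell type at once (one F-test), rather than testing each cell type's β separately. The sketch below illustrates that general idea with base R's `lm` and `anova` on simulated data; it is an assumption about the meaning of the phrase, not TOAST's implementation, and the proportions `t1`, `t2`, the condition indicator `x` and the expression `y` are all simulated.

```r
set.seed(1)
n  <- 12
t1 <- runif(n, 0.2, 0.6)            # simulated proportion of cell type 1
t2 <- 1 - t1                        # cell type 2 makes up the rest
x  <- rep(c(0, 1), each = n / 2)    # condition indicator (0 = control, 1 = case)
y  <- 5 * t1 + 20 * t2 + rnorm(n)   # simulated bulk expression, no true DE

# Reduced model: cell-type baselines only
reduced <- lm(y ~ 0 + t1 + t2)
# Full model: baselines plus a condition effect in each cell type
full <- lm(y ~ 0 + t1 + t2 + t1:x + t2:x)

# One F-test of H0: the condition effect is zero in both cell types
joint <- anova(reduced, full)
joint
```

The `Df` of 2 in the second row reflects that two cell-type-specific effects are tested together; with K cell types it would be K.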