Closed lorenzoamir closed 1 month ago
Hi @lorenzoamir
Thank you for catching them.
Regarding the problem with calculateFraction(): I would change the returned values from percentages to fractions. I'd like to avoid changing the function name, if possible.
Regarding the problem with calculatePvalue(): I would prefer to keep it this way, since I'm more interested in pairs of modules that have strong mutual overlap, but we can add an input parameter (alternative='two-sided') so that those who prefer another alternative hypothesis can change it.
If you agree with me and want to fix these as I proposed, please let me know and I will wait for you to open a pull request to fix this.
Best, Narges
Hi, I agree with using fractions and not changing any function name.
Regarding the p-values, I said we should use alternative='less', but I was wrong; the correct one is actually alternative='greater'. The problem with alternative='two-sided' (the current default) is that it will pick up both modules with more overlapping genes than expected (what we want to detect) and modules with fewer overlapping genes than expected (which don't look particularly interesting to me). I have made a small code example to show this. I created two pairs of modules, the first with many overlapping genes and the second with only one, and tested the different alternatives. The ideal outcome is that the first pair should be significant and the second should not:
Case1: high overlap
two-sided:
p_val: 0.02913752913752914
greater:
p_val: 0.01456876456876457
less:
p_val: 0.9997086247086248
Case2: low overlap
two-sided:
p_val: 0.02913752913752914
greater:
p_val: 0.9997086247086248
less:
p_val: 0.01456876456876457
As you can see, alternative='greater' is the one that only counts modules with high overlap as significant, while alternative='two-sided' considers both. I think we should use greater, since it's probably what the user will expect when calling the function.
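The code behind the numbers above is not shown in the thread, so here is a minimal sketch of the same kind of comparison using scipy.stats.fisher_exact. The two 2x2 tables are hypothetical and chosen only for illustration (rows: gene in the first module or not, columns: gene in the second module or not). They happen to give p-values matching the ones listed, but that should be treated as an assumption about the original example, not a reconstruction of it.

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 contingency tables (not taken from the thread).
# Layout: [[in both modules, only in first], [only in second, in neither]]
high_overlap = [[6, 1], [1, 6]]
low_overlap = [[1, 6], [6, 1]]


def p_values(table):
    """Return the Fisher exact p-value of one table under each alternative."""
    return {
        alt: fisher_exact(table, alternative=alt)[1]
        for alt in ("two-sided", "greater", "less")
    }


results = {
    "high overlap": p_values(high_overlap),
    "low overlap": p_values(low_overlap),
}

for name, pvals in results.items():
    print(name)
    for alt, p in pvals.items():
        print(f"  {alt}: p_val = {p:.6f}")
```

With these tables, 'greater' gives a small p-value only for the high-overlap pair, 'less' only for the low-overlap pair, and 'two-sided' for both.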
Okay, sounds good! But I would still prefer to add this as an input parameter so people can change it.
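One way the parameter could look is sketched below. The function name overlap_pvalue, its arguments, and the way the contingency table is built from gene sets are all hypothetical; the real calculatePvalue signature is not shown in this thread. The only library call used is scipy.stats.fisher_exact with its alternative argument.

```python
from scipy.stats import fisher_exact


def overlap_pvalue(genes_a, genes_b, background, alternative="greater"):
    """Fisher exact p-value for the overlap of two gene modules.

    Hypothetical helper: `alternative` is passed straight through to
    scipy.stats.fisher_exact ('greater', 'less' or 'two-sided').
    """
    bg = set(background)
    a = set(genes_a) & bg  # ignore genes outside the background
    b = set(genes_b) & bg
    table = [
        [len(a & b), len(a - b)],       # in both / only in module A
        [len(b - a), len(bg - a - b)],  # only in module B / in neither
    ]
    _, p_value = fisher_exact(table, alternative=alternative)
    return p_value


# Toy data: 14 background genes, two modules of 7 genes sharing 6.
background = [f"g{i}" for i in range(14)]
module_a = background[:7]
module_b = background[1:8]

print(overlap_pvalue(module_a, module_b, background))
print(overlap_pvalue(module_a, module_b, background, alternative="two-sided"))
```

Defaulting to 'greater' keeps the behaviour discussed above for callers who don't care, while anyone who wants the old two-sided test can still ask for it explicitly.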
Hi, I was trying to compare some WGCNA objects and I believe I noticed a few issues in the comparison.
Issues in calculateFraction:
calculateFraction, this is a bit confusing.
Issues in calculatePvalue:
fisher_exact is called with alternative='two-sided'; this means that low p-values are obtained not just for modules with significant overlap, but also for modules with strong mutual exclusivity. For example, the matrix in the tutorial only contains $p=0$.
Proposed fix:
alternative='less' for Fisher's test in calculatePvalue
If you agree with the proposed fix, I can open a pull request. Just let me know.