heiniglab / scPower

Experimental design framework for scRNAseq population studies (eQTL and DE)
45 stars 5 forks source link

effect sizes vs power #28

Open coralzhang opened 1 month ago

coralzhang commented 1 month ago

I am trying to understand the power calculation function power.general.withDoublets. I created 3 reference studies, with low, medium and high fold change values; I expect the power to increase with every setting being fixed. But the power provided was lowest for the medium fold changes. Can you explain why? ` dat=scPower::de.ref.study dat2=dat[dat$name=="Blueprint (CLL) uCLL-iCLL",] dat2=dat2[order(dat2$FoldChange),]

dat3=dat2[11:15,]; dat3$rank=10001:10005 dat4=dat2[121:125,]; dat4$rank=10001:10005 dat5=dat2[331:335,]; dat5$rank=10001:10005

power3<-power.general.withDoublets(nSamples=10,nCells=7000,readDepth=25000,ct.freq=0.3, type="de", ref.study=dat3, ref.study.name="Blueprint (CLL) uCLL-iCLL", samplesPerLane=2,read.umi.fit = scPower::read.umi.fit[read.umi.fit$type=="10X_PBMC_1",], gamma.mixed.fits = scPower::gamma.mixed.fits,ct="CD4 T cells", disp.fun.param=scPower::disp.fun.param,mappingEfficiency = 1, min.UMI.counts = 3,perc.indiv.expr = 0.9,sign.threshold = 0.05,MTmethod="Bonferroni", multipletRateGrowth="constant")

power4<-power.general.withDoublets(nSamples=10,nCells=7000,readDepth=25000,ct.freq=0.3, type="de", ref.study=dat4, ref.study.name="Blueprint (CLL) uCLL-iCLL", samplesPerLane=2,read.umi.fit = scPower::read.umi.fit[read.umi.fit$type=="10X_PBMC_1",], gamma.mixed.fits = scPower::gamma.mixed.fits,ct="CD4 T cells", disp.fun.param=scPower::disp.fun.param,mappingEfficiency = 1, min.UMI.counts = 3,perc.indiv.expr = 0.9,sign.threshold = 0.05,MTmethod="Bonferroni", multipletRateGrowth="constant")

power5<-power.general.withDoublets(nSamples=10,nCells=7000,readDepth=25000,ct.freq=0.3, type="de", ref.study=dat5, ref.study.name="Blueprint (CLL) uCLL-iCLL", samplesPerLane=2,read.umi.fit = scPower::read.umi.fit[read.umi.fit$type=="10X_PBMC_1",], gamma.mixed.fits = scPower::gamma.mixed.fits,ct="CD4 T cells", disp.fun.param=scPower::disp.fun.param,mappingEfficiency = 1, min.UMI.counts = 3,perc.indiv.expr = 0.9,sign.threshold = 0.05,MTmethod="Bonferroni", multipletRateGrowth="constant")

`

here is the output: power3: name powerDetect exp.probs power sampleSize 1 Blueprint (CLL) uCLL-iCLL 0.9889792 0.9900965 0.9988715 10 totalCells usableCells multipletFraction ctCells readDepth readDepthSinglet 1 7000 7000 7.67e-06 2100 25000 25000 mappedReadDepth expressedGenes 1 25000 11210

power4: name powerDetect exp.probs power sampleSize 1 Blueprint (CLL) uCLL-iCLL 0.501316 0.9900965 0.5063288 10 totalCells usableCells multipletFraction ctCells readDepth readDepthSinglet 1 7000 7000 7.67e-06 2100 25000 25000 mappedReadDepth expressedGenes 1 25000 11210

power5: name powerDetect exp.probs power sampleSize totalCells 1 Blueprint (CLL) uCLL-iCLL 0.9900965 0.9900965 1 10 7000 usableCells multipletFraction ctCells readDepth readDepthSinglet 1 7000 7.67e-06 2100 25000 25000 mappedReadDepth expressedGenes 1 25000 11210

KatharinaSchmid commented 1 month ago

Hey, thanks for your question and your interest in our tool :) The reason for this behavior is the meaning behind the FoldChange: a FoldChange of 1 means no difference between the two conditions (expression_1/expression_2), and in the end the deviation from 1 is influencing the power. In your example, the FoldChanges in dat4 are the closest to 1, so the most difficult to detect, resulting in lower power.

It is much better visible when converting the FoldChanges to LogFoldChanges:

> dat2$LogFoldChange<-log2(dat2$FoldChange)
> dat3=dat2[11:15,]; dat3$rank=10001:10005
> dat4=dat2[121:125,]; dat4$rank=10001:10005
> dat5=dat2[331:335,]; dat5$rank=10001:10005
> dat3
                         name FoldChange  rank LogFoldChange
552 Blueprint (CLL) uCLL-iCLL 0.08987483 10001     -3.475939
362 Blueprint (CLL) uCLL-iCLL 0.09053963 10002     -3.465307
516 Blueprint (CLL) uCLL-iCLL 0.09447516 10003     -3.403921
484 Blueprint (CLL) uCLL-iCLL 0.09537010 10004     -3.390319
507 Blueprint (CLL) uCLL-iCLL 0.09725035 10005     -3.362153
> dat4
                         name FoldChange  rank LogFoldChange
433 Blueprint (CLL) uCLL-iCLL  0.3523094 10001     -1.505085
550 Blueprint (CLL) uCLL-iCLL  0.3585967 10002     -1.479566
584 Blueprint (CLL) uCLL-iCLL  0.3601297 10003     -1.473412
617 Blueprint (CLL) uCLL-iCLL  0.3607892 10004     -1.470772
454 Blueprint (CLL) uCLL-iCLL  0.3720297 10005     -1.426510
> dat5
                         name FoldChange  rank LogFoldChange
407 Blueprint (CLL) uCLL-iCLL   18.18620 10001      4.184772
388 Blueprint (CLL) uCLL-iCLL   20.49132 10002      4.356941
549 Blueprint (CLL) uCLL-iCLL   20.49420 10003      4.357143
442 Blueprint (CLL) uCLL-iCLL   23.92357 10004      4.580361
528 Blueprint (CLL) uCLL-iCLL   27.34854 10005      4.773392

Now, it is visible that dat4 has the lowest absolute LogFoldChanges, i.e. gets the lowest power.

I hope this explanation was helpful, please let me know if you have more questions.

Best regards, Katharina

coralzhang commented 1 month ago

Thank you for the response!

coralzhang commented 1 month ago

I have a follow up question: the nSamples in this function, is it total sample size, or per group? Thanks!