heiniglab / scPower

Experimental design framework for scRNAseq population studies (eQTL and DE)
45 stars 5 forks source link

FDR in budget optimization #9

Closed matthiasheinig closed 2 years ago

matthiasheinig commented 2 years ago

I have an issue when trying to optimize the design for a sampled reference data set and choosing FDR as the MT method. When I use Bonferroni it works. Here is the error message:

Error in uniroot(f = fdr.optimization, interval = c(lowerBound, sign.threshold),  : 
  f() values at end points not of opposite sign

And here is the full code

library(scPower)
library(ggpubr)
library(tidyverse)

set.seed(0)
N <- 25000
ndiff <- 100
sd <- 0.1
de.scenario <- data.frame(name="high_expr_fc1", FoldChange=2^rnorm(N, sd=sd), DE=FALSE, rank=1:N)
de.scenario$DE[sample(1:1000, ndiff)] <- TRUE
de.scenario$FoldChange[de.scenario$DE] <- 2^rnorm(ndiff, 1, sd=sd)
de.scenario$cumFraction <- cumsum(de.scenario$DE) / ndiff

p1 <- ggplot(aes(x=log2(FoldChange), y=..density.., col=DE), data=de.scenario) + geom_histogram()
p2 <- ggplot(aes(x=rank, cumFraction), data=de.scenario) + geom_line() + geom_abline(slope=1/N, intercept=0)

ggarrange(p1, p2)

## costs
totalBudget = 75000
costKit = 6000 ## 10X 3'seq
costFlowCell = 14000 ## Novaseq 6000 S4 flowcell
readsPerFlowcell = 4100000000 ## Novaseq 6000 S4 flowcell

best <- optimize.constant.budget(
  totalBudget=totalBudget,
  type="de",
  ct="CD8 T cells",
  ct.freq=0.25,
  costKit=costKit,
  costFlowCell=costFlowCell,
  readsPerFlowcell=readsPerFlowcell,
  ref.study=de.scenario,
  ref.study.name="high_expr_fc1",
  samplesPerLane=8,
  read.umi.fit=filter(read.umi.fit, type == "10X_PBMC_1"),
  gamma.mixed.fits=gamma.mixed.fits,
  disp.fun.param=disp.fun.param,
  nSamplesRange = (16:24) * 10,
  nCellsRange = (10:40) * 100,
  readDepthRange = NULL,
  mappingEfficiency = 0.8,
  multipletRate = 7.67e-06,
  multipletFactor = 1.82,
  min.UMI.counts = 3,
  perc.indiv.expr = 0.5,
  cutoffVersion = "absolute",
  nGenes = 21000,
  samplingMethod = "quantiles",
  multipletRateGrowth = "linear",
  sign.threshold = 0.05,
  MTmethod = "FDR",
  useSimulatedPower = FALSE,
  simThreshold = 4,
  speedPowerCalc = FALSE,
  indepSNPs = 10,
  ssize.ratio.de = 1
)
KatharinaSchmid commented 2 years ago

In general, the parameter ref.study expects only DE genes in all rows. So the simulated data.frame from the example code needs to be filtered, then the function runs without any problems:

de.scenario<-de.scenario[de.scenario$DE,]

Mathematically, FDR correction can not be performed if all genes are DE genes, this causes the function to fail (division by 0). The code is updated now to have a better error message for this, when all expressed genes are defined as DE genes (package version 1.0.2).