statOmics / tradeSeq

TRAjectory-based Differential Expression analysis for SEQuencing data
Other
237 stars 27 forks source link

Issues with associationTest results with a single lineage #37

Closed skvanburen closed 4 years ago

skvanburen commented 4 years ago

Hi,

I have been running into problems with associationTest results when a single lineage is fit. Specifically, results differ wildly when changing the number of knots from 4 to 5 in the fitGAM statement, with Wald statistics using 5 or more knots being much too large to be logical.

The code to reproduce this issue is given below, with the attached PDF showing summary information on the output Wald statistics with 4 and 5 knots as well as versions of packages I am using.

In particular I am using the most recent version of the package (1.1.17). This issue seems like it might be related to the one here https://github.com/statOmics/tradeSeq/issues/17#issue-503183765 , though it does only seem to happen if the number of knots is greater than 4.

Also note that this only seems to occur when a single lineage is fit by slingshot, I have not been able to reproduce this issue when 2 or more lineages are fit.

Thanks!

library(splatter)
library(tradeSeq)
library(SingleCellExperiment)
library(slingshot)

nGenes <- 60179
numCells <- 100
current_seed <- 328585
params <- newSplatParams()

#Simulate no genes to be DE such that we would expect a small number of rejections
SplatterSimObject <- splatSimulate(params,
                                   method="paths",
                                   nGenes=nGenes,
                                   batchCells=numCells,
                                   seed=current_seed,
                                   lib.loc=11.49293, de.prob = 0, de.facLoc = log(2)*3,
                                   out.prob = 0)

countsT <- counts(SplatterSimObject)

filt_func <- function(x){
  ncells_high_exp <- sum(x >= 10)
  return(ncells_high_exp)
}
rows_to_keep <- apply(countsT, 1, filt_func)
counts <- countsT[rows_to_keep > 10,]

FQnorm <- function(counts){
  rk <- apply(counts,2,rank,ties.method='min')
  counts.sort <- apply(counts,2,sort)
  refdist <- apply(counts.sort,1,median)
  norm <- apply(rk,2,function(r){ refdist[r] })
  rownames(norm) <- rownames(counts)
  return(norm)
}
norm_counts <- FQnorm(counts)

SCEObj <- SingleCellExperiment(assays = List(counts = counts, norm_counts = norm_counts))

pca <- prcomp(t(log1p(assays(SCEObj)$norm_counts)), scale. = FALSE)

rd <- pca$x[,1:2]

reducedDims(SCEObj) <- SimpleList(PCA = rd)

cl <- kmeans(rd, centers = 4)$cluster
colData(SCEObj)$kMeans <- cl

slingshot_results <- slingshot(SCEObj, clusterLabels = 'kMeans', reducedDim = 'PCA')

lin <- getLineages(SCEObj, clusterLabels = colData(slingshot_results)$kMeans, reducedDim = 'PCA')

crv <- SlingshotDataSet(getCurves(lin))

sce4 <- fitGAM(counts = counts, pseudotime = slingPseudotime(crv, na = FALSE),
              sds = crv, nknots = 4, sce = TRUE)

sce5 <- fitGAM(counts = counts, pseudotime = slingPseudotime(crv, na = FALSE),
               sds = crv, nknots = 5, sce = TRUE)

assoc4Knots <- associationTest(sce4)
assoc5Knots <- associationTest(sce5)

SummaryofWaldStatsAndPackageInfo.pdf

koenvandenberge commented 4 years ago

Hi,

Thanks a lot for letting us know and for providing an easy-to-use reproducible example, this has helped a lot in diagnosing the problem, which was a bug on our side. Your results also demonstrated that returning the median logFC may not be great since it is very variable for some genes depending on the number of knots; I have updated this to return the mean logFC.

On my system, the results are no longer variable after the commits I just did. It would be great if you could confirm this by re-installing tradeSeq and re-running your example.

Thanks again, Koen

skvanburen commented 4 years ago

Hi Koen,

Thank you for the very quick reply! I have downloaded the new version (1.1.18) and re-ran the results and no longer see the problems with the Wald statistics. I have also just tried a few other scenarios that had previously given me the same problems and no longer have issues with those either.

Thanks again for the quick fix! Scott