DE on SCTransformed and Integrated data

rleonriv commented 4 years ago

Hi,

Thank you for making this great program and method available! I would like to know the code or an idea of how to do differential expression testing using SCTransform on Integrated samples as described in the preprint https://www.biorxiv.org/content/biorxiv/early/2019/03/18/576827.full.pdf.

"4.9 Differential expression testing Differential expression testing was done using independent t- tests per gene for all genes detected in at least 5 cells in at least one of the two groups being compared. P-values were adjusted for multiple comparisons using the Benjamini & Hochberg method (FDR). Input to the test was either log-normalized (log(10, 000UMIgene/UMIcell + 1)) expression or Pearson resid- uals after regularized NB regression. A random background distribution of mean differences was generated by randomly choosing 1,000 genes and permuting the group labels. Signifi- cance thresholds for the difference of means were derived from the background distribution by taking the 0.5th and 99.5th per- centile. Finally, we called genes differentially expressed if the FDR was below 0.01 and the difference of means exceeded the threshold for significance.”

I am trying to replicate the Figure 6D and 6E.

Thank you!

ChristophH commented 4 years ago

Hi,

Sorry for the delayed response.

First, the DE test used in the preprint is not specific to sctransform (and we did not use integration, just merged the data).

The code used to do the DE testing and visualization of the results is now available at https://osf.io/kc9dg/ . R/differential_expression.R and R/differential_expression_viz.R are the relevant scripts.

Best, Christoph

marcusbasiri commented 4 years ago

Wanted to add a follow up to this.

I would like to run differential expression on the residuals from SCTransform following integration. Integration was run using only a subset of features, so I currently only have the residuals for a fraction of the total features in @scale.data. However, I would like to perform the LRT using all genes.

Since the object was split by sample prior to SCTransform() and integration, the current residuals were calculated based on each sample independently. Therefore, running GetResidual() on the integrated object generates different residuals for features that were originally calculated on the split object before SCT+integration.

Just wondering what is the most appropriate way of doing this? Should I run GetResidual() for all genes on the merged object and use these as input for differential expression, or should I split the object again, calculate residuals, and then merge?

Thanks,

Marcus

ChristophH commented 4 years ago

Please don't crosspost. Wait for a reply at https://github.com/satijalab/seurat/issues/2489

satijalab / sctransform

DE on SCTransformed and Integrated data #47