satijalab / sctransform

R package for modeling single cell UMI expression data using regularized negative binomial regression
GNU General Public License v3.0

run.test vs. compare_expression #50

Closed alexander-7 closed 4 years ago

alexander-7 commented 4 years ago

Hi, first of all thanks a lot for the great tool and the effort you put into this!

I have a question about your preprint on bioRxiv: https://www.biorxiv.org/content/biorxiv/early/2019/03/18/576827.full.pdf. As you suggested in this issue (https://github.com/ChristophH/sctransform/issues/47#issue-510711859), I looked through the code and noticed that you run the statistics on the SCTransform results extracted from the scale.data slot of the SCT assay of a Seurat object that was created with only the cells of interest.

1) Would you prefer this approach over running sctransform::vst() followed by sctransform::compare_expression() on the raw expression matrix, as you outline in your vignette for differential expression, or would you consider the two equivalent?

2) I noticed that you subset the dataset to the cells you want to run differential expression on before creating the Seurat object and running SCTransform. Would it be possible to run SCTransform on the whole dataset first, then extract only the relevant columns from the scale.data slot and feed those into run.test?

The background to my questions is that I have a dataset with very large differences in sequencing depth across samples, and SCTransform performs really well on it. I just want to make sure I pick the most appropriate DEG analysis!

Thanks a lot for your input!

ChristophH commented 4 years ago

Hi,

  1. The approach in the vignette (sctransform::compare_expression()) does a likelihood ratio test (LRT) between an NB model with just the offset (expected counts given the regularized model) and a model that also includes a group indicator variable. The approach in the paper uses a t-test on the residuals but also provides the results on randomized input for comparison. For the paper we decided to use the second approach since we did not want this to be a DE-test paper and it felt like a t-test would be sufficient. I have not tested the LRT approach on enough datasets to recommend it over the t-test.
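To make the second approach concrete, here is a minimal sketch of a two-sample t-test on the residuals of a single gene. It is not the run.test code from the paper: the function name `welch_t`, the choice of Welch's unequal-variance variant, and the residual values are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    nx, ny = len(x), len(y)
    # Squared standard error of each group mean (sample variance / n).
    vx, vy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


# Hypothetical Pearson residuals for one gene in two groups of cells.
group_a = [1.2, 0.8, 1.5, 0.9, 1.1]
group_b = [-0.3, 0.1, -0.5, 0.2, -0.1]

t_stat, dof = welch_t(group_a, group_b)
```

A p-value would then come from a t distribution with `dof` degrees of freedom. In practice the test is repeated for every gene, and the resulting p-values need a multiple-testing correction.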

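  2. Running SCTransform on the whole dataset first and then passing only the relevant scale.data columns to run.test might be less sensitive, but should also work.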
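The column-extraction step from the question can be sketched as follows, assuming the residuals from a run on all cells are available as a gene-by-cell table. The names `cells`, `residuals`, `groups`, `columns_for` and `subset`, and all values, are hypothetical; none of this is Seurat or sctransform API.

```python
from statistics import mean

# Hypothetical residuals from a run on ALL cells: gene -> one value per cell.
cells = ["c1", "c2", "c3", "c4", "c5", "c6"]
residuals = {
    "geneA": [1.0, 1.4, -0.2, 0.1, 0.9, 3.0],
    "geneB": [0.1, -0.1, 0.0, 0.2, 0.1, -2.0],
}
# Hypothetical group labels; c6 belongs to neither group of interest.
groups = {"c1": "treated", "c2": "treated", "c5": "treated",
          "c3": "control", "c4": "control"}


def columns_for(label):
    """Indices of the columns (cells) that carry the given group label."""
    return [i for i, c in enumerate(cells) if groups.get(c) == label]


def subset(gene, label):
    """Residuals of one gene restricted to the cells in one group."""
    return [residuals[gene][i] for i in columns_for(label)]


treated = subset("geneA", "treated")
control = subset("geneA", "control")
diff = mean(treated) - mean(control)  # difference in mean residual
```

Only the selected columns enter the comparison, but the residuals themselves still come from a model fitted to every cell, including c6.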

For additional DE methods and implementations, you can also check the Seurat DE analysis vignette.

Reminder: The final version of the paper is now available at Genome Biology, and the code and data are here.

alexander-7 commented 4 years ago

Hi, thanks a lot for your quick response and the clarification! Congrats on the nice paper. I'll cite it accordingly. I will close the issue.