Open johnchamberlin opened 4 years ago
Hi thanks for your interest. So far this is not supported in scry; we only allow adjusting for cell covariates that are single categorical variables. The reason is because that is the scenario where the residuals can be computed in closed-form. In the future we might provide this feature but it will be much slower because it will require an iterative optimization procedure (basically equivalent to a Poisson regression) for each gene. Here are some possible alternative approaches in the meantime (in decreasing order of favorability):
glm
function from regular R (or some faster equivalent). Compute the log of total counts for each cell as an offset. Then for each gene (independently) regress the vector of gene counts y_j against your covariates matrix X with the offsets and the Poisson glm family with log link. Then take the deviance residuals from the fitted model. That should be equivalent to what scry provides in the case that X encodes a single categorical covariate and/or just an intercept term.oh one more thing, if you have tried glmpca in the past and been disappointed with speed, please make sure you have tried the newer v0.2 release (currently on CRAN), where we have increased the speed with a new optimization algorithm. Of course, it is still slower than computing residuals with scry.
I just started using glmpca yesterday. It runs in [edit: 23 minutes] with 40K cells, 2K features, L=10, which I think is perfectly adequate. But I wanted to produce the deviance matrix as well, which is when I switched to scry.
Can I use deviances to perform differential expression, similar to sctransform, or is the idea simply to use glm-pca (or PCA on deviances) ahead of clustering and then revert to raw counts?
Thanks
I suppose you could use the deviance residuals for differential expression, but we would recommend using the raw counts instead along with a tool like DEseq2 or edgeR. Soneson & Robinson 2018 may be relevant.
By the way, I'm not sure of your use case for deviance residuals other than differential expression, but if you have already run GLM-PCA, you can use the predict.glmpca method to get a matrix of the imputed mean values for original data matrix. This is sort of like a denoised version of the data. If you were to normalize and log-transform it (there won't be any exact zeros), it should have similar properties as the deviance residuals matrix you would get from scry.
Is it possible to include gene or cell covariates when using the nullResiduals approximation of glc-pca (i.e. X and Z)? I am working with a large dataset and am trying to speed up glm-pca by using nullResiduals instead. By cell covariate I mean a computed value, not a batch identity.
Thanks, John