satijalab / sctransform

R package for modeling single cell UMI expression data using regularized negative binomial regression
GNU General Public License v3.0

Avoid removing outliers #22

Closed: sunanthas9 closed this issue 5 years ago

sunanthas9 commented 5 years ago

Hello!

I am running sctransform as part of a comparison study and would like to keep all the genes in the output matrix for consistency. Is there a way to avoid removing outliers? Setting the 'is_outlier' function to return a vector of FALSE did not solve the problem. Is there an alternative?

If this is not possible, I would appreciate some recommendations for adding the outlier genes back to the matrix after normalization. From our initial analysis, the outlier genes tend to be on the side of higher expression and we think that setting them to 0 might be a bad approach.

Thank you in advance for your help!

Sunantha

ChristophH commented 5 years ago

Hi Sunantha,

Let me first clarify that the is_outlier function is used to flag genes as outliers if their model parameters are very different from those of other genes with a similar mean. These genes are not removed from the output, but they are ignored when fitting the overall relationship between gene mean and parameter values.

Which genes are missing from the output? By default, the only genes removed are the ones with fewer than 5 non-zero observations. This value can be changed using the min_cells parameter. I don't recommend setting it too low, because the initial model fitting can fail for rarely detected genes.

If you are running the Seurat wrapper Seurat::SCTransform, you may want to make sure that return.only.var.genes = FALSE is set.
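
To check which genes a detection threshold like this would drop, something along the following lines may help. It is only a base-R sketch of the rule described above, not the package's internal code: the toy matrix, the gene and cell names, and the variable names are made up for illustration, and "non-zero observation" is assumed to mean a count greater than 0.

```r
# Toy UMI count matrix (hypothetical values): genes in rows, cells in columns.
umi <- matrix(
  c(0, 0, 0, 0, 0, 1,   # geneA: non-zero in 1 cell
    1, 2, 0, 3, 1, 4,   # geneB: non-zero in 5 cells
    5, 1, 2, 2, 7, 3),  # geneC: non-zero in 6 cells
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:6))
)

# Threshold corresponding to the default of 5 mentioned above.
min_cells <- 5

# Per gene, count the cells with a non-zero observation
# (assumption: "non-zero" means count > 0).
cells_detected <- rowSums(umi > 0)

# Genes with fewer than min_cells non-zero observations are dropped.
kept <- names(cells_detected)[cells_detected >= min_cells]
dropped <- setdiff(rownames(umi), kept)

print(kept)
print(dropped)
```

Running the same count on your own matrix should show whether the missing genes are simply the rarely detected ones. If they are, lowering min_cells (or, with the Seurat wrapper, also setting return.only.var.genes = FALSE) would be the first things to try, keeping in mind the caveat about model fitting for rarely detected genes.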