hms-dbmi-cellenics / issues

This repository is used to report and track issues

Seurat Pipeline automated cluster detection renames values in a confusing way #42

Open gerbeldo opened 10 months ago

gerbeldo commented 10 months ago

Background

For some Seurat objects, certain categorical metadata variables are detected as clustering variables, and the name is prefixed with the label "Cluster". For example, take a Seurat object with the metadata variables "sample_name" and "patient". These match the "group" variable heuristic: 1) all cells in a sample belong to at most one group of the patient variable, and 2) the cardinality of patient is less than or equal to the number of samples. That object returns this:

Image

It should return the same result, but without the "Cluster" label.

Hypothesis

This could be related to the add_samples_col function: it looks for a "sample" or "samples" variable and, if it finds neither, assumes a single sample. That breaks the cardinality heuristic and converts these "group" variables into "cluster" variables.
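
To make the hypothesis concrete, here is a minimal Python sketch of the two-part "group" heuristic and of the single-sample fallback. It is not the pipeline's actual code: `is_group_variable`, `classify`, and the toy metadata are hypothetical names and data used only for illustration, and the fallback is modeled purely from the description above.

```python
def is_group_variable(values, samples):
    """Check the two-part "group" heuristic for one metadata variable."""
    # Collect the distinct values of the variable seen in each sample.
    groups_per_sample = {}
    for value, sample in zip(values, samples):
        groups_per_sample.setdefault(sample, set()).add(value)

    # 1) All cells in a sample belong to at most one group.
    one_group_per_sample = all(len(g) <= 1 for g in groups_per_sample.values())
    # 2) The variable's cardinality is <= the number of samples.
    cardinality_ok = len(set(values)) <= len(groups_per_sample)
    return one_group_per_sample and cardinality_ok


def classify(metadata, column):
    """Label a variable "group" or "cluster" (hypothetical helper)."""
    # Assumed fallback: use "samples" or "sample" if present,
    # otherwise treat every cell as belonging to one sample.
    samples = metadata.get("samples", metadata.get("sample"))
    if samples is None:
        samples = ["single_sample"] * len(metadata[column])
    return "group" if is_group_variable(metadata[column], samples) else "cluster"


# Toy metadata: six cells, three samples, two patients.
metadata = {
    "sample_name": ["s1", "s1", "s2", "s2", "s3", "s3"],
    "patient": ["p1", "p1", "p1", "p1", "p2", "p2"],
}

# No "sample"/"samples" column: one assumed sample holds both patients.
without_samples = classify(metadata, "patient")

# Same data with the sample column under the expected name.
with_samples = classify({**metadata, "samples": metadata["sample_name"]}, "patient")

print(without_samples, with_samples)
```

In this sketch, "patient" fails both conditions once all cells collapse into a single assumed sample, so it is labeled a cluster variable. With the sample column under the expected name, it passes both and is labeled a group variable.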

Goal

alexvpickering commented 10 months ago

Yes, this is because there was no "sample" or "samples" column. You can confirm by checking that these groupings cannot be used for between-sample DE. Some suggestions for improvements, from simple to hard: