hms-dbmi-cellenics / issues

This repository is used to report and track issues

[BUG] genes with minimal expression in one sample are excluded from integrated analysis #3

Closed alexvpickering closed 2 years ago

alexvpickering commented 2 years ago

This issue arises because we create a Seurat object for each sample, removing features that are not expressed in a minimum of 3 cells:

https://github.com/hms-dbmi-cellenics/pipeline/blob/bffe4d6b64a9482fcc97a171da5573f40ed4a9c2/pipeline-runner/R/gem2s-5-create_seurat.R#L47-L56

construct_scdata <- function(counts, doublet_score, edrops_out, sample, annot, config, min.cells = 3, min.features = 10) {
  metadata <- construct_metadata(counts, sample, config)

  scdata <- Seurat::CreateSeuratObject(
    counts,
    meta.data = metadata,
    project = config$name,
    min.cells = min.cells,
    min.features = min.features
  )
  ...
}

If a gene is excluded in one sample, it will also be excluded from the integrated dataset, which uses only the genes common to all samples.

This is particularly problematic because it will prevent the detection of genes that are, for example, not expressed in control samples and highly expressed in test samples.
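
A minimal sketch of the effect, using base R only. The `filter_min_cells` helper is a hypothetical stand-in for the `min.cells` filter (it does not call Seurat), and the gene names, cell names, and counts are made up for illustration. It assumes the integration step keeps the intersection of per-sample gene names, as described above.

```r
# Toy count matrices: rows are genes, columns are cells (hypothetical values).
control <- matrix(
  c(5, 3, 4, 2, 6,   # GeneA: detected in every control cell
    0, 0, 1, 0, 0),  # GeneB: detected in only 1 control cell
  nrow = 2, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB"), paste0("ctrl_", 1:5))
)
test <- matrix(
  c(4, 5, 3, 6, 2,   # GeneA: detected in every test cell
    9, 8, 7, 9, 8),  # GeneB: highly expressed in every test cell
  nrow = 2, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB"), paste0("test_", 1:5))
)

# Hypothetical stand-in for the per-sample min.cells filter:
# keep genes with a non-zero count in at least `min.cells` cells.
filter_min_cells <- function(counts, min.cells = 3) {
  counts[rowSums(counts > 0) >= min.cells, , drop = FALSE]
}

# Filter each sample independently, as happens when one object is built per sample.
kept <- lapply(list(control = control, test = test), filter_min_cells)

# Assumed integration behaviour: only genes present in every sample survive.
common_genes <- Reduce(intersect, lapply(kept, rownames))
print(common_genes)
```

In this toy case GeneB passes the filter in the test sample but not in the control sample, so it is absent from `common_genes` even though it is the gene that differs most between conditions.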