Adding dims parameter to `MetacellsByGroups`

samuel-marsh commented 4 months ago

Hi Sam/Swarup Lab,

I have a question for you all about the possibility of adding a dims parameter to MetacellsByGroups, to specify a subset of the reduction's dimensions to use for KNN construction. Or is that a bad idea?

There are two use cases I'm thinking of. One is for non-PCA based reductions, where it is possible to exclude certain dimensions. I'm thinking of ICA, NMF, and iNMF (LIGER) specifically. We often do this for ICs or NMF factors that are deemed to be technical in nature so that they don't obscure the biology. For instance, in a recent analysis, when processing downstream in LIGER after iNMF, the dims I used for quantile normalization, Louvain clustering, and UMAP looked like this:

all_factors <- 1:15
# dims to exclude
# factor5 is mito genes
# factor3 is ribo genes

dims_use <- setdiff(all_factors, c(3, 5))

all_myeloid_liger <- quantile_norm(object = all_myeloid_liger, knn_k = 15, dims.use = dims_use)
all_myeloid_liger <- louvainCluster(object = all_myeloid_liger, resolution = 0.4, k = 20, dims.use = dims_use)
all_myeloid_liger <- runUMAP(all_myeloid_liger, n_neighbors = 30, min_dist = 0.3, dims.use = dims_use)

The other use case is the simpler one of a standard PCA analysis, where the number of PCs computed was very large (75, 100, etc.) but the number actually used for downstream analysis was much lower (25, 30, etc.). In that case, wouldn't it be better to construct the KNN graph for hdWGCNA on the same set of PCs that were used for the downstream SNN, clustering, and UMAP of the single-cell data?
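
To make this second use case concrete, here is a minimal base-R sketch. It assumes a toy embedding of random numbers (75 "PCs" with decreasing variance) and a brute-force neighbour search written for illustration; `knn_idx` is a hypothetical helper, and none of this is how hdWGCNA actually builds its KNN graph. It only shows that neighbours found on all computed dimensions can differ from those found on the subset used downstream.

```r
set.seed(1)
n_cells <- 200
n_pcs   <- 75

# Toy embedding (random data, for illustration only): later "PCs" get less variance
sds <- seq(5, 0.5, length.out = n_pcs)
emb <- matrix(
  rnorm(n_cells * n_pcs, sd = rep(sds, each = n_cells)),
  nrow = n_cells,
  dimnames = list(paste0("cell", seq_len(n_cells)), paste0("PC_", seq_len(n_pcs)))
)

# Hypothetical helper: brute-force k nearest neighbours by Euclidean distance,
# excluding each cell from its own neighbour list
knn_idx <- function(x, k) {
  d <- as.matrix(dist(x))
  diag(d) <- Inf
  t(apply(d, 1, function(row) order(row)[seq_len(k)]))
}

k <- 25
knn_all <- knn_idx(emb, k)           # all 75 computed dimensions
knn_sub <- knn_idx(emb[, 1:30], k)   # only the dimensions used downstream

# Per-cell fraction of neighbours shared between the two graphs
overlap <- vapply(seq_len(n_cells), function(i) {
  length(intersect(knn_all[i, ], knn_sub[i, ])) / k
}, numeric(1))
summary(overlap)
```

How much the two neighbour sets disagree will depend on the real data, so the summary here says nothing about any particular dataset.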

Or is there something I'm missing and it is preferable to include all dimensions in KNN construction here?

If adding this parameter is valid and you are interested, I'm happy to take a stab at a PR, so just let me know.

Thanks! Sam

smorabit commented 4 months ago

Hey Sam, thanks for taking the time to write this issue, and for your PR #202. I think you're right that it would make sense to add a dims parameter as you have described, and it should not be hard for me to implement. Just FYI, I am planning to start working on the current backlog of hdWGCNA issues on 1/22. I am defending my PhD soon, so I unfortunately don't have the time right now to work on any hdWGCNA maintenance. Hope you understand!

smorabit commented 3 months ago

Hi again,

Please check the newest version of hdWGCNA to use the dims parameter with MetacellsByGroups (function documentation here). The default behavior uses all of the dimensions in the selected reduction.

You can either supply indices for the dimensions or a character vector corresponding to the column names.

# option 1: character vector
seurat_obj <- MetacellsByGroups(
  seurat_obj = seurat_obj,
  group.by = c("cell_type", "Sample"),
  k = 25,
  max_shared = 12,
  reduction = 'harmony',
  dims = c('PC_1', 'PC_4', 'PC_5'),
  ident.group = 'cell_type'
)

# option 2: indices
seurat_obj <- MetacellsByGroups(
  seurat_obj = seurat_obj,
  group.by = c("cell_type", "Sample"),
  k = 25,
  max_shared = 12,
  reduction = 'harmony',
  dims = c(1:10, 20:30),
  ident.group = 'cell_type'
)
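
For readers wondering how the two input styles could map onto the same set of columns, here is a minimal base-R sketch. `resolve_dims` is a hypothetical helper written for illustration and is not the hdWGCNA implementation; the toy matrix and its `harmony_` column names are also assumptions.

```r
# Hypothetical helper (NOT the hdWGCNA implementation): turn `dims` into
# column indices of an embedding matrix
resolve_dims <- function(embeddings, dims = NULL) {
  n_dims <- ncol(embeddings)
  if (is.null(dims)) {
    return(seq_len(n_dims))  # default: use every dimension
  }
  if (is.character(dims)) {
    not_found <- setdiff(dims, colnames(embeddings))
    if (length(not_found) > 0) {
      stop("Dimensions not found in reduction: ", paste(not_found, collapse = ", "))
    }
    return(match(dims, colnames(embeddings)))
  }
  if (is.numeric(dims)) {
    if (any(dims < 1 | dims > n_dims | dims != round(dims))) {
      stop("Numeric `dims` must be whole numbers between 1 and ", n_dims)
    }
    return(as.integer(dims))
  }
  stop("`dims` must be NULL, a numeric vector, or a character vector")
}

# Toy embedding standing in for a reduction with 30 dimensions (assumed names)
emb <- matrix(0, nrow = 3, ncol = 30,
              dimnames = list(NULL, paste0("harmony_", 1:30)))

idx_by_name  <- resolve_dims(emb, c("harmony_1", "harmony_4", "harmony_5"))
idx_by_index <- resolve_dims(emb, c(1:10, 20:30))
idx_default  <- resolve_dims(emb)
```

With a real Seurat object, `colnames(Embeddings(seurat_obj, reduction = 'harmony'))` lists the names that are actually available. A character vector presumably has to match those names exactly, and harmony reductions are typically keyed `harmony_1`, `harmony_2`, and so on rather than `PC_1`, so it is worth checking before passing names to `dims`.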

smorabit / hdWGCNA

Adding dims parameter to `MetacellsByGroups` #203