samuel-marsh closed this issue 3 months ago.
Hey Sam, thanks for taking the time to write this issue, and for your PR #202. I think you're right that it would make sense to add a dims parameter as you described; it should not be hard for me to implement. Just FYI, I am planning to work through the current backlog of hdWGCNA issues starting on 1/22. I am defending my PhD soon, so unfortunately I don't have time right now for hdWGCNA maintenance. I hope you understand!
Hi again,
Please update to the newest version of hdWGCNA to use the dims parameter with MetacellsByGroups (function documentation here). By default, all of the dimensions in the selected reduction are used.
You can supply either integer indices for the dimensions or a character vector of the corresponding column names.
```r
# option 1: character vector
# (names must match the column names of the reduction's embeddings,
#  e.g. colnames(Embeddings(seurat_obj, reduction = 'harmony')))
seurat_obj <- MetacellsByGroups(
  seurat_obj = seurat_obj,
  group.by = c("cell_type", "Sample"),
  k = 25,
  max_shared = 12,
  reduction = 'harmony',
  dims = c('PC_1', 'PC_4', 'PC_5'),
  ident.group = 'cell_type'
)

# option 2: indices (here dimensions 1-10 and 20-30)
seurat_obj <- MetacellsByGroups(
  seurat_obj = seurat_obj,
  group.by = c("cell_type", "Sample"),
  k = 25,
  max_shared = 12,
  reduction = 'harmony',
  dims = c(1:10, 20:30),
  ident.group = 'cell_type'
)
```
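To make the two options concrete, here is a minimal base-R sketch of how a dims argument that accepts either indices or names could be resolved against a reduction's embedding matrix (cells x dimensions). The helper resolve_dims and the toy matrix with "harmony_"-style column names are hypothetical and used only for illustration. This is not hdWGCNA's internal code.

```r
# Hypothetical helper (NOT hdWGCNA's implementation): resolve a `dims`
# argument given as integer indices or column names against an embedding matrix.
resolve_dims <- function(emb, dims = NULL) {
  if (is.null(dims)) {
    # default: use every dimension in the reduction
    return(emb)
  }
  if (is.character(dims)) {
    missing_dims <- setdiff(dims, colnames(emb))
    if (length(missing_dims) > 0) {
      stop("dims not found in reduction: ", paste(missing_dims, collapse = ", "))
    }
  } else if (is.numeric(dims)) {
    if (any(dims < 1 | dims > ncol(emb))) {
      stop("dims indices must lie in 1..", ncol(emb))
    }
  } else {
    stop("dims must be a numeric or character vector")
  }
  emb[, dims, drop = FALSE]
}

# toy 100-cell x 30-dimension embedding with assumed harmony-style column names
set.seed(1)
emb <- matrix(rnorm(100 * 30), nrow = 100,
              dimnames = list(paste0("cell_", 1:100), paste0("harmony_", 1:30)))

dim(resolve_dims(emb))                          # all 30 dimensions
dim(resolve_dims(emb, dims = c(1:10, 20:30)))   # 21 selected dimensions
colnames(resolve_dims(emb, dims = c("harmony_1", "harmony_4")))
```

In this sketch, a name that does not exist in the reduction (for example "PC_1" against a matrix whose columns are "harmony_1", "harmony_2", ...) raises an error instead of silently selecting nothing.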
Hi Sam/Swarup Lab,
Question for you all: would it make sense to add a dims parameter to MetacellsByGroups to specify a subset of the reduction's dimensions to use for KNN construction, or is that a bad idea?
There are two use cases I'm thinking of. The first is non-PCA-based reductions, where it is possible to exclude certain dimensions. I'm thinking of ICA, NMF, and iNMF (LIGER) specifically. We often exclude ICs or NMF factors that are deemed technical in nature so that they don't obscure the biology. For instance, in a recent analysis, when processing downstream in LIGER after iNMF, the dims I used for quantile normalization, Louvain clustering, and UMAP looked like this:
The other use case is a simpler one: standard PCA analysis where the number of PCs computed was very large (75, 100, etc.) but the number actually used for downstream analysis was much lower (25, 30, etc.). In that case, wouldn't it be better for hdWGCNA to construct the KNN graph on only the same set of PCs that were used for the downstream SNN graph, clustering, and UMAP of the single-cell data?
Or is there something I'm missing, and is it preferable to include all dimensions in KNN construction here?
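As a conceptual illustration of why the choice of dimensions matters, the base-R sketch below builds a k-nearest-neighbour list from a chosen subset of embedding columns using plain Euclidean distances. It then compares the graph built on the first 30 "PCs" with one built on all 100. The helper knn_on_dims, the random toy data, and the "technical" component indices are hypothetical. This is not how hdWGCNA constructs its KNN graph internally.

```r
# Conceptual sketch only: k-nearest neighbours computed on a subset of dimensions.
# Uses a full distance matrix, so it is only suitable for small toy data.
knn_on_dims <- function(emb, dims, k = 5) {
  x <- emb[, dims, drop = FALSE]
  d <- as.matrix(dist(x))   # pairwise Euclidean distances between cells
  diag(d) <- Inf            # a cell is not its own neighbour
  # for each cell (row), the indices of its k closest cells -> n_cells x k matrix
  t(apply(d, 1, function(row) order(row)[seq_len(k)]))
}

set.seed(42)
n_cells <- 200
pcs <- matrix(rnorm(n_cells * 100), nrow = n_cells,
              dimnames = list(NULL, paste0("PC_", 1:100)))

# Use case 2: 100 PCs computed, but only PCs 1-30 used downstream
knn_30  <- knn_on_dims(pcs, dims = 1:30, k = 10)
knn_100 <- knn_on_dims(pcs, dims = 1:100, k = 10)

# Use case 1: drop components judged technical (indices are hypothetical)
technical <- c(3, 7)
knn_bio <- knn_on_dims(pcs, dims = setdiff(1:30, technical), k = 10)

# average fraction of each cell's neighbours shared by the 30-PC and 100-PC graphs
shared <- mean(sapply(seq_len(n_cells), function(i) {
  length(intersect(knn_30[i, ], knn_100[i, ])) / 10
}))
shared
```

In the toy data every column is random noise, so the overlap only shows that adding dimensions changes which cells are neighbours. It does not say which dimension set is biologically better for a given dataset.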
If adding this parameter is valid and you are interested, I'm happy to take a stab at a PR, so just let me know.
Thanks! Sam