smorabit / hdWGCNA

High dimensional weighted gene co-expression network analysis
https://smorabit.github.io/hdWGCNA/

Question about low cell count groups #218

Closed DelongZHOU closed 2 months ago

DelongZHOU commented 3 months ago

Hi Sam,

What's the best approach for cell groups with few cell counts?

In my case, I have two groups that I'm interested in, with 10 to 100 cells per sample, totalling ~400 cells. Would it be alright to group some samples into a "meta sample"? What's the minimum threshold for min_cell? For example, the cell counts for one group are:

Control: 16, 31, 43, 76
Treatment: 16, 54, 60, 103

Would it be acceptable to group the samples in bold, and set min_cell at 40?

Thank you!

smorabit commented 3 months ago

Hi,

This is a good question; I have come across this problem in my own work as well. To work around it, I recommend changing the group.by parameter in MetacellsByGroups. For example, in the tutorial we run this code:

# construct metacells in each group
seurat_obj <- MetacellsByGroups(
  seurat_obj = seurat_obj,
  group.by = c("cell_type", "Sample"), # specify the columns in seurat_obj@meta.data to group by
  reduction = 'harmony', # select the dimensionality reduction to perform KNN on
  k = 25, # nearest-neighbors parameter
  max_shared = 10, # maximum number of shared cells between two metacells
  ident.group = 'cell_type' # set the Idents of the metacell seurat object
)

You can allow hdWGCNA to group together cells from different samples if you remove "Sample" from group.by:

seurat_obj <- MetacellsByGroups(
  seurat_obj = seurat_obj,
  group.by = c("cell_type"), # only using cell_type, not Sample
  reduction = 'harmony', 
  k = 25, 
  max_shared = 10, 
  ident.group = 'cell_type'
)

It sounds like you want to make sure that you don't merge cells from control and treatment into one metacell. For example, let's say your dataset has a metadata column called "Disease_Group" that indicates whether each cell comes from a control or a treatment sample. Since you probably don't want those to mix, you can provide this grouping variable to MetacellsByGroups:

# construct metacells in each group
seurat_obj <- MetacellsByGroups(
  seurat_obj = seurat_obj,
  group.by = c("cell_type", "Disease_Group"), # this will form separate metacells for each cell type and each treatment group!
  reduction = 'harmony', 
  k = 25, 
  max_shared = 10, 
  ident.group = 'cell_type'
)
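
To see how the choice of group.by changes the number of cells available to each group, the per-group counts can be tabulated from the metadata before calling MetacellsByGroups. The following is a minimal base R sketch, not part of the hdWGCNA tutorial. It uses a toy metadata table built from the cell counts quoted in the question; the sample labels (C1 to T4) and the single cell type label are hypothetical, and on a real object the table would be seurat_obj@meta.data.

```r
# Toy metadata standing in for seurat_obj@meta.data (labels are hypothetical).
# Per-sample cell counts are the ones quoted in the question.
counts <- c(C1 = 16, C2 = 31, C3 = 43, C4 = 76,
            T1 = 16, T2 = 54, T3 = 60, T4 = 103)
condition <- ifelse(startsWith(names(counts), "C"), "Control", "Treatment")

meta <- data.frame(
  cell_type     = "my_cell_type",                      # one cell type of interest
  Sample        = rep(names(counts), times = counts),  # one row per cell
  Disease_Group = rep(condition, times = counts)
)

# Cells per group under each candidate group.by setting
by_sample    <- table(meta$cell_type, meta$Sample)         # c("cell_type", "Sample")
by_condition <- table(meta$cell_type, meta$Disease_Group)  # c("cell_type", "Disease_Group")
by_celltype  <- table(meta$cell_type)                      # c("cell_type")

print(by_sample)
print(by_condition)
print(by_celltype)
```

With these counts, grouping by sample leaves several groups with only a few dozen cells, while grouping by condition pools them into one Control group and one Treatment group.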

While this should work in general, you also mentioned that this group has a total of 400 cells across all samples. I worry that this may be too few cells for obtaining meaningful results with hdWGCNA, so proceed with caution.
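
As a rough check before building metacells, group sizes can be compared against a minimum size. The sketch below is only an illustration in base R: the helper groups_passing and the sample labels are hypothetical, and it assumes that groups smaller than the min_cells argument of MetacellsByGroups are left out of metacell construction. The value 100 is also an assumption, so please check the documentation of your installed hdWGCNA version.

```r
# Hypothetical helper: names of the groups with at least `min_cells` cells.
groups_passing <- function(group_sizes, min_cells) {
  names(group_sizes)[group_sizes >= min_cells]
}

# Cell counts quoted in the question (sample labels are hypothetical)
per_sample <- c(C1 = 16, C2 = 31, C3 = 43, C4 = 76,
                T1 = 16, T2 = 54, T3 = 60, T4 = 103)

# Pooling samples within each condition, as with group.by = c("cell_type", "Disease_Group")
per_condition <- c(Control   = sum(per_sample[c("C1", "C2", "C3", "C4")]),
                   Treatment = sum(per_sample[c("T1", "T2", "T3", "T4")]))

min_cells <- 100  # assumed threshold, not taken from the thread

pass_sample_100 <- groups_passing(per_sample, min_cells)     # per-sample grouping
pass_cond_100   <- groups_passing(per_condition, min_cells)  # per-condition grouping
pass_sample_40  <- groups_passing(per_sample, 40)            # threshold proposed in the question

print(pass_sample_100)
print(pass_cond_100)
print(pass_sample_40)
```

Under these assumptions, only one sample passes a threshold of 100 on its own, whereas both pooled condition groups pass. Lowering the threshold to 40 keeps five of the eight samples, but each of those groups is still small.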

DelongZHOU commented 2 months ago

Hi Sam,

Thanks for your response.

My reasoning was based on this paper on DEG analysis, https://www.nature.com/articles/s41467-021-25960-2, which states that pseudobulk methods produce fewer false positives by accounting for variation between samples / replicates. I therefore want to keep the replicate information as much as possible, which is why I used group.by with Sample. Given that the differential ME value is calculated using a single-cell method (and I don't see pseudobulk WGCNA with 3 samples / condition as feasible), I might try grouping within conditions.

Do you have suggestions for quality control for hdWGCNA with low cell numbers? For example, are there metrics that I can use to compare the resulting modules with modules identified in the same dataset but with a higher cell number? Thank you!

smorabit commented 2 months ago

> Do you have suggestions for quality control for hdWGCNA with low cell number?

Unfortunately I do not recommend running hdWGCNA with extremely low cell numbers.

DelongZHOU commented 2 months ago

> > Do you have suggestions for quality control for hdWGCNA with low cell number?
>
> Unfortunately I do not recommend running hdWGCNA with extremely low cell numbers.

That's fair. Thank you!