smorabit / hdWGCNA

High dimensional weighted gene co-expression network analysis
https://smorabit.github.io/hdWGCNA/

When I used the "metacellsbygroup" function, I found an error. #46

Closed zh1221 closed 1 year ago

zh1221 commented 1 year ago

Hi, I want to process my data with the hdWGCNA package, but I ran into an error.

library(hdWGCNA)
seurat_obj <- SetupForWGCNA(sce.all, wgcna_name = "test")

# construct metacells in each group
seurat_obj <- MetacellsByGroups(
  seurat_obj,
  group.by = c("Tissue"),
  k = 25,
  max_shared = 10,
  ident.group = 'Tissue'
)

Error in if (sparse && !sM) data <- as(data, "sparseMatrix") else if (!sparse) { :
  missing value where TRUE/FALSE needed
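For context, "missing value where TRUE/FALSE needed" is R's generic error for an `if` condition that evaluates to NA. The base-R sketch below reproduces only the message, not the hdWGCNA or Matrix code path; the variable `flag` is made up, and which value is actually NA inside the failing call is not established here.

```r
# Keep condition messages in English so the text matches the report above.
Sys.setenv(LANGUAGE = "en")

# `if` needs a single TRUE or FALSE; an NA condition raises the error.
flag <- NA
msg <- tryCatch(
  if (flag) "sparse" else "dense",
  error = function(e) conditionMessage(e)
)
print(msg)
```

In practice this points at an input that is empty or contains unexpected values by the time it reaches the sparsity check.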

Package versions: hdWGCNA 0.1.1.9012, Matrix 1.4-1, Seurat 4.1.0, igraph 1.2.11, devtools 2.4.3, WGCNA 1.70-3

Thank you very much in advance for your assistance! Best, Zhaoh

zh1221 commented 1 year ago

I found that the error occurs in the ConstructMetacells function:

mask <- sapply(seq_len(nrow(cell_sample)), function(x) seq_len(ncol(exprs_old)) %in% cell_sample[x, , drop = FALSE])
mask <- Matrix::Matrix(mask)  # the error occurs here
new_exprs <- (exprs_old %*% mask)
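To make the quoted lines easier to follow, here is a base-R sketch of the same mask construction on toy data. The matrices are made up, plain base matrices stand in for Matrix::Matrix, and the sanity checks before the product are an added suggestion rather than something hdWGCNA is known to do.

```r
# Toy expression matrix: 3 genes (rows) x 6 cells (columns), filled column by column.
exprs_old <- matrix(
  c(1, 0, 2,
    0, 1, 0,
    3, 1, 1,
    0, 0, 4,
    2, 2, 0,
    1, 0, 1),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6))
)

# Hypothetical assignments: each row lists the cells merged into one metacell.
cell_sample <- rbind(
  c(1, 2, 3),
  c(4, 5, 6)
)

# One column per metacell, TRUE where a cell belongs to it
# (same construction as in the quoted code).
mask <- sapply(seq_len(nrow(cell_sample)), function(x) {
  seq_len(ncol(exprs_old)) %in% cell_sample[x, , drop = FALSE]
})

# Checks that would catch an empty or malformed mask before the product.
stopifnot(is.matrix(mask), !anyNA(mask), nrow(mask) == ncol(exprs_old))

# Sum the expression of the member cells of each metacell (genes x metacells).
new_exprs <- exprs_old %*% (mask * 1)
print(new_exprs)
```

If `exprs_old` had zero columns, or `cell_sample` zero rows, `mask` would not be a proper cells-by-metacells matrix, which is the kind of input worth ruling out first.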

smorabit commented 1 year ago

Are you able to reproduce the error with the tutorial dataset?

It would be helpful if you could include the code that you used to process your Seurat object so I can have a better idea of how to help you.

Dimmiso commented 1 year ago

Hi, I have the same problem. (the object is here: https://drive.google.com/file/d/1G_7LMR7Ez9WG4bHfDLa9WFdGpZPEbFfg/view?usp=sharing)

a <- c(12, 5, 27)

seurat_obj <- MetacellsByGroups(
  seurat_obj,
  group.by = c("cl.PEP.fused"), # specify the columns in seurat_obj@meta.data to group by
  k = a[1], #20, # nearest-neighbors parameter, default = 25
  max_shared = a[2], #2-works, #5, # maximum number of shared cells between two metacells, default = 15
  min_cells = a[3], #25, # default = 100
  ident.group = "cl.PEP.fused", # set the Idents of the metacell seurat object
  assay = 'integrated',
  slot = 'integrated',
  verbose = T
)

For the parameters given by vector a, the error message is:

...........
Overlap QC metrics:
Cells per bin: 12
Maximum shared cells bin-bin: 5
Mean shared cells bin-bin: 1.10917874396135
Median shared cells bin-bin: 1
Error in if (sparse && !sM) data <- as(data, "sparseMatrix") else if (!sparse) { :
  missing value where TRUE/FALSE needed
In addition: Warning message:
In MetacellsByGroups(seurat_obj, group.by = c("cl.PEP.fused"), k = a[1], :
  Removing the following groups that did not meet min_cells: A-LTMR, TrpM8
.........
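The "shared cells bin-bin" metrics in this output presumably summarize how many cells each pair of metacells (bins) has in common, which is what max_shared caps. A minimal base-R sketch of that computation follows; the membership matrix `cell_sample` and its values are hypothetical, and this is not hdWGCNA's own code.

```r
# Hypothetical membership: rows = metacells (bins), entries = cell indices, k = 4.
cell_sample <- rbind(
  c(1, 2, 3, 4),
  c(3, 4, 5, 6),
  c(6, 7, 8, 9)
)
n_cells <- 9

# Cells x bins indicator matrix.
mask <- sapply(seq_len(nrow(cell_sample)), function(x) {
  seq_len(n_cells) %in% cell_sample[x, ]
})

# crossprod() counts, for every pair of bins, the cells they have in common.
shared <- crossprod(mask * 1)

# Summarize the off-diagonal entries (each pair counted once).
pairwise <- shared[upper.tri(shared)]
cat("Maximum shared cells bin-bin:", max(pairwise), "\n")
cat("Mean shared cells bin-bin:", mean(pairwise), "\n")
cat("Median shared cells bin-bin:", median(pairwise), "\n")
```

Read this way, a small max_shared forces bins to be nearly disjoint, which is harder to satisfy in small clusters.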

To give an idea of the parameter combinations (k, max_shared, min_cells) I tried, all of which failed:

a <- c(15, 2, 30)
a <- c(12, 2, 30)
a <- c(10, 2, 30)
a <- c(8, 2, 30)
a <- c(20, 2, 30)
a <- c(15, 6, 30)
a <- c(15, 8, 30)
a <- c(15, 10, 30)
a <- c(15, 4, 30)
a <- c(20, 12, 30)
a <- c(20, 14, 30)
a <- c(20, 16, 30)
a <- c(20, 10, 30)
a <- c(20, 8, 30)
a <- c(20, 6, 30)
a <- c(20, 10, 50)
a <- c(25, 10, 50)
a <- c(10, 5, 25)
a <- c(20, 15, 70)
a <- c(20, 12, 70)
a <- c(20, 10, 70)
a <- c(20, 8, 70)
a <- c(20, 5, 70)
a <- c(20, 3, 70)
a <- c(20, 5, 50)
a <- c(12, 5, 27)

The error message is always the same, but the metrics, of course, differ.

Some of these parameter combinations, but not others, worked on some other Seurat objects. So I suspect that the problem is an incompatibility between the parameters k and max_shared in the command and the cluster sizes. In my situation the cluster sizes (cl.PEP.fused) are:

A-LTMR C-LTMR  NP1 NP2 NP3 PEP1 PEP2 PEP3 TrpM8
    16    168 1079 651 135  341   27   51    25
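The warning in the output above lists A-LTMR and TrpM8 as removed for min_cells = 27, which matches a simple size comparison. A small base-R sketch using the cluster sizes quoted here; the `<` comparison is inferred from that warning (PEP2, with exactly 27 cells, was not removed), not taken from the package source.

```r
# Cluster sizes as reported for cl.PEP.fused.
cluster_sizes <- c(
  "A-LTMR" = 16, "C-LTMR" = 168, "NP1" = 1079, "NP2" = 651, "NP3" = 135,
  "PEP1" = 341, "PEP2" = 27, "PEP3" = 51, "TrpM8" = 25
)

min_cells <- 27

# Groups smaller than min_cells are the ones expected to be dropped.
dropped <- names(cluster_sizes)[cluster_sizes < min_cells]
kept <- names(cluster_sizes)[cluster_sizes >= min_cells]

cat("Dropped:", paste(dropped, collapse = ", "), "\n")
cat("Kept:", paste(kept, collapse = ", "), "\n")
```

Running the same comparison for a planned min_cells before calling MetacellsByGroups shows in advance which clusters will be excluded.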

It would be great, if possible, to update the description of the MetacellsByGroups command so that one can work out which k and max_shared (and other relevant parameters) to choose when the smallest cluster has N cells (if I am right about the origin of the error). Of course, it would also be great to include even the smallest cluster in the analysis, which in my case has just 16 cells (or at least 25?). Does that make sense? Would it make sense to adjust k according to individual cluster size? :-(

Thanks a lot for great tool! Best, Dmitry

Dimmiso commented 1 year ago

Well, actually with default parameters the same error is thrown:

Overlap QC metrics:
Cells per bin: 25
Maximum shared cells bin-bin: 15
Mean shared cells bin-bin: 5.51787439613527
Median shared cells bin-bin: 5
Error in if (sparse && !sM) data <- as(data, "sparseMatrix") else if (!sparse) { :
  missing value where TRUE/FALSE needed

Thanks for your support, Sam! D

smorabit commented 1 year ago

Hi Dmitry,

Thanks again for providing your data, it makes it a lot easier to find the source of these issues! It looks like you are trying to run MetacellsByGroups with the integrated assay rather than the RNA assay. By default, MetacellsByGroups looks for the counts slot in the selected assay. However, it seems that in your case the counts slot was not found, so I was able to run this code successfully on your object by setting slot = 'data'.


a <- c(12, 5, 27)

seurat_obj <- SetupForWGCNA(
  seurat_obj,
  wgcna_name = 'test'
)

seurat_obj <- MetacellsByGroups(
  seurat_obj,
  group.by = c("cl.PEP.fused"), 
  k = a[1],
  max_shared = a[2], 
  min_cells = a[3],
  ident.group = "cl.PEP.fused",
  assay = 'integrated',
  slot='data',
  verbose = T
)

Based on this issue, I added a new data format check in MetacellsByGroups to throw an error if the selected slot is not found in the selected assay. Please let me know if this solves your issue.
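As an illustration of what such a check could look like, here is a base-R sketch. It is not the check that was added to hdWGCNA: `toy_assay` is a plain named list standing in for an assay, `check_slot` is a hypothetical helper, and treating a zero-dimension matrix as "not found" is an assumption.

```r
# Stand-in for an assay: a named list of matrices (NOT a real Seurat Assay).
toy_assay <- list(
  counts = matrix(numeric(0), nrow = 0, ncol = 0),  # empty slot
  data   = matrix(seq_len(12) / 10, nrow = 3)       # populated slot
)

# Hypothetical helper: fail early if the requested slot is absent or empty.
check_slot <- function(assay, slot) {
  mat <- assay[[slot]]
  if (is.null(mat) || any(dim(mat) == 0)) {
    stop("slot '", slot, "' not found or empty in the selected assay")
  }
  invisible(TRUE)
}

ok <- check_slot(toy_assay, "data")
err <- tryCatch(
  check_slot(toy_assay, "counts"),
  error = function(e) conditionMessage(e)
)
print(err)
```

On a real Seurat 4 object, the dimensions of each slot can be inspected with `GetAssayData()` and `dim()` before choosing the `slot` argument.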

Dimmiso commented 1 year ago

Hi, thanks a lot for the support! It works now indeed! One comment: the data I work with is "human-like", with very small N, different sexes, very different ages and, by the nature of the samples, a representation of clusters that may vary a lot between samples. So data integration is a big issue for us. The anchor-based data integration implemented in Seurat works really well for me. For this very reason it seems logical to me to use the integrated data as input for WGCNA. Thanks!

smorabit commented 1 year ago

Makes sense to use the integrated dataset in your case for sure. Glad that this resolved your issue. Going to close it for now but @zh1221 please re-open if needed.

adamklie commented 1 year ago

Hey Sam! Wanted to follow up on this a bit.

I have a case where I want to use the integrated data to define metacells but the counts for the actual WGCNA. This is mostly because I don't want to only consider the top ~2000 variable features when I perform the WGCNA, and integration bottlenecks the object to have only those variable features. I would guess I have to do something like create the metacells with MetacellsByGroups using the "integrated" assay and then use the metacell definition to go back and define a new metacell object with the counts? Is this possible to do within the package?

Thanks!

smorabit commented 1 year ago

Hi Adam,

If I understand correctly, you want to use the dim reduction from one assay, but the expression matrix from another. I don't think this can be accomplished with hdWGCNA directly but you can make a slight tweak to your Seurat object. I suggest just adding the integrated dim reduction into the RNA assay.


DefaultAssay(seurat_obj) <- 'integrated'
integrated_reduct <- seurat_obj@reductions[['my_integrated_dimreduct']]  # pull the DimReduc object by name
DefaultAssay(seurat_obj) <- 'RNA'
seurat_obj@reductions$integrated <- integrated_reduct 

Then you can go ahead and run MetacellsByGroups with the integrated dimensionality reduction and the normal counts matrix.

adamklie commented 1 year ago

Thanks for such a quick response! That's exactly right. So then a MetacellsByGroups call might look like this?

seurat_obj <- MetacellsByGroups(
  seurat_obj,
  group.by = c('Sample'),
  k = 25,
  assay = 'RNA',
  slot = 'counts',
  reduction = 'my_integrated_dimreduct',
  ident.group = 'Sample'
)

Just want to check my understanding: this will use the integrated dim reduction to find the nearest neighbors (NNs) that identify which cells get collapsed into metacells, but will then aggregate the actual expression matrix from the RNA "counts" slot?
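The idea being asked about can be sketched in base R on toy data: neighbors are found in one space (an embedding) and expression is aggregated from a different matrix (counts). This does not confirm how MetacellsByGroups behaves. The embedding, the counts and the choice to let every cell seed a neighborhood are all assumptions made for illustration.

```r
set.seed(1)
n_cells <- 8
n_genes <- 5

# Hypothetical low-dimensional embedding (cells x 2), standing in for the
# integrated dimensionality reduction.
embedding <- matrix(
  rnorm(n_cells * 2), ncol = 2,
  dimnames = list(paste0("cell", seq_len(n_cells)), NULL)
)

# Hypothetical raw counts (genes x cells), standing in for the RNA counts slot.
counts <- matrix(
  rpois(n_genes * n_cells, lambda = 3), nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), rownames(embedding))
)

# Nearest neighbors are computed in the embedding only.
k <- 3
d <- as.matrix(dist(embedding))
nn <- t(apply(d, 1, function(row) order(row)[seq_len(k)]))  # each cell plus its k - 1 closest cells

# Aggregation uses the counts matrix: sum the counts over each neighborhood.
metacell_counts <- sapply(seq_len(nrow(nn)), function(i) {
  rowSums(counts[, nn[i, ], drop = FALSE])
})
print(dim(metacell_counts))
```

The embedding never enters the aggregated values; it only decides which columns of `counts` are summed together.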