Hi! Sorry I couldn't find a "question" type, so I am using "bug report". Thanks a lot for developing this package! I am trying to use it, but coming across a few questions:

1) I know that the dataset needs to go through the Seurat pipeline first, but I am wondering whether the dataset can be filtered? My current Seurat obj has been filtered by min.cells = 3, min.features = 200.

2) In the tutorial, when constructing metacells, you used k=25 for ~40k cells in your dataset. My scRNA-seq dataset has ~140K cells and I am wondering what range of 'k' values I should try out. I am currently running with k=40 as a random choice.

coexp_hdWGCNA_geneids <- SetupForWGCNA( coexp_hdWGCNA_geneids, gene_select = "fraction", fraction = 0.05, wgcna_name = "Fraction_5")

coexp_hdWGCNA_geneids <- MetacellsByGroups( seurat_obj = coexp_hdWGCNA_geneids, group.by = c("Celltypes_global", "Sample"), reduction = 'harmony', k = 50, max_shared = 12, ident.group = 'Celltypes_global' )

It runs, however I get this warning:

Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Error in apply(cell_sample, 1, function(x) { : 
  dim(X) must have a positive length
In addition: Warning messages:
1: In MetacellsByGroups(seurat_obj = coexp_hdWGCNA_geneids, group.by = c("Celltypes_global",  :
  Removing the following groups that did not meet min_cells: B-Cells#GSM3954947, B-Cells#GSM3954949, B-Cells#GSM3954950, B-Cells#GSM3954951, B-Cells#GSM3954952, B-Cells#GSM3954954, B-Cells#P5846_normal, B-Cells#P5931_normal_rep1, B-Cells#P5931_normal_rep2, B-Cells#P6342_normal, B-Cells#P6592_normal, B-Cells#P6649_normal, B-Cells#Patient02_SIGAD9_NSCJ, B-Cells#Patient02_SIGAE9_NG, B-Cells#Patient03_SIGAB5_BSCJ, B-Cells#Patient06_Karol-A_NE, B-Cells#Patient06_Karol-C_NSCJ, B-Cells#Patient06_Karol-E_NG, B-Cells#Patient07_SIGAA4_NE, B-Cells#Patient07_SIGAC4_BE, B-Cells#Patient07_SIGAD4_NG, B-Cells#Patient07_SIGAE4_ND, B-Cells#Patient08_SIGAF7_NG, B-Cells#Patient09_SIGAD9_NE, B-Cells#Patient09_SIGAE9_BSCJ, B-Cells#Patient09_SIGAF9_BE, B-Cells#Patient09_SIGAG9_NG, B-Cells#Patient10_SIGAA5_NE, B-Cells#Patient10_SIGAB5_NSCJ, B-Cells#Patient10_SIGAD5_SMG, B-Cells#Patient11_SIGAA4_SMG, B-Cells#Patient12_SIGAE3_ND, B-Cells#Patient13_SIGAE8_NG, B-Cells#Patient14_SIGAE8_GIM, B-Cells#Patient15_SIGAG11 [... truncated]
2: In doTryCatch(return(expr), name, parentenv, handler) :
  restarting interrupted promise evaluation
3: In doTryCatch(return(expr), name, parentenv, handler) :
  restarting interrupted promise evaluation
4: In doTryCatch(return(expr), name, parentenv, handler) :
  restarting interrupted promise evaluation
5: In get(object, envir = currentEnv, inherits = TRUE) :
  restarting interrupted promise evaluation
6: In doTryCatch(return(expr), name, parentenv, handler) :
  restarting interrupted promise evaluation

3) You mentioned that _"MetacellsByGroups has a parameter min_cells to exclude groups that are smaller than a specified number of cells. Errors are likely to arise if the selected value for mincells is too low" . But you don't expand on that. I can see from my output that B-cells did not make the cut but I wonder if there are any more cells that got cut out? What does [truncated] mean here? Also, what would you consider a good cutoff point for number of cells? I do have some small cell populations that are important, and I would like to understand if they would be kept in the analysis.

4) I don't really understand why metacells are constructed for each cell type separately. I need to have a better understanding of my data as a whole and identify all modules using the weighted correlation matrix. In my dataset, we have 25 different types of cells. From this issue https://github.com/smorabit/hdWGCNA/issues/109 you mentioned that in the tutorial the module eingengene calculation is done with the whole dataset and that only the network is constructed with the inhibitory neurons. This makes sense but how feasible is it to get networks for all types of cells?

5) I don't understand the purpose of further processing the Metacell Seurat Obj, would it be just for visualization purposes? To visualize the aggregated expression profiles in an UMAP?

Thanks a lot again and I look forward to hearing from you!

sessionInfo()

R version 4.2.3 (2023-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] hdWGCNA_0.2.19        WGCNA_1.72-1          fastcluster_1.2.3     dynamicTreeCut_1.63-1 devtools_2.4.5        usethis_2.1.6        
 [7] igraph_1.4.3          readxl_1.4.2          harmony_0.1.1         Rcpp_1.0.10           lubridate_1.9.2       forcats_1.0.0        
[13] stringr_1.5.0         dplyr_1.1.2           purrr_1.0.1           readr_2.1.4           tidyr_1.3.0           tibble_3.2.1         
[19] ggplot2_3.4.2         tidyverse_2.0.0       SeuratObject_4.1.3    Seurat_4.3.0         

loaded via a namespace (and not attached):
  [1] utf8_1.2.3             spatstat.explore_3.2-1 reticulate_1.28        tidyselect_1.2.0       RSQLite_2.3.1          AnnotationDbi_1.60.2  
  [7] htmlwidgets_1.6.2      grid_4.2.3             Rtsne_0.16             munsell_0.5.0          preprocessCore_1.60.2  codetools_0.2-19      
 [13] ica_1.0-3              future_1.32.0          miniUI_0.1.1.1         tester_0.1.7           withr_2.5.0            spatstat.random_3.1-5 
 [19] colorspace_2.1-0       progressr_0.13.0       Biobase_2.58.0         knitr_1.43             rstudioapi_0.14        stats4_4.2.3          
 [25] ROCR_1.0-11            tensor_1.5             listenv_0.9.0          GenomeInfoDbData_1.2.9 polyclip_1.10-4        bit64_4.0.5           
 [31] parallelly_1.36.0      vctrs_0.6.2            generics_0.1.3         xfun_0.39              timechange_0.2.0       R6_2.5.1              
 [37] doParallel_1.0.17      GenomeInfoDb_1.34.9    bitops_1.0-7           spatstat.utils_3.0-3   cachem_1.0.8           promises_1.2.0.1      
 [43] scales_1.2.1           nnet_7.3-18            gtable_0.3.3           globals_0.16.2         processx_3.8.1         goftest_1.2-3         
 [49] rlang_1.1.1            splines_4.2.3          lazyeval_0.2.2         impute_1.72.3          spatstat.geom_3.2-1    checkmate_2.2.0       
 [55] yaml_2.3.7             reshape2_1.4.4         abind_1.4-5            backports_1.4.1        httpuv_1.6.11          Hmisc_5.1-0           
 [61] tools_4.2.3            ellipsis_0.3.2         RColorBrewer_1.1-3     BiocGenerics_0.44.0    sessioninfo_1.2.2      ggridges_0.5.4        
 [67] plyr_1.8.8             base64enc_0.1-3        zlibbioc_1.44.0        RCurl_1.98-1.12        ps_1.7.5               prettyunits_1.1.1     
 [73] rpart_4.1.19           deldir_1.0-9           pbapply_1.7-0          cowplot_1.1.1          urlchecker_1.0.1       S4Vectors_0.36.2      
 [79] zoo_1.8-12             ggrepel_0.9.3          cluster_2.1.4          fs_1.6.2               magrittr_2.0.3         data.table_1.14.8     
 [85] scattermore_1.1        lmtest_0.9-40          RANN_2.6.1             fitdistrplus_1.1-11    matrixStats_1.0.0      pkgload_1.3.2         
 [91] hms_1.1.3              patchwork_1.1.2        mime_0.12              evaluate_0.21          xtable_1.8-4           IRanges_2.32.0        
 [97] gridExtra_2.3          compiler_4.2.3         KernSmooth_2.23-20     crayon_1.5.2           htmltools_0.5.5        later_1.3.1           
[103] tzdb_0.4.0             Formula_1.2-5          DBI_1.1.3              MASS_7.3-58.3          Matrix_1.5-1           cli_3.6.1             
[109] parallel_4.2.3         pkgconfig_2.0.3        foreign_0.8-82         sp_1.6-1               plotly_4.10.1          spatstat.sparse_3.0-1 
[115] foreach_1.5.2          XVector_0.38.0         callr_3.7.3            digest_0.6.31          sctransform_0.3.5      RcppAnnoy_0.0.20      
[121] spatstat.data_3.0-1    Biostrings_2.66.0      rmarkdown_2.22         cellranger_1.1.0       leiden_0.4.3           htmlTable_2.4.1       
[127] uwot_0.1.14            shiny_1.7.4            lifecycle_1.0.3        nlme_3.1-162           jsonlite_1.8.4         viridisLite_0.4.2     
[133] fansi_1.0.4            pillar_1.9.0           lattice_0.20-45        KEGGREST_1.38.0        fastmap_1.1.1          httr_1.4.6            
[139] pkgbuild_1.4.0         survival_3.5-5         GO.db_3.16.0           glue_1.6.2             remotes_2.4.2          FNN_1.1.3.2           
[145] png_0.1-8              iterators_1.0.14       bit_4.0.5              stringi_1.7.12         profvis_0.3.8          blob_1.2.4            
[151] memoise_2.0.1          irlba_2.3.5.1          future.apply_1.11.0

HI, first of all I appreciate your questions but in the future it is best practice to keep one GitHub issue for one question / topic, and to only include the most critical information for me to understand and answer your question or resolve the issue.

I know that the dataset needs to go through the Seurat pipeline first, but I am wondering whether the dataset can be filtered? My current Seurat obj has been filtered by min.cells = 3, min.features = 200.

Yes this is fine.

In the tutorial, when constructing metacells, you used k=25 for ~40k cells in your dataset. My scRNA-seq dataset has ~140K cells and I am wondering what range of 'k' values I should try out. I am currently running with k=40 as a random choice.

You can try as many values of k as you like. The goal is to reduce the sparsity of the data, while retaining the cellular heterogeneity of the dataset, and in our paper we found that there is a range of suitable values for k.

You mentioned that "MetacellsByGroups has a parameter min_cells to exclude groups that are smaller than a specified number of cells. Errors are likely to arise if the selected value for min_cells is too low" . But you don't expand on that. I can see from my output that B-cells did not make the cut but I wonder if there are any more cells that got cut out? What does [truncated] mean here? Also, what would you consider a good cutoff point for number of cells? I do have some small cell populations that are important, and I would like to understand if they would be kept in the analysis.

I think the truncated is from your R terminal when it stops printing, but I am not sure, that is not a message created by hdWGCNA. You can check the resulting metacell seurat object to see which cells have associated metacells.

I don't really understand why metacells are constructed for each cell type separately. I need to have a better understanding of my data as a whole and identify all modules using the weighted correlation matrix. In my dataset, we have 25 different types of cells. From this issue https://github.com/smorabit/hdWGCNA/issues/109 you mentioned that in the tutorial the module eingengene calculation is done with the whole dataset and that only the network is constructed with the inhibitory neurons. This makes sense but how feasible is it to get networks for all types of cells?

We don't want to merge two cells of different types into one metacell so it's done separately. However, you can bypass this if you wish by changing the group.by parameter. You can run a for loop to repeat the network construction for all of your cell types.

I don't understand the purpose of further processing the Metacell Seurat Obj, would it be just for visualization purposes? To visualize the aggregated expression profiles in an UMAP?

As mentioned in the tutorial this is totally optional, as it does not have any bearing on the rest of the pipeline.

Thanks a lot for your reply and I will keep in mind the best practices you mentioned. I still am finding the same Error when trying to run MetacellsByGroups():

#construct metacells  in each group
coexp_hdWGCNA_geneids <- MetacellsByGroups(
  seurat_obj = coexp_hdWGCNA_geneids,
  group.by = c("Celltypes_global", "Sample"), # specify the columns in seurat_obj@meta.data to group by
  reduction = 'harmony', # select the dimensionality reduction to perform KNN on
  k = 50, # nearest-neighbors parameter
  max_shared = 10, # maximum number of shared cells between two metacells,
  ident.group = 'Celltypes_global', # set the Idents of the metacell seurat object
  verbose = TRUE
)

Overlap QC metrics:
Cells per bin: 50
Maximum shared cells bin-bin: 8
Mean shared cells bin-bin: 4
Median shared cells bin-bin: 4
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Overlap QC metrics:
Cells per bin: 50
Maximum shared cells bin-bin: 5
Mean shared cells bin-bin: 3
Median shared cells bin-bin: 4
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Overlap QC metrics:
Cells per bin: 50
Maximum shared cells bin-bin: 5
Mean shared cells bin-bin: 5
Median shared cells bin-bin: 5
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Overlap QC metrics:
Cells per bin: 50
Maximum shared cells bin-bin: 9
Mean shared cells bin-bin: 1.57142857142857
Median shared cells bin-bin: 0
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Overlap QC metrics:
Cells per bin: 50
Maximum shared cells bin-bin: 10
Mean shared cells bin-bin: 6.33333333333333
Median shared cells bin-bin: 9
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Overlap QC metrics:
Cells per bin: 50
Maximum shared cells bin-bin: 10
Mean shared cells bin-bin: 5
Median shared cells bin-bin: 5
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Overlap QC metrics:
Cells per bin: 50
Maximum shared cells bin-bin: 10
Mean shared cells bin-bin: 2.33333333333333
Median shared cells bin-bin: 0
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Overlap QC metrics:
Cells per bin: 50
Maximum shared cells bin-bin: 10
Mean shared cells bin-bin: 10
Median shared cells bin-bin: 10
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Overlap QC metrics:
Cells per bin: 50
Maximum shared cells bin-bin: 10
Mean shared cells bin-bin: 4
Median shared cells bin-bin: 3
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Overlap QC metrics:
Cells per bin: 50
Maximum shared cells bin-bin: 9
Mean shared cells bin-bin: 1.08888888888889
Median shared cells bin-bin: 0
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Overlap QC metrics:
Cells per bin: 50
Maximum shared cells bin-bin: 10
Mean shared cells bin-bin: 1.10714285714286
Median shared cells bin-bin: 0
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Overlap QC metrics:
Cells per bin: 50
Maximum shared cells bin-bin: 1
Mean shared cells bin-bin: 0.166666666666667
Median shared cells bin-bin: 0
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Overlap QC metrics:
Cells per bin: 50
Maximum shared cells bin-bin: 4
Mean shared cells bin-bin: 4
Median shared cells bin-bin: 4
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Overlap QC metrics:
Cells per bin: 50
Maximum shared cells bin-bin: 6
Mean shared cells bin-bin: 4.33333333333333
Median shared cells bin-bin: 5
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Overlap QC metrics:
Cells per bin: 50
Maximum shared cells bin-bin: 10
Mean shared cells bin-bin: 3.66666666666667
Median shared cells bin-bin: 1
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Overlap QC metrics:
Cells per bin: 50
Maximum shared cells bin-bin: 10
Mean shared cells bin-bin: 1.21818181818182
Median shared cells bin-bin: 0
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Overlap QC metrics:
Cells per bin: 50
Maximum shared cells bin-bin: 7
Mean shared cells bin-bin: 1.16666666666667
Median shared cells bin-bin: 0
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Overlap QC metrics:
Cells per bin: 50
Maximum shared cells bin-bin: 7
Mean shared cells bin-bin: 7
Median shared cells bin-bin: 7
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Overlap QC metrics:
Cells per bin: 50
Maximum shared cells bin-bin: 7
Mean shared cells bin-bin: 6
Median shared cells bin-bin: 7
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Error in apply(cell_sample, 1, function(x) { : 
  dim(X) must have a positive length
In addition: Warning messages:
1: In MetacellsByGroups(seurat_obj = coexp_hdWGCNA_geneids, group.by = c("Celltypes_global",  :
  Removing the following groups that did not meet min_cells: B-Cells#GSM3954947, B-Cells#GSM3954949, B-Cells#GSM3954950, B-Cells#GSM3954951, B-Cells#GSM3954952, B-Cells#GSM3954954, B-Cells#P5846_normal, B-Cells#P5931_normal_rep1, B-Cells#P5931_normal_rep2, B-Cells#P6342_normal, B-Cells#P6592_normal, B-Cells#P6649_normal, B-Cells#Patient02_SIGAD9_NSCJ, B-Cells#Patient02_SIGAE9_NG, B-Cells#Patient03_SIGAB5_BSCJ, B-Cells#Patient06_Karol-A_NE, B-Cells#Patient06_Karol-C_NSCJ, B-Cells#Patient06_Karol-E_NG, B-Cells#Patient07_SIGAA4_NE, B-Cells#Patient07_SIGAC4_BE, B-Cells#Patient07_SIGAD4_NG, B-Cells#Patient07_SIGAE4_ND, B-Cells#Patient08_SIGAF7_NG, B-Cells#Patient09_SIGAD9_NE, B-Cells#Patient09_SIGAE9_BSCJ, B-Cells#Patient09_SIGAF9_BE, B-Cells#Patient09_SIGAG9_NG, B-Cells#Patient10_SIGAA5_NE, B-Cells#Patient10_SIGAB5_NSCJ, B-Cells#Patient10_SIGAD5_SMG, B-Cells#Patient11_SIGAA4_SMG, B-Cells#Patient12_SIGAE3_ND, B-Cells#Patient13_SIGAE8_NG, B-Cells#Patient14_SIGAE8_GIM, B-Cells#Patient15_SIGAG11 [... truncated]
2: In (function (seurat_obj, name = "agg", ident.group = "seurat_clusters",  :
  On average, more than 10% of cells are shared between paired bins.
3: In (function (seurat_obj, name = "agg", ident.group = "seurat_clusters",  :
  On average, more than 10% of cells are shared between paired bins.
4: In (function (seurat_obj, name = "agg", ident.group = "seurat_clusters",  :
  On average, more than 10% of cells are shared between paired bins.
5: In (function (seurat_obj, name = "agg", ident.group = "seurat_clusters",  :
  On average, more than 10% of cells are shared between paired bins.

I still don't understand what this means:

Error in apply(cell_sample, 1, function(x) { : dim(X) must have a positive length

And then when I try to normalize it, it doesn't work:

# normalize metacell expression matrix
coexp_hdWGCNA_geneids <- NormalizeMetacells(coexp_hdWGCNA_geneids, wgcna_name = "Neck")
Error in UseMethod(generic = "as.sparse", object = x) : 
  no applicable method for 'as.sparse' applied to an object of class "NULL"

sessionInfo()

R version 4.2.3 (2023-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] hdWGCNA_0.2.19        WGCNA_1.72-1          fastcluster_1.2.3    
 [4] dynamicTreeCut_1.63-1 devtools_2.4.5        usethis_2.1.6        
 [7] igraph_1.4.3          readxl_1.4.2          harmony_0.1.1        
[10] Rcpp_1.0.10           lubridate_1.9.2       forcats_1.0.0        
[13] stringr_1.5.0         dplyr_1.1.2           purrr_1.0.1          
[16] readr_2.1.4           tidyr_1.3.0           tibble_3.2.1         
[19] ggplot2_3.4.2         tidyverse_2.0.0       SeuratObject_4.1.3   
[22] Seurat_4.3.0         

loaded via a namespace (and not attached):
  [1] utf8_1.2.3             spatstat.explore_3.2-1 reticulate_1.28       
  [4] tidyselect_1.2.0       RSQLite_2.3.1          AnnotationDbi_1.60.2  
  [7] htmlwidgets_1.6.2      grid_4.2.3             Rtsne_0.16            
 [10] munsell_0.5.0          preprocessCore_1.60.2  codetools_0.2-19      
 [13] ica_1.0-3              future_1.32.0          miniUI_0.1.1.1        
 [16] tester_0.1.7           withr_2.5.0            spatstat.random_3.1-5 
 [19] colorspace_2.1-0       progressr_0.13.0       Biobase_2.58.0        
 [22] knitr_1.43             rstudioapi_0.14        stats4_4.2.3          
 [25] ROCR_1.0-11            tensor_1.5             listenv_0.9.0         
 [28] GenomeInfoDbData_1.2.9 polyclip_1.10-4        bit64_4.0.5           
 [31] rprojroot_2.0.3        parallelly_1.36.0      vctrs_0.6.2           
 [34] generics_0.1.3         xfun_0.39              timechange_0.2.0      
 [37] R6_2.5.1               doParallel_1.0.17      GenomeInfoDb_1.34.9   
 [40] bitops_1.0-7           spatstat.utils_3.0-3   cachem_1.0.8          
 [43] promises_1.2.0.1       scales_1.2.1           nnet_7.3-18           
 [46] gtable_0.3.3           globals_0.16.2         processx_3.8.1        
 [49] goftest_1.2-3          rlang_1.1.1            splines_4.2.3         
 [52] lazyeval_0.2.2         impute_1.72.3          spatstat.geom_3.2-1   
 [55] checkmate_2.2.0        yaml_2.3.7             reshape2_1.4.4        
 [58] abind_1.4-5            backports_1.4.1        httpuv_1.6.11         
 [61] Hmisc_5.1-0            tools_4.2.3            ellipsis_0.3.2        
 [64] RColorBrewer_1.1-3     BiocGenerics_0.44.0    sessioninfo_1.2.2     
 [67] ggridges_0.5.4         plyr_1.8.8             base64enc_0.1-3       
 [70] zlibbioc_1.44.0        RCurl_1.98-1.12        ps_1.7.5              
 [73] prettyunits_1.1.1      rpart_4.1.19           deldir_1.0-9          
 [76] pbapply_1.7-0          cowplot_1.1.1          urlchecker_1.0.1      
 [79] S4Vectors_0.36.2       zoo_1.8-12             ggrepel_0.9.3         
 [82] cluster_2.1.4          fs_1.6.2               magrittr_2.0.3        
 [85] data.table_1.14.8      scattermore_1.1        lmtest_0.9-40         
 [88] RANN_2.6.1             fitdistrplus_1.1-11    matrixStats_1.0.0     
 [91] pkgload_1.3.2          hms_1.1.3              patchwork_1.1.2       
 [94] mime_0.12              evaluate_0.21          xtable_1.8-4          
 [97] IRanges_2.32.0         gridExtra_2.3          compiler_4.2.3        
[100] KernSmooth_2.23-20     crayon_1.5.2           htmltools_0.5.5       
[103] later_1.3.1            tzdb_0.4.0             Formula_1.2-5         
[106] DBI_1.1.3              MASS_7.3-58.3          Matrix_1.5-1          
[109] cli_3.6.1              parallel_4.2.3         pkgconfig_2.0.3       
[112] foreign_0.8-82         sp_1.6-1               plotly_4.10.1         
[115] spatstat.sparse_3.0-1  foreach_1.5.2          XVector_0.38.0        
[118] callr_3.7.3            digest_0.6.31          sctransform_0.3.5     
[121] RcppAnnoy_0.0.20       spatstat.data_3.0-1    Biostrings_2.66.0     
[124] rmarkdown_2.22         cellranger_1.1.0       leiden_0.4.3          
[127] htmlTable_2.4.1        uwot_0.1.14            curl_5.0.0            
[130] shiny_1.7.4            lifecycle_1.0.3        nlme_3.1-162          
[133] jsonlite_1.8.4         desc_1.4.2             viridisLite_0.4.2     
[136] fansi_1.0.4            pillar_1.9.0           lattice_0.20-45       
[139] KEGGREST_1.38.0        fastmap_1.1.1          httr_1.4.6            
[142] pkgbuild_1.4.0         survival_3.5-5         GO.db_3.16.0          
[145] glue_1.6.2             remotes_2.4.2          FNN_1.1.3.2           
[148] png_0.1-8              iterators_1.0.14       bit_4.0.5             
[151] stringi_1.7.12         profvis_0.3.8          blob_1.2.4            
[154] memoise_2.0.1          irlba_2.3.5.1          future.apply_1.11.0

Update:

I kept on playing with the code and changed some arguments and was able to run MetacellsByGroup. However I would like to understand why it didn't run before and why now I don't get the error anymore?

This is the code that worked:

construct metacells in each group

coexp_hdWGCNA_geneids <- MetacellsByGroups(
  seurat_obj = coexp_hdWGCNA_geneids,
  group.by = c("Celltypes_global", "Sample"), 
  reduction = 'harmony', 
  slot = "data",
  k = 40, 
  max_shared = 15, 
  min_cells = 100,
  ident.group = 'Celltypes_global', 
  verbose = TRUE
)

coexp_hdWGCNA_geneids <- NormalizeMetacells(coexp_hdWGCNA_geneids)
coexp_hdWGCNA_geneids <- ScaleMetacells(coexp_hdWGCNA_geneids, features = VariableFeatures(coexp_hdWGCNA_geneids), wgcna_name = "Neck")
coexp_hdWGCNA_geneids <- RunPCAMetacells(coexp_hdWGCNA_geneids, features = VariableFeatures(coexp_hdWGCNA_geneids), wgcna_name = "Neck")
coexp_hdWGCNA_geneids <- RunHarmonyMetacells(coexp_hdWGCNA_geneids, group.by.vars='Sample')
coexp_hdWGCNA_geneids <- RunUMAPMetacells(coexp_hdWGCNA_geneids, reduction = "harmony", dims = 1:25)

But then the next step doesn't work. I keep on receiving this error when trying to set up the expression matrix:

setup expression matrix

coexp_hdWGCNA_geneids <- SetDatExpr(
    coexp_hdWGCNA_geneids,
    group_name = "Neck-Cells",
    group.by = "Celltypes_global",
    use_metacells = TRUE,
    slot = 'data',
    assay = "RNA"
  )

Error in intI(i, n = d[1L], dn[[1L]], give.dn = FALSE) : 
  invalid character indexing

Although my error is not exactly the same as this one here https://github.com/smorabit/hdWGCNA/issues/126 I think it might have to do with it. I wonder if this does not work because the metacells function merged Celltypes and Sample ids, therefore SetDatExpr() is not recognizing it?

> head(coexp_hdWGCNA_geneids@misc$Neck$wgcna_metacell_obj$Celltypes_global)
B-Cells#P5866_normal_rep1_1 B-Cells#P5866_normal_rep1_2 B-Cells#P5866_normal_rep1_3 B-Cells#P5866_normal_rep1_4 B-Cells#P5866_normal_rep1_5 
                  "B-Cells"                   "B-Cells"                   "B-Cells"                   "B-Cells"                   "B-Cells" 
B-Cells#P5866_normal_rep1_6 
                  "B-Cells"

I kept on playing with the code and changed some arguments and was able to run MetacellsByGroup. However I would like to understand why it didn't run before and why now I don't get the error anymore?

I am not really able to determine why you got the error before and don't anymore because it is likely dependent on some aspect of the dataset that you are using or the dataset preparation that I have not encountered in any of my use cases.

Closing the issue since you have resolved your error.

smorabit / hdWGCNA

MetacellsByGroups Error + Pipeline questions #128

construct metacells in each group

setup expression matrix