immunogenomics / harmony

Fast, sensitive and accurate integration of single-cell data with Harmony
https://portals.broadinstitute.org/harmony/
Other
526 stars 100 forks source link

Simple harmony call fails on v1.1.0 #214

Closed const-ae closed 11 months ago

const-ae commented 1 year ago

Hi,

I noticed that the new release of harmony fails for small matrix inputs. The following used to work on version 1.0.3, but now throws an error in harmonyObj$cluster_cpp().

Y <- matrix(rnorm(10 * 2), nrow = 10, ncol = 2)
groups <-  rep(c("a", "b"), length.out = nrow(Y))
int <- harmony::RunHarmony(Y, groups, nclust = 2, verbose = FALSE)
#> Transposing data matrix
#> Error in eval(expr, envir, enclos): Mat::submat(): indices out of bounds or incorrectly used

Created on 2023-10-25 with reprex v2.0.2

Note that the following still works:

Y <- matrix(rnorm(100 * 2), nrow = 100, ncol = 2)
groups <-  rep(c("a", "b"), length.out = nrow(Y))
int <- harmony::RunHarmony(Y, groups, nclust = 2, verbose = FALSE)
#> Transposing data matrix

Created on 2023-10-25 with reprex v2.0.2


Also note that the latest version still ignores the verbose = FALSE here and prints "Transposing data matrix".

Best, Constantin

pati-ni commented 1 year ago

Hi @const-ae ,

Looks like an integer underflow due to the block_size parameter. We will address this and test with your snippet, though many use cases would involve more than 20 cells.

As far as the Transposing data matrix, we will address that also, it is just we did not have a unit test that caught this scenario.

Thank you for your rigorous testing!

xxiann commented 12 months ago

Hi

A similar error came up for me too, not sure if the error message can assist with debugging this issue.

Error: Mat::submat(): indices out of bounds or incorrectly used
9. stop(structure(list(message = "Mat::submat(): indices out of bounds or incorrectly used",
call = NULL, cppstack = NULL), class = c("std::out_of_range",
"C++Error", "error", "condition")))
8. harmonyObj$cluster_cpp()
7. harmonize(harmonyObj, max.iter.harmony, verbose)
6. tryCatchList(expr, classes, parentenv, handlers)
5. tryCatch({
check_legacy_args(...)
if (set.cores) {
prev.ncores.blas <- RhpcBLASctl::blas_get_num_procs() ...
4. RunHarmony.default(data_mat = embedding[, dims.use], meta_data = metavars_df,
vars_use = group.by.vars, return_object = FALSE, ...)
3. RunHarmony(data_mat = embedding[, dims.use], meta_data = metavars_df,
vars_use = group.by.vars, return_object = FALSE, ...)
2. RunHarmony.Seurat(oshu, verbose = F, group.by.vars = c("cohort"),
reduction = "pca", assay.use = "RNA", reduction.save = "harmony")
1. RunHarmony(oshu, verbose = F, group.by.vars = c("cohort"), reduction = "pca",
assay.use = "RNA", reduction.save = "harmony")
pati-ni commented 12 months ago

Can you give us a bit of information for the dataset? How big it is?

xxiann commented 12 months ago

The dataset is 341 samples with 21784 genes, initialised with the following

NFEATURES = 3000
rna.seurat <- CreateSeuratObject(counts = raw_counts, project = "Sample_ID", min.cells = 3, min.features = 200, meta.data = sampleinfo2, assay="RNA")

rna.seurat <- NormalizeData(rna.seurat)
rna.seurat <- FindVariableFeatures(rna.seurat, selection.method = "vst", mean.cutoff = c(0.2, 10), dispersion.cutoff = c(1, 40), nfeatures = NFEATURES)

all.genes <- rownames(rna.seurat)
rna.seurat <- ScaleData(rna.seurat, features = all.genes)

n=30
rna.seurat <- RunPCA(rna.seurat, features = VariableFeatures(object = rna.seurat),
               assay = "RNA",
               npcs = n, verbose = F)

rna.seurat <- RunHarmony(rna.seurat, verbose = F,
                   group.by.vars = c("cohort"), 
                   reduction = "pca", assay.use = "RNA", reduction.save = "harmony")
pati-ni commented 11 months ago

Thank you both for pointing to this bugs.

I would suggest to try out the latest version of the package from github.

Let us know here if there are more issues!