smorabit / hdWGCNA

High dimensional weighted gene co-expression network analysis
https://smorabit.github.io/hdWGCNA/

Error in get.knn(data, k, algorithm) : Data non-numeric #133

Closed Thapeachydude closed 9 months ago

Thapeachydude commented 10 months ago

Hello,

I'm getting an error during the meta cell grouping.

Error in get.knn(data, k, algorithm) : Data non-numeric

The seurat object metadata looks like this:

                           orig.ident nCount_RNA nFeature_RNA  celltype
ULYXJU_CCCAATCCACTATCTT-1       ULYXJU   3149.629         2166   nk-cell
USHAQS_TAGTGGTAGTGCAAGC-1       USHAQS   2831.248         1409    t-cell
USHAQS_AAATGCCCAGTATAAG-1       USHAQS   3234.798         2161    t-cell
UVGXVH_GCTCTGTGTGCACTTA-1       UVGXVH   2966.807         2069   nk-cell
USHAQS_GTTACAGAGAAGGGTA-1       USHAQS   3033.306         1842    t-cell
UOLJMN_GATGAGGTCGGAAATA-1       UOLJMN   2390.138         1009 dead_cell
UQPQIX_GCGCAGTGTCGGCACT-1       UQPQIX   3290.551         2336    t-cell
ULYXJU_AGGCCGTGTTCGGGCT-1       ULYXJU   3103.164         2002    t-cell
ULYXJU_GATCGCGTCAGCTCTC-1       ULYXJU   2766.685         1353    t-cell

The command I run for the metacell grouping is:

opt <- list()
opt$k <- 25
opt$metaover <- 10
groupingVar <- "celltype"

MetacellsByGroups(seurat_obj = sce.to.seurat, group.by = groupingVar,
                                     reduction = "PCA", 
                                     k = opt$k, 
                                     max_shared = opt$metaover, 
                                     ident.group = groupingVar, 
                                     wgcna_name = "hdwgcna")

Oddly enough this error doesn't appear when I group e.g. by "orig.ident".

Side note: I'm not too familiar with seurat objects, all pre-processing is done on a single-cell experiment object, using approaches described in OSCA. The object is then converted to a seurat object, before running hdWGCNA.

Any insights into why this error appears would be much appreciated, many thanks : ) M

smorabit commented 10 months ago

Hi,

Thanks for your interest in hdWGCNA and taking the time to write this issue. I am on holiday until early August so please bear with me if I am not able to help you resolve the issue until then.

I have not personally tried running hdWGCNA on a Seurat object that has been converted from a single-cell experiment. My initial thought is that if it's working on your orig.ident column but not your other grouping variable, there must be something wrong with the data type or there could maybe be missing entries. You can find the grouping variable in the meta.data slot of the seurat object like this:

seurat_obj@meta.data
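A minimal sketch of the kind of check suggested above, using a toy data frame as a stand-in for `seurat_obj@meta.data`. The values are illustrative only, and the `NA` entry is added deliberately to show what a missing label looks like. It inspects the data type of the grouping column and counts missing entries:

```r
# Toy stand-in for seurat_obj@meta.data (illustrative values, not real data)
meta <- data.frame(
  orig.ident = c("ULYXJU", "USHAQS", "USHAQS", "UVGXVH"),
  celltype   = c("nk-cell", "t-cell", NA, "nk-cell"),
  stringsAsFactors = FALSE
)

group_col <- "celltype"

# Data type of the grouping column (character or factor is what one would expect)
col_class <- class(meta[[group_col]])

# Number of cells without a group label
n_missing <- sum(is.na(meta[[group_col]]))

# Number of cells per group, including missing labels
group_sizes <- table(meta[[group_col]], useNA = "ifany")

print(col_class)
print(n_missing)
print(group_sizes)
```

On a real object, replacing `meta` with `seurat_obj@meta.data` should give the same summary for the actual grouping variable.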

Thapeachydude commented 10 months ago

Hello and thanks for the quick reply!

No worries, I'm on vacation myself next week. I tried playing around a bit yesterday, nonetheless. I seem to have solved it by finding a different way of passing celltype to the metadata column. But I'm still not sure why the previous one crashed. Magic?

In both cases it was a column in seurat_obj@meta.data, the output of which could be seen in the post above.

Anyway, I'm running it now and will have to take a look at the output once I'm back. But it seems it doesn't crash now. So perhaps it's solved...

Thapeachydude commented 8 months ago

After encountering the issue again, I spent a bit of time on it and found the cause... Reporting in case someone else bumps into the same problem.

During the creation of the seurat object, one needs to make sure that count matrix colnames and dimensional reduction rownames are identical. E.g. in my case the count matrix colnames were cell barcodes, but PCA rownames were simple integers. This will not throw an error during the creation of the object, but lead to the loss of PCA dimensions when subsetting the seurat object later on.

This specific error is caused by the nn_map <- FNN::knn.index(reduced_coordinates, k = (k - 1)) step in ConstructMetacells, which is then called on a 0 x 0 dataframe after the seurat object is split by groups.
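The name mismatch described above can be checked before running hdWGCNA. The sketch below uses a hypothetical helper, `check_embedding_names()` (not part of Seurat or hdWGCNA), and toy matrices to show the comparison between cell names and embedding rownames:

```r
# Hypothetical helper: TRUE only if the embedding rownames are present and
# identical (same values, same order) to the cell names of the count matrix.
check_embedding_names <- function(cell_names, embeddings) {
  emb_names <- rownames(embeddings)
  if (is.null(emb_names)) {
    return(FALSE)
  }
  length(emb_names) == length(cell_names) && all(emb_names == cell_names)
}

# Toy data: three cells identified by barcode
cells <- c("ULYXJU_CCCAATCCACTATCTT-1",
           "USHAQS_TAGTGGTAGTGCAAGC-1",
           "UVGXVH_GCTCTGTGTGCACTTA-1")

# Embeddings whose rownames are simple integers (the problematic case)
pca_bad <- matrix(seq_len(6), nrow = 3,
                  dimnames = list(as.character(1:3), c("PC1", "PC2")))

# The same embeddings with barcodes as rownames
pca_good <- pca_bad
rownames(pca_good) <- cells

bad_ok  <- check_embedding_names(cells, pca_bad)
good_ok <- check_embedding_names(cells, pca_good)

print(bad_ok)
print(good_ok)
```

With Seurat loaded, the same check could be applied to a real object as `check_embedding_names(colnames(seurat_obj), Embeddings(seurat_obj, reduction = "PCA"))`, assuming the reduction is stored under the name "PCA".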

cstrlln commented 2 months ago

Thank you for this. I have my data in SCE objects and was wondering about the Seurat conversion. Would you mind sharing how you convert the objects so that this error does not occur?

Thapeachydude commented 2 months ago

Hi, you would want to create the Seurat object along these lines:

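## Packages used below
library(SingleCellExperiment) # counts(), logcounts(), reducedDim()
library(scran)                # modelGeneVar(), getTopHVGs()
library(Seurat)               # CreateSeuratObject(), CreateDimReducObject()
library(magrittr)             # %>%

sce.x ## Your SCE object

colnames(sce.x) <- sce.x$uniqueBarcode ## Make sure barcodes are also stored as column names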

## Get raw count matrix
count_mat.raw <- counts(sce.x) %>% as.matrix() # get the raw counts matrix
colnames(count_mat.raw) <- sce.x$uniqueBarcode # make sure to keep barcodes

## Get normalized count matrix
count_mat <- logcounts(sce.x) %>% as.matrix() # get the log counts matrix
colnames(count_mat) <- sce.x$uniqueBarcode # make sure to keep barcodes

## Create seurat object
sce.to.seurat <- CreateSeuratObject(counts = count_mat.raw, assay = "RNA") # create seurat object

## Assign normalized counts
sce.to.seurat[["RNA"]]@data <- count_mat

## Define highly variable genes 
set.seed(100)
dec <- modelGeneVar(sce.x, block = sce.x$Batch) # model per-gene variance, blocking on batch
hvgenes <- getTopHVGs(dec)

## Assign to seurat object
VariableFeatures(sce.to.seurat) <- hvgenes

## Add dimensional reduction
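pca_embeddings <- as.matrix(reducedDim(sce.x, "PCA"))
rownames(pca_embeddings) <- colnames(sce.x) # rownames must match the count matrix colnames (barcodes)
sce.to.seurat[["PCA"]] <- CreateDimReducObject(embeddings = pca_embeddings,
                                               key = "PCA_", assay = "RNA") # Seurat keys end with an underscore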

cstrlln commented 2 months ago

Thank you!