mojaveazure / seurat-disk

Interfaces for HDF5-based Single Cell File Formats
https://mojaveazure.github.io/seurat-disk
GNU General Public License v3.0

Error in Initializing RNA with data #29

Open paupuigdevall opened 3 years ago

paupuigdevall commented 3 years ago

Hi.

This is probably not directly related to a SeuratDisk function, but when I tried to read a large .h5ad file converted to .h5seurat by Convert(), loading crashed in LoadH5Seurat(). The dataset I am trying to load is huge (>1M cells). Is it possible that R, and consequently Seurat, is limited to a matrix size that does not allow importing such files? Do you know of any workaround for this error?

## My code
library(Seurat)
library(SeuratDisk)
library(hdf5r)
file=paste0(pathToDir,"allpools.scanpy.init.h5ad")
fileSeurat=paste0(pathToDir,"allpools.scanpy.init.h5seurat")
Convert(file, dest = fileSeurat, overwrite = TRUE)
test <- LoadH5Seurat(fileSeurat)

This is the error message I obtained:

Registered S3 method overwritten by 'cli':
  method     from
  print.boxx spatstat
Registered S3 method overwritten by 'SeuratDisk':
  method            from
  as.sparse.H5Group Seurat
Warning: Unknown file type: h5ad
Warning: 'assay' not set, setting to 'RNA'
Creating h5Seurat file for version 3.1.5.9900
Adding X as data
Adding X as counts
Adding meta.features from var
Validating h5Seurat file
Initializing RNA with data
Error in if ((lp <- length(p)) < 1 || p[1] != 0 || any((dp <- p[-1] -  :
  missing value where TRUE/FALSE needed
Calls: LoadH5Seurat ... as.matrix.H5Group -> as.sparse -> as.sparse.H5Group -> sparseMatrix
In addition: Warning message:
In sparseMatrix(i = x[["indices"]][] + 1, p = x[["indptr"]][], x = x[["data"]][],  :
  NAs introduced by coercion to integer range
Execution halted

Thanks in advance for your help.
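
The warning "NAs introduced by coercion to integer range" in the log above suggests a 32-bit limit: R integers are 32-bit, and the `p` slot of a `dgCMatrix` is an integer vector, so a column-pointer vector whose entries exceed `.Machine$integer.max` cannot be represented. A minimal sketch of that coercion, using made-up pointer values (the `indptr` vector here is hypothetical, not read from the file in question):

```r
# R integers are 32-bit; this is the largest value they can hold.
max_int <- .Machine$integer.max            # 2147483647

# Hypothetical column pointers: the last entry (the total number of
# non-zero values) is larger than the 32-bit range.
indptr <- c(0, 1.5e9, 3e9)

# Does any pointer fall outside the integer range?
needs_64bit <- max(indptr) > max_int

# Coercing to integer turns out-of-range entries into NA. R raises the
# "NAs introduced by coercion to integer range" warning at this step;
# it is suppressed here only to keep the sketch quiet.
indptr_int <- suppressWarnings(as.integer(indptr))
overflowed <- is.na(indptr_int)

print(needs_64bit)
print(overflowed)
```

If the last element of the file's `indptr` dataset is larger than `.Machine$integer.max`, the matrix has more non-zero entries than a single `dgCMatrix` can index, and the NA pointers would explain the "missing value where TRUE/FALSE needed" error that follows.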

danielruss commented 3 years ago

Was this ever resolved? I have the same problem.
Oddly, when I create the matrix by hand, it works:

hfile <- Connect("~/Downloads/allexpsctl.h5seurat")
x<-hfile[["assays/RNA/data"]]
sp<-sparseMatrix(i=x[["indices"]][]+1,p=x[["indptr"]][],x=x[["data"]][]  )
sp[1:3,1:3]
3 x 3 sparse Matrix of class "dgCMatrix"

[1,] 2.662056 0.668771 .
[2,] 3.856432 .        .
[3,] 2.741403 .        .

I get a sparse matrix, but

obj <- LoadH5Seurat("~/Downloads/allexpsctl.h5seurat",assays="RNA")
Validating h5Seurat file
Initializing RNA with data
Error in sparseMatrix(i = x[["indices"]][] + 1, p = x[["indptr"]][], x = x[["data"]][],  : 
  all(dims >= dims.min) is not TRUE

BTW: My h5seurat file was created by converting an h5ad file:

Convert("~/Downloads/allexpsctl.h5ad",dest="h5seurat",overwrite=TRUE,verbose=TRUE)

danielruss commented 3 years ago

This may be part of the problem. When I added the "dims" argument to the sparseMatrix function (which I previously left off), the function worked. However, if you look at what the h5 file thinks the dims are, you see they are transposed. As it turns out, the sparse matrix data in the h5 file is transposed, but the dimensions in h5attr(x = x, which = "dims") are not, leading to shape problems.

> hfile <- Connect("~/Downloads/allexpsctl.h5seurat")
Validating h5Seurat file
> x<-hfile[["assays/RNA/data"]]
> sp<-sparseMatrix(i=x[["indices"]][]+1,p=x[["indptr"]][],x=x[["data"]][]  )
> dim(sp)
[1] 43890 25069
> h5attr(x = x, which = "dims")
[1] 25069 43890
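
The mismatch in the session above (the shape inferred from the data is the reverse of the stored `dims` attribute) can be reproduced on a toy matrix. This is only a sketch with made-up values: `i`, `p`, `x`, and `stored_dims` are hypothetical stand-ins for the contents of the HDF5 group, and the exact error text may differ between versions of the Matrix package.

```r
library(Matrix)

# Toy compressed-column data: 3 non-zero values spread over 2 columns.
i <- c(1, 3, 5)        # one-based row indices
p <- c(0, 2, 3)        # column pointers (zero-based offsets)
x <- c(1.0, 2.0, 3.0)  # non-zero values

# Without 'dims', sparseMatrix() infers the shape from the data.
inferred <- sparseMatrix(i = i, p = p, x = x)
print(dim(inferred))

# Pretend the file stores the reversed shape, as in the thread.
stored_dims <- rev(dim(inferred))

# Passing the reversed shape fails because row index 5 does not fit
# into a matrix with only 2 rows; capture the message instead of stopping.
result <- tryCatch(
  sparseMatrix(i = i, p = p, x = x, dims = stored_dims),
  error = function(e) conditionMessage(e)
)
print(result)

# Reversing the stored dims again makes the call succeed.
fixed <- sparseMatrix(i = i, p = p, x = x, dims = rev(stored_dims))
```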

pormr commented 2 years ago

I managed to load the dataset mentioned by @danielruss after transposing the data.

Convert("./matrices/allexpsctl.h5ad", "H5Seurat", overwrite = TRUE)

# Open the converted file read-write so it can be modified in place
obj_HDF5 <- Connect("./matrices/allexpsctl.h5seurat", mode = "r+")

# Write transposed copies of both matrices (created as t_counts and t_data)
Transpose(obj_HDF5[["assays/RNA/counts"]], overwrite = TRUE)
Transpose(obj_HDF5[["assays/RNA/data"]], overwrite = TRUE)
# Drop the originals and move the transposed copies into their place
obj_HDF5$link_delete("assays/RNA/counts")
obj_HDF5$link_delete("assays/RNA/data")
obj_HDF5$link_move_from(obj_HDF5, "assays/RNA/t_counts", "assays/RNA/counts")
obj_HDF5$link_move_from(obj_HDF5, "assays/RNA/t_data", "assays/RNA/data")
# Reverse the "dims" attribute on both matrices
old_dims <- hdf5r::h5attr(obj_HDF5[["assays/RNA/data"]], "dims")
new_dims <- rev(old_dims)
hdf5r::h5attr(obj_HDF5[["assays/RNA/counts"]], "dims") <- new_dims
hdf5r::h5attr(obj_HDF5[["assays/RNA/data"]], "dims") <- new_dims
obj_HDF5$close()
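
Before or after applying a workaround like the one above, it may help to check whether the stored `dims` can actually hold the sparse data. The helper below is a hypothetical sketch, not part of SeuratDisk; it assumes the layout implied by the calls in this thread (zero-based `indices`, column pointers in `indptr`, and `dims` given as rows then columns) and works on plain vectors.

```r
# Hypothetical helper: TRUE when 'dims' (rows, columns) is compatible with
# compressed-column data given as zero-based row indices and column pointers.
dims_consistent <- function(indices, indptr, dims) {
  n_cols <- length(indptr) - 1L
  # Smallest number of rows that can hold the largest zero-based index
  min_rows <- if (length(indices) > 0) max(indices) + 1 else 0
  dims[1] >= min_rows && dims[2] == n_cols
}

# Toy data: row indices 0, 2, 4 spread over 2 columns.
indices <- c(0, 2, 4)
indptr  <- c(0, 2, 3)

ok_dims  <- dims_consistent(indices, indptr, c(5, 2))  # matching shape
bad_dims <- dims_consistent(indices, indptr, c(2, 5))  # reversed shape

print(ok_dims)
print(bad_dims)
```

With an open file, the three inputs would come from `x[["indices"]][]`, `x[["indptr"]][]`, and `h5attr(x = x, which = "dims")`, as in the sessions earlier in the thread. Reading `indices` in full may be slow for very large datasets.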

JesusGF1 commented 1 year ago

How long did it take to perform the transpose? I am facing a similar problem working with the Allen mouse brain atlas dataset.