satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

issue with using sketch assay after BPcells (seurat v5) #9300

Closed fingeram closed 3 weeks ago

fingeram commented 1 month ago

Hi,

I am working with a very large sc/sn RNA-seq dataset. Starting from an h5ad file, I used the BPCells package to load the data in-memory as follows:

`library(BPCells)
library(Seurat)

raw <- open_matrix_anndata_hdf5(path = "/novo/projects/departments/compbio/sysbio/Projects/mouse_liver_models/single_cell_and_nuclei/concatenated.dir/concatenated.h5ad") # imports as data type float

raw <- convert_matrix_type(raw, type = "uint32_t") #must convert count matrix from type float (non-integer) to integer values

write_matrix_dir(mat = raw, dir = "/novo/projects/shared_projects/liver_biology_colab/people/aqnf/mouse_sc_sn_AQNF_June24/BPcells/mouse_counts")

raw.mat <- open_matrix_dir(dir = "/novo/projects/shared_projects/liver_biology_colab/people/aqnf/mouse_sc_sn_AQNF_June24/BPcells/mouse_counts")

sobj <- CreateSeuratObject(counts = raw.mat)

meta <- merge(x= metadata_BSCK, y= metadata_CPDM, by.x = "LibraryID", by.y = "library_id", all.y=T)

sobj<- AddMetaData(sobj, metadata = meta)`
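One thing worth checking in the snippet above: `merge()` returns a data frame with fresh integer row names, whereas `AddMetaData()` matches a data frame to cells by row name. Below is a minimal base-R sketch of an alignment check. The `barcode` column, the toy values, and `cell_names` (a stand-in for `colnames(sobj)`) are hypothetical, since the post does not show which column of `meta` holds the cell barcodes.

```r
# Toy stand-ins for the two metadata tables (all values hypothetical).
cells <- data.frame(
  barcode   = c("AAAC-1", "AAAG-1", "TTTG-2"),
  LibraryID = c("lib1", "lib1", "lib2"),
  stringsAsFactors = FALSE
)
libs <- data.frame(
  library_id = c("lib1", "lib2"),
  group      = c("SC", "SN"),
  stringsAsFactors = FALSE
)

# merge() does not keep the row names of its inputs.
meta <- merge(x = cells, y = libs, by.x = "LibraryID", by.y = "library_id", all.y = TRUE)

# Restore barcodes as row names so the data frame can be matched to cells by name.
rownames(meta) <- meta$barcode

# Stand-in for colnames(sobj); the order differs from `meta` on purpose.
cell_names <- c("TTTG-2", "AAAC-1", "AAAG-1")

# Cells without a metadata row, and metadata reordered to the cell order.
missing_cells <- setdiff(cell_names, rownames(meta))
meta_aligned  <- meta[cell_names, , drop = FALSE]
```

If `missing_cells` is empty, `meta_aligned` has one row per cell in the same order as the object, which is the shape `AddMetaData()` expects for a data frame.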

I am working with Seurat v5, so I am trying to split layers based on the preparation method (single-cell and single-nucleus sequencing). After that, I create a sketch assay for my Seurat object in-memory in order to run downstream analysis more efficiently (the dataset is too large for the available memory):

`sobj <- subset(sobj, subset = nCount_RNA < 50000 & nFeature_RNA > 250 & nFeature_RNA < 8000 & pct_ribo < 20)

sobj[["RNA"]] <- split(sobj[["RNA"]], f = sobj$group)

sobj <- NormalizeData(sobj)

sobj <- FindVariableFeatures(sobj)

sobj.sketch <- SketchData( object = sobj, ncells = 50000, method = "LeverageScore", sketched.assay = "sketch")

DefaultAssay(sobj.sketch) <- "sketch"`

Up to that point everything runs fine, but when I start the dimensionality reduction I run into issues that I don't understand. Something seems to go wrong in RunPCA: the elbow plot looks very weird, and later steps of the pipeline that rely on the PCA fail to run. I tried to trace the issue but could not, so help is very welcome:

`sobj.sketch <- FindVariableFeatures(sobj.sketch)

sobj.sketch <- ScaleData(sobj.sketch)

sobj.sketch <- RunPCA(sobj.sketch)

sobj.sketch <- FindNeighbors(sobj.sketch, dims = 1:30)

Computing nearest neighbor graph
Computing SNN
Error: std::bad_alloc`
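A quick way to narrow this down is to look at the PCA embeddings directly before calling `FindNeighbors`. The sketch below is only a diagnostic under assumptions, not a known fix: `summarise_embeddings` is a hypothetical helper, and the toy matrix stands in for the real embeddings. The number of rows shows whether the reduction covers only the sketched cells or all cells, and non-finite or constant dimensions could be one explanation for an odd elbow plot.

```r
# Hypothetical helper: summarise a cells-by-PCs embedding matrix.
summarise_embeddings <- function(emb) {
  stopifnot(is.matrix(emb), is.numeric(emb))
  # Per-dimension standard deviation, ignoring NA/NaN entries.
  col_sd <- apply(emb, 2, sd, na.rm = TRUE)
  list(
    n_cells     = nrow(emb),
    n_dims      = ncol(emb),
    n_nonfinite = sum(!is.finite(emb)),                       # NA, NaN, Inf
    flat_dims   = unname(which(is.na(col_sd) | col_sd == 0))  # no variation
  )
}

# Toy example: 5 cells, 3 PCs, one NaN entry and one constant column.
emb <- cbind(
  PC_1 = c(1, 2, 3, 4, 5),
  PC_2 = c(0.5, NaN, -0.5, 1, -1),
  PC_3 = rep(0, 5)
)
res <- summarise_embeddings(emb)

# With Seurat loaded, the real matrix would come from:
# summarise_embeddings(Embeddings(sobj.sketch, reduction = "pca"))
```

On the real object, `n_cells` close to the sketch size with `n_nonfinite` equal to zero would suggest the PCA itself is intact and the problem lies further downstream.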

[attached image: file_show (1)]

fingeram commented 1 month ago

Some additional info about my data objects:

> raw.mat
33696 x 1328118 IterableMatrix object with class MatrixDir

Row names: Xkr4, Gm1992 ... ENSMUSG00000095041
Col names: AAACCCAAGCCTGAGA-97, AAACCCAGTCGTACAT-97 ... TTTGTTGTCTGCATGA-96

Data type: uint32_t
Storage order: column major

Queued Operations:
1. Load compressed matrix from directory /novo/projects/shared_projects/liver_biology_colab/people/aqnf/mouse_sc_sn_AQNF_June24/BPcells/mouse_counts

> sobj
An object of class Seurat
33696 features across 1191094 samples within 1 assay
Active assay: RNA (33696 features, 2000 variable features)
 4 layers present: counts.SC, counts.SN, data.SC, data.SN

> sobj.sketch
An object of class Seurat
67392 features across 1191094 samples within 2 assays
Active assay: sketch (33696 features, 2000 variable features)
 5 layers present: counts.SC, counts.SN, data.SC, data.SN, scale.data
 1 other assay present: RNA
 1 dimensional reduction calculated: pca

> sobj.sketch@assays$sketch
Assay (v5) data with 33696 features for 1e+05 cells
Top 10 variable features:
 Mmp12, Igfbp5, Igkc, Nxph1, Kcnip4, Ighm, Grm8, Nrg1, Jchain, Siglech
Layers:
 counts.SC, counts.SN, data.SC, data.SN, scale.data
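The `1e+05 cells` reported for the sketch assay is consistent with `ncells = 50000` being applied to each of the two layers (SC and SN) separately, which is my understanding of how `SketchData()` treats a single integer. A small arithmetic sketch follows; `expected_sketch_size` is a hypothetical helper, the per-layer sizes are made up (only the total of 1191094 cells appears above), and the rule that a small layer contributes all of its cells is an assumption.

```r
# Hypothetical helper: expected sketch size when `ncells` is applied per layer.
expected_sketch_size <- function(layer_sizes, ncells) {
  # Assumption: a layer with fewer cells than `ncells` contributes all of them.
  sum(pmin(layer_sizes, ncells))
}

# Hypothetical split of the 1191094 cells into the two layers.
layer_sizes <- c(SC = 600000, SN = 591094)
n_sketch <- expected_sketch_size(layer_sizes, ncells = 50000)
```

Under these assumptions `n_sketch` comes out at 100000, so the sketch assay holds 50000 cells per layer rather than 50000 in total.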

longmanz commented 3 weeks ago

Closing this issue as a duplicate of https://github.com/satijalab/seurat/issues/9301