zakieh-tayyebi / CellSpace

Scalable sequence-informed embedding of single-cell ATAC-seq data with CellSpace
MIT License

Issue dealing with matrices for larger datasets - tile.mtx #7

Open wgao688 opened 4 months ago

wgao688 commented 4 months ago

Hi, thanks for developing this software!

I am analyzing a relatively large dataset (~200K cells) from multiple batches, so there is a strong batch effect that I am hoping to correct with CellSpace.

However, when I run `tile.mtx <- assays(getMatrixFromProject(archr.obj, useMatrix = "TileMatrix", binarize = TRUE))$TileMatrix`, I get the following R error, which seems to be related to the size of the object: `Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'assays': p[length(p)] cannot exceed 2^31-1`
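The limit in that message comes from the sparse matrix format: in a `Matrix::dgCMatrix`, the column-pointer slot `p` is an R integer vector whose last element equals the total number of stored non-zeros, so it cannot exceed `.Machine$integer.max` (2^31 - 1). Because the TileMatrix stores cells as columns, the whole matrix overflows once cells × average accessible tiles per cell passes that bound. Below is a minimal sketch for estimating this; the helper names `overflow_threshold_per_cell` and `exceeds_int32` are hypothetical and are not part of ArchR or CellSpace.

```r
# Largest value an R integer (and therefore dgCMatrix@p) can hold: 2^31 - 1.
max_nnz <- .Machine$integer.max

# Hypothetical helper: average number of non-zero tiles per cell at which
# a matrix with `n_cells` columns would reach the dgCMatrix non-zero limit.
overflow_threshold_per_cell <- function(n_cells) {
  as.numeric(max_nnz) / as.numeric(n_cells)
}

# Hypothetical helper: TRUE if a binarized matrix with `n_cells` cells and an
# average of `nnz_per_cell` non-zero tiles per cell would exceed the limit.
# Inputs are converted to double first, because multiplying two large R
# integers would itself overflow to NA.
exceeds_int32 <- function(n_cells, nnz_per_cell) {
  as.numeric(n_cells) * as.numeric(nnz_per_cell) > as.numeric(max_nnz)
}

# For ~200K cells, the limit is reached at roughly 10,737 non-zero tiles per cell.
overflow_threshold_per_cell(200000)
exceeds_int32(200000, 15000)  # 3e9 non-zeros: too many for one dgCMatrix
```

Under this estimate, splitting the cells into chunks (or subsampling) that each stay below the threshold would avoid the error, although the downstream steps would then need to handle the pieces separately.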

I see in your paper that you analyzed a ~700K-cell atlas using a subsampling approach. Could you share the code (for example, the code used to produce the 700K-cell atlas results) and more details on how to do this? For example, if subsetting is the way to go, I would want a way to capture the dataset's heterogeneity rather than just taking a random subset. I'm also not sure how to integrate a subsample into the rest of the CellSpace workflow. Thanks for your help!
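One common alternative to a purely random subset is stratified subsampling: sample cells within each group (for example, clusters from a preliminary analysis, or batches) in proportion to group size, with a floor so that rare groups are not lost. The sketch below is a generic illustration under that assumption and is not the subsampling procedure used in the CellSpace paper; `stratified_subsample` and its arguments are hypothetical names.

```r
# Hypothetical helper: stratified subsample of cell names.
#   cell_names    character vector of cell IDs
#   groups        labels of the same length (e.g., preliminary clusters or batches)
#   n_total       approximate target size of the subsample
#   min_per_group keep at least this many cells per group (or all, if fewer)
stratified_subsample <- function(cell_names, groups, n_total,
                                 min_per_group = 50L, seed = 1L) {
  stopifnot(length(cell_names) == length(groups))
  set.seed(seed)
  by_group <- split(cell_names, groups)
  frac <- n_total / length(cell_names)  # proportional sampling fraction
  picked <- lapply(by_group, function(cells) {
    n_g <- length(cells)
    # Proportional share, but never fewer than the floor (capped at group size).
    k <- max(min(n_g, min_per_group), round(n_g * frac))
    k <- min(k, n_g)
    # sample.int avoids sample()'s special case for length-1 numeric input.
    cells[sample.int(n_g, k)]
  })
  unlist(picked, use.names = FALSE)
}

# Toy example: one large group and two small ones.
cells <- paste0("cell", 1:1000)
grp <- rep(c("A", "B", "C"), times = c(900, 90, 10))
sub <- stratified_subsample(cells, grp, n_total = 100, min_per_group = 20)
table(grp[match(sub, cells)])
```

In the toy example, group A contributes its proportional share (90 cells), while the floor keeps 20 cells from B and all 10 from C. With an ArchR project, the selected names could in principle be used to subset it with `archr.obj[sub, ]`, assuming the grouping labels come from columns such as `archr.obj$cellNames` and a clustering column; how the subsample should then feed into the rest of CellSpace is the open question of this issue.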