satijalab / seurat-object

https://satijalab.github.io/seurat-object/

SaveSeuratRds(): Is there a way to alter directory address of 'on-disk' matrices in Seurat object #198

Status: Open. Dazcam opened this issue 7 months ago.

Dazcam commented 7 months ago

Hello,

I have been running an analysis in Seurat 5, partly on a local machine and partly on a remote server, and I've been trying to work out how to change the location of the count matrices generated by BPCells. After moving Seurat objects generated remotely to my local machine, I get errors when running certain functions, because the root directory for the count data is still set to the remote directory rather than the local one. See here for more details.

The SaveSeuratRds() function looks as if it might be able to change this directory address, but there doesn't appear to be a way to change it without moving the on-disk layers from their original location. I've tried running SaveSeuratRds() with move = F, but when I reload the object the layers are missing:

```r
> seurat_object
An object of class Seurat
53590 features across 144380 samples within 2 assays
Active assay: RNA (26795 features, 2000 variable features)
 0 layers present:
 1 other assay present: sketch
 6 dimensional reductions calculated: pca, umap, harmony, umap.harmony, harmony.full, umap.full
```

When I run SaveSeuratRds() with move = T, the original (remote) path is, unsurprisingly, not found. The same happens with relative = T and with relative = F:

```r
SaveSeuratRds(seurat_object, paste0(R_dir, '02seurat_', region, '_test.rds'))
Error:
! Can't find path:
...
```

I've also tried digging around in the Seurat object, but I can't pin down where this address is stored. The best I could find was a list of 28 identical items containing the following, but I'm not convinced this is worth changing, as it looks like a log:

```r
seurat_object@assays$RNA@layers$counts@matrix@matrix@matrix_list[[1]]
26795 x 5706 IterableMatrix object with class RenameDims

Row names: TTR, LINC01821 ... PARVG
Col names: 10X356_4:GGTGAAGCAGGTGACA, 10X356_4:TGGATGTCACGACAAG ... 10X356_4:AGTGATCAGGCCCAAA

Data type: double
Storage order: column major

Queued Operations:
1. Load compressed matrix from directory /scratch/results/01R_objects/CBL_BP
2. Select rows: 1, 5 ... 59357 and cols: 1, 2 ... 28010
3. Reset dimnames
4. Reset dimnames
5. Reset dimnames
6. Reset dimnames
7. Reset dimnames
8. Reset dimnames
9. Reset dimnames
10. Reset dimnames
11. Reset dimnames
```
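A minimal sketch for listing every directory a layer still points to, without having to guess how deep the `@matrix` nesting goes. It assumes, based only on the printouts in this thread, that on-disk inputs keep their path in a `dir` slot, wrapper objects keep their single input in a `matrix` slot, and joined or concatenated matrices keep their inputs in a `matrix_list` slot. `find_matrix_dirs()` is a hypothetical helper, not part of Seurat or BPCells.

```r
library(methods)

# Hypothetical helper: recursively collect every `dir` slot reachable from an
# S4 matrix object. Assumes the slot layout seen in this thread:
#   `dir`         - path of an on-disk input (the loaded matrix directory)
#   `matrix`      - a single wrapped input (renames, subsets, ...)
#   `matrix_list` - several inputs (e.g. joined/concatenated datasets)
find_matrix_dirs <- function(x) {
  if (!isS4(x)) return(character(0))
  dirs <- character(0)
  if (.hasSlot(x, "dir")) {
    dirs <- c(dirs, slot(x, "dir"))
  }
  if (.hasSlot(x, "matrix")) {
    dirs <- c(dirs, find_matrix_dirs(slot(x, "matrix")))
  }
  if (.hasSlot(x, "matrix_list")) {
    for (m in slot(x, "matrix_list")) {
      dirs <- c(dirs, find_matrix_dirs(m))
    }
  }
  # Each distinct directory is reported once
  unique(dirs)
}
```

For example, `find_matrix_dirs(seurat_object@assays$RNA@layers$counts)` should return each distinct directory once, which would show whether the 28 list items really all point to the same place.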

So a couple of questions then:

  1. Is there a way to alter the address of the root directory within Seurat, either using SaveSeuratRds() or otherwise?
  2. If not, could this functionality be added to SaveSeuratRds() to support analyses split between local and remote machines?

Many thanks.

jvelghe commented 4 months ago

Hi Dazcam, here's an example of where the directory address is stored in a Seurat v5 object. My BPCells-backed Seurat object joins 3 datasets, and the path for one of them is stored in BP_object@assays[["RNA"]]@layers[["counts"]]@matrix@matrix_list[[1]]@matrix@dir, where 1 refers to the first of the joined layers.

You can change the stored file path for each of the layers like this:

```r
> BP_object@assays[["RNA"]]@layers[["counts"]]@matrix@matrix_list[[1]]@matrix@dir
[1] "/path/to/your/old/dir"
> BP_object@assays[["RNA"]]@layers[["counts"]]@matrix@matrix_list[[1]]@matrix@dir <- "/path/to/your/new/dir"
> BP_object@assays[["RNA"]]@layers[["counts"]]@matrix@matrix_list[[1]]@matrix@dir
[1] "/path/to/your/new/dir"
```
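The assignment above edits a single input at a fixed depth, but the nesting depth can differ between objects, and a joined layer holds one input per dataset. A minimal sketch that rewrites every matching path in one layer, assuming (from the printouts in this thread) that on-disk inputs store their path in a `dir` slot, wrappers use a `matrix` slot, and joined matrices use a `matrix_list` slot. `relocate_matrix_dirs()` is a hypothetical helper, not a Seurat or BPCells function.

```r
library(methods)

# Hypothetical helper: return a copy of `x` in which every `dir` slot that
# starts with `old_prefix` has that prefix replaced by `new_prefix`.
# Assumes on-disk inputs store their path in `dir`, wrappers use `matrix`,
# and joined/concatenated matrices use `matrix_list`.
# `old_prefix` is compared as a literal string, not a regular expression.
relocate_matrix_dirs <- function(x, old_prefix, new_prefix) {
  if (!isS4(x)) return(x)
  if (.hasSlot(x, "dir")) {
    d <- slot(x, "dir")
    hit <- startsWith(d, old_prefix)
    d[hit] <- paste0(new_prefix, substring(d[hit], nchar(old_prefix) + 1))
    slot(x, "dir") <- d
  }
  if (.hasSlot(x, "matrix")) {
    # Recurse into the single wrapped input
    slot(x, "matrix") <- relocate_matrix_dirs(slot(x, "matrix"), old_prefix, new_prefix)
  }
  if (.hasSlot(x, "matrix_list")) {
    # Recurse into every joined input, not just the first one
    slot(x, "matrix_list") <- lapply(slot(x, "matrix_list"), relocate_matrix_dirs,
                                     old_prefix = old_prefix, new_prefix = new_prefix)
  }
  x
}
```

Usage, with the placeholder paths from above: `BP_object@assays[["RNA"]]@layers[["counts"]] <- relocate_matrix_dirs(BP_object@assays[["RNA"]]@layers[["counts"]], "/path/to/your/old/dir", "/path/to/your/new/dir")`. Returning a modified copy, rather than editing in place, follows R's usual copy-on-modify semantics.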

I'm also curious: do you know how to save the joined object and then load it back as a Seurat object?

[Screenshot 2024-07-04 at 2 35 44 AM]

Dazcam commented 4 months ago

Hi @jvelghe,

Many thanks for this; I'll give it a go.

Regarding your question, if I understand it correctly, I use the following for saving and loading data:

Dazcam commented 4 months ago

@jvelghe It looks as though the directory name of the BPCells object must be stored in multiple places. After changing the location as you describe, Seurat still reports the old directory for certain operations, such as converting the on-disk matrix to an in-memory dgCMatrix:

```r
> seurat_obj[["RNA"]]$counts
#> 27379 x 66782 IterableMatrix object with class RenameDims

#> Row names: ABCA13, PENK-AS1 ... SLC7A7
#> Col names: 10X318_7:GGGTTTAGTTACGATC, 10X318_8:CCCGGAAGTGACTGAG ... 10X145_3:AACAGGGCAGCCGTCA

#> Data type: double
#> Storage order: column major

#> Queued Operations:
#> 1. Concatenate cols of 12 matrix objects with classes: RenameDims, RenameDims ... RenameDims (threads=0)
#> 2. Select rows: 1, 2 ... 27379 and cols: 1, 5345 ... 49485
#> 3. Reset dimnames

> as(object = seurat_obj[["RNA"]]$counts, Class = "dgCMatrix")
#> Error: Missing directory: /scratch/c.cXXXXXX/results/01R_objects/CaB_BP

> seurat_obj@assays[["RNA"]]@layers[["counts"]]@matrix@matrix@matrix_list[[1]]@matrix@matrix@matrix@matrix@matrix@matrix@matrix@matrix@matrix@matrix@dir
#> [1] "/scratch/c.cXXXXXX/results/01R_objects/CaB_BP"

> seurat_obj@assays[["RNA"]]@layers[["counts"]]@matrix@matrix@matrix_list[[1]]
#> 27379 x 5344 IterableMatrix object with class RenameDims

#> Row names: ABCA13, PENK-AS1 ... SLC7A7
#> Col names: 10X318_7:GGGTTTAGTTACGATC, 10X318_7:TGTGTGAGTTCCGCTT ... 10X318_7:GGGCTCATCCACAGGC

#> Data type: double
#> Storage order: column major

#> Queued Operations:
#> 1. Load compressed matrix from directory /scratch/c.cXXXXXX/results/01R_objects/CaB_BP
#> 2. Select rows: 1, 3 ... 59357 and cols: 1, 3 ... 32673
#> 3. Reset dimnames
#> 4. Reset dimnames
#> 5. Reset dimnames
#> 6. Reset dimnames
#> 7. Reset dimnames
#> 8. Reset dimnames
#> 9. Reset dimnames
#> 10. Reset dimnames
#> 11. Reset dimnames

> seurat_obj@assays[["RNA"]]@layers[["counts"]]@matrix@matrix@matrix_list[[1]]@matrix@matrix@matrix@matrix@matrix@matrix@matrix@matrix@matrix@matrix@dir <- '/Users/XXXXXX/Desktop/results/01R_objects/CaB_BP'

> seurat_obj@assays[["RNA"]]@layers[["counts"]]@matrix@matrix@matrix_list[[1]]
#> 27379 x 5344 IterableMatrix object with class RenameDims

#> Row names: ABCA13, PENK-AS1 ... SLC7A7
#> Col names: 10X318_7:GGGTTTAGTTACGATC, 10X318_7:TGTGTGAGTTCCGCTT ... 10X318_7:GGGCTCATCCACAGGC

#> Data type: double
#> Storage order: column major

#> Queued Operations:
#> 1. Load compressed matrix from directory /Users/XXXXXX/Desktop/results/01R_objects/CaB_BP
#> 2. Select rows: 1, 3 ... 59357 and cols: 1, 3 ... 32673
#> 3. Reset dimnames
#> 4. Reset dimnames
#> 5. Reset dimnames
#> 6. Reset dimnames
#> 7. Reset dimnames
#> 8. Reset dimnames
#> 9. Reset dimnames
#> 10. Reset dimnames
#> 11. Reset dimnames

> as(object = seurat_obj[["RNA"]]$counts, Class = "dgCMatrix")
#> Error: Missing directory: /scratch/c.cXXXXXX/results/01R_objects/CaB_BP
```
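One plausible explanation for the final error, offered as a guess rather than a confirmed diagnosis: step 1 of the printed counts layer concatenates 12 matrices, but only matrix_list[[1]] was edited, so the other 11 inputs may still point to the old /scratch directory. A minimal sketch that rewrites the path prefix in every layer of every v5 assay is below. `relocate_seurat_dirs()` and its inner helper are hypothetical; the sketch assumes the `dir`/`matrix`/`matrix_list` slot layout seen in the printouts, and a `layers` slot on v5 assays (as accessed via `@layers` in this thread). Because it edits internal slots directly, it relies on implementation details that may change between package versions.

```r
library(methods)

# Hypothetical helper: rewrite the path prefix of every on-disk input in every
# layer of every v5-style assay (i.e. assays that have a `layers` slot).
# Assumes on-disk inputs keep their path in `dir`, wrappers in `matrix`, and
# joined/concatenated matrices in `matrix_list`.
relocate_seurat_dirs <- function(obj, old_prefix, new_prefix) {
  relocate <- function(x) {
    if (!isS4(x)) return(x)
    if (.hasSlot(x, "dir")) {
      d <- slot(x, "dir")
      hit <- startsWith(d, old_prefix)
      d[hit] <- paste0(new_prefix, substring(d[hit], nchar(old_prefix) + 1))
      slot(x, "dir") <- d
    }
    if (.hasSlot(x, "matrix")) slot(x, "matrix") <- relocate(slot(x, "matrix"))
    # Visit every concatenated input, not only matrix_list[[1]]
    if (.hasSlot(x, "matrix_list")) slot(x, "matrix_list") <- lapply(slot(x, "matrix_list"), relocate)
    x
  }
  for (a in names(obj@assays)) {
    assay <- obj@assays[[a]]
    if (!.hasSlot(assay, "layers")) next  # skip assays without a `layers` slot
    for (l in names(assay@layers)) {
      assay@layers[[l]] <- relocate(assay@layers[[l]])
    }
    obj@assays[[a]] <- assay
  }
  obj
}
```

Usage, with the paths from this thread: `seurat_obj <- relocate_seurat_dirs(seurat_obj, "/scratch/c.cXXXXXX", "/Users/XXXXXX/Desktop")`, after which the `as(object = seurat_obj[["RNA"]]$counts, Class = "dgCMatrix")` call can be retried to check whether any input still refers to the old location.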