mojaveazure / seurat-disk

Interfaces for HDF5-based Single Cell File Formats
https://mojaveazure.github.io/seurat-disk
GNU General Public License v3.0
153 stars 49 forks source link

Conversion of h5Seurat object to h5ad not capturing RNA assay after integration #75

Open hsmaan opened 3 years ago

hsmaan commented 3 years ago

Hi, thanks for developing this very useful library.

I have a task where I'm looking to perform integration in Seurat, save that as an h5Seurat file and convert it to h5ad to load in python using Scanpy, as per the interoperability vignette (https://mojaveazure.github.io/seurat-disk/articles/convert-anndata.html).

An issue I'm having is that the conversion step (Convert("[].h5Seurat", dest = "h5ad")), only saves the "integrated" assay (with integration feature subsetting) and not the original "RNA" assay of the data. This can be reproduced using the integration vignette of immune cell data (https://satijalab.org/seurat/articles/integration_introduction.html) for Seurat 4, and the interoperability vignette:

R Code:

library(Seurat)
library(SeuratData)
library(SeuratDisk)

InstallData("pbmc3k")
data("pbmc3k.final")
LoadData("ifnb")

ifnb.list <- SplitObject(ifnb, split.by = "stim")
ifnb.list <- lapply(X = ifnb.list, FUN = function(x) {
    x <- NormalizeData(x)
    x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
})
features <- SelectIntegrationFeatures(object.list = ifnb.list)
immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, anchor.features = features)
immune.combined <- IntegrateData(anchorset = immune.anchors)

SaveH5Seurat(immune.combined, filename = "combined.h5Seurat")
Convert("combined.h5Seurat", dest = "h5ad")

R code - conversion output

> SaveH5Seurat(immune.combined, filename = "combined.h5Seurat")
Creating h5Seurat file for version 3.1.5.9900
Adding counts for RNA
Adding data for RNA
No variable features found for RNA
No feature-level metadata found for RNA
Adding data for integrated
Adding variable features for integrated
No feature-level metadata found for integrated
> Convert("combined.h5Seurat", dest = "h5ad")
Validating h5Seurat file
Adding data from integrated as X
Transfering meta.data to obs

Python Code

import anndata as ann 
import scanpy as sc 

immune_sc = sc.read_h5ad("combined.h5ad")
immune_sc 

Python output

>>> immune_sc
AnnData object with n_obs × n_vars = 13999 × 2000
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'stim', 'seurat_annotations'
    var: 'features'
    uns: 'pca', 'neighbors'
    obsm: 'X_pca'
    varm: 'PCs'
    obsp: 'distances', 'connectivities'

From here, only the "integrated" assay is being transferred to the h5ad object during conversion from h5Seurat - I was wondering if it would be possible to also add the "RNA" assay information? Raw RNA counts from non-integrated Seurat objects are captured in the "raw" attribute of h5ad during conversion - could something similar be added for the RNA assay in the integration case as well? Thank you!

R Session information:

R version 4.0.3 (2020-10-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ifnb.SeuratData_3.1.0   patchwork_1.1.1         SeuratDisk_0.0.0.9019  
[4] pbmc3k.SeuratData_3.1.4 SeuratData_0.2.1        SeuratObject_4.0.0     
[7] Seurat_4.0.1           

Relevant python package versions:

python = 3.8.5
scanpy = 1.6.1
anndata = 0.7.6
mcap91 commented 2 years ago

Any info on this? I have the same issue but it is doing it with the original RNA assay (not using an integrated assay). The Seuratdisk pipeline only transfers the 2000 variable genes to the .h5ad file instead of all of the features in my object.

cdiaz45 commented 2 years ago

Hello, I had the same. The way around it I found was to convert the object after the Normalise step, before the Find Variable genes step.

lquin003 commented 2 years ago

I'm having a similar issue. Did anyone resolve this? I've tried the following:

In R obj is an integrated object consisting of 80 samples

obj = readRDS(paste0(directory,"obj.RDS"))

DefaultAssay(obj) = "RNA"

SaveH5Seurat(object = obj, filename = "obj_rna.h5seurat", verbose = T)

Convert("obj_rna.h5seurat", dest="h5ad", assay = "RNA", overwrite = T) In Jupyter Notebooks

data = anndata.read_h5ad(processed_file) data

Output AnnData object with n_obs × n_vars = 1755172 × 2000

It seems to only have the 2k variable genes from seurat that were used to integrate within jupyter notebooks.

It would be ideal if I didn't have to start from the initial unintegrated seurat objects when trying to use scanpy.

UPDATE

Seurat disk was working properly however it was using "scale.data" as default which had the integrated variables. The solution I found was to delete the "scale.data" slot using the dietseurat() function which would then make seurat disk use the "data" slot by default. This github thread also helped me figure this out if anyone encounters a similar issue in the future. https://github.com/mojaveazure/seurat-disk/issues/21

DM0815 commented 1 year ago

I'm having a similar issue. Did anyone resolve this? I've tried the following:

In R obj is an integrated object consisting of 80 samples

obj = readRDS(paste0(directory,"obj.RDS"))

DefaultAssay(obj) = "RNA"

SaveH5Seurat(object = obj, filename = "obj_rna.h5seurat", verbose = T)

Convert("obj_rna.h5seurat", dest="h5ad", assay = "RNA", overwrite = T) In Jupyter Notebooks

data = anndata.read_h5ad(processed_file) data

Output AnnData object with n_obs × n_vars = 1755172 × 2000

It seems to only have the 2k variable genes from seurat that were used to integrate within jupyter notebooks.

It would be ideal if I didn't have to start from the initial unintegrated seurat objects when trying to use with scanpy.

UPDATE

Seurat disk was working properly however it was using "scale.data" as default which had the integrated variables. The solution I found was to delete the "scale.data" slot using the dietseurat() function which would then make seurat disk use the "data" slot by default. This github thread also helped me figure this out if anyone encounters a similar issue in the future. #21

Excuse me , when I follow your description and the result remains the same 'new_seurat <- DietSeurat(s.integrated, scale.data = FALSE)'. It outputs still 2000 var genes, rather than all genes, Can you give me some hints?

lquin003 commented 1 year ago

Try the following:

remove scale data slot we want the total number of dimensions to be the 32K genes and not the scaled 2K genes

obj = DietSeurat(object = obj, counts = T, data = T, scale.data = F, assays = "RNA", dimreducs =c('pca','tsne','umap','harmony'))

Confirm the 32K genes as the number of rows

dim(obj)

Make sure that your assay matches your Seurat object. You might have to change your assay to "integrated" if you didn't integrate using harmony.

DM0815 commented 1 year ago

Try the following: #remove scale data slot we want the total number of dimensions to be the 32K genes and not the scaled 2K genes obj = DietSeurat(object = obj, counts = T, data = T, scale.data = F, assays = "RNA", dimreducs =c('pca','tsne','umap','harmony'))

Confirm the 32K genes as the number of rows dim(obj)

Make sure that your assay matches your Seurat object. You might have to change your assay to "integrated" if you didn't integrate using harmony. I used CCA to remove batch effects. And I found that the data and scale.data of intergrated in my seurarobject are all 2k genes*xxx, I have no idea, just use assays$RNA@data to replace assays$integrated@data, but lost much information

kvegesan-stjude commented 5 months ago

Try the following: #remove scale data slot we want the total number of dimensions to be the 32K genes and not the scaled 2K genes obj = DietSeurat(object = obj, counts = T, data = T, scale.data = F, assays = "RNA", dimreducs =c('pca','tsne','umap','harmony'))

Confirm the 32K genes as the number of rows dim(obj)

Make sure that your assay matches your Seurat object. You might have to change your assay to "integrated" if you didn't integrate using harmony.

This is the only solution that worked for me. Thank you @lquin003

theheking commented 1 day ago

The method above with DietSeurat did not work for me—instead two lines to remove both highly variable genes and scaled.data.

VariableFeatures(pbmc_analysis) <- NULL
pbmc_analysis[["RNA"]]$scale.data <- NULL