mojaveazure / seurat-disk

Interfaces for HDF5-based Single Cell File Formats
https://mojaveazure.github.io/seurat-disk
GNU General Public License v3.0

Conversion from Seurat to AnnData changes metadata column #23

Open kleurless opened 4 years ago

kleurless commented 4 years ago

Hi!

In Seurat I have a metadata column (from a merged object) that holds 30 different cell types, a mix of "real" strings ("Neuron") and "integer" strings ("0"). When I Convert to AnnData, that column holds the integers 0-29 instead. I can work around this by recreating the cell type column after conversion, but it's kind of annoying.

Metadata column in Seurat:

> levels(object@meta.data$orig.celltype)
 [1] "0"       "1"       "10"      "11"      "12"      "13"      "2"
 [8] "3"       "4"       "5"       "6"       "7"       "8"       "9"
[15] "Neuron4" "Astro1"  "Astro2"  "Ependy"  "MGlia2"  "Neuron1" "NPC"
[22] "PC1"     "Olf"     "Oligo"   "OPC"     "PC2"     "Neuron5" "Neuron3"
[29] "MGlia1"  "Neuron2"

Conversion to .h5ad:

SeuratDisk::SaveH5Seurat(object, filename = "path/to/file.h5Seurat")
SeuratDisk::Convert("path/to/file.h5Seurat", dest="h5ad")

Metadata column in Scanpy:

import scanpy as sc

adata = sc.read_h5ad("path/to/file.h5ad")
set(adata.obs["orig.celltype"])
{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29}

Scanpy: scanpy==1.4.6 anndata==0.7.1 umap==0.3.10 numpy==1.18.1 scipy==1.4.1 pandas==1.0.1 scikit-learn==0.22.1 statsmodels==0.11.1 python-igraph==0.8.0 louvain==0.6.1
Seurat: SeuratDisk_0.0.0.9013 Seurat_3.2.1
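For anyone who needs the post-conversion workaround mentioned above, here is a minimal sketch of one way it might look. It assumes that integer code k in the converted column refers to the k-th R factor level, 0-based and in the order printed by levels(); that assumption is not confirmed anywhere in this thread, so spot-check a few cells against the Seurat object. `restore_labels` and `LEVELS` are hypothetical names, and a small pandas Series stands in for adata.obs["orig.celltype"].

```python
import pandas as pd

# Factor levels copied from the R output of levels(object@meta.data$orig.celltype).
LEVELS = [
    "0", "1", "10", "11", "12", "13", "2",
    "3", "4", "5", "6", "7", "8", "9",
    "Neuron4", "Astro1", "Astro2", "Ependy", "MGlia2", "Neuron1", "NPC",
    "PC1", "Olf", "Oligo", "OPC", "PC2", "Neuron5", "Neuron3",
    "MGlia1", "Neuron2",
]


def restore_labels(codes, levels):
    """Map integer codes back to labels (hypothetical helper).

    Assumes code k refers to levels[k], i.e. 0-based codes in R level order.
    """
    return pd.Categorical.from_codes(list(codes), categories=list(levels))


# Stand-in for adata.obs["orig.celltype"] after reading the converted file.
codes = pd.Series([0, 2, 14, 29], index=["cell1", "cell2", "cell3", "cell4"])
restored = pd.Series(restore_labels(codes, LEVELS), index=codes.index)
print(restored.tolist())  # ['0', '10', 'Neuron4', 'Neuron2']
```

With real data, the same helper could be applied as adata.obs["orig.celltype"] = restore_labels(adata.obs["orig.celltype"], LEVELS), provided the assumption about code order holds.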

kleurless commented 4 years ago

Related to issues #9 and #11.

jychien commented 3 years ago

I had the same issue. Check the R data type of the affected metadata column using 'typeof()' (a factor reports "integer"). Try converting it to character using 'as.character()' before running Convert.

colinmcgovern commented 2 years ago

My solution is to convert all of the factor vectors in the metadata to character vectors, assuming pbmc is a Seurat object:

i <- sapply(pbmc@meta.data, is.factor)
pbmc@meta.data[i] <- lapply(pbmc@meta.data[i], as.character)
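After re-running the conversion, one way to confirm on the Python side that the labels survived is to list which obs columns still have an integer dtype. This is only a rough check under stated assumptions: `integer_columns` is a hypothetical helper, the DataFrame below is a stand-in for adata.obs, and genuinely integer metadata columns will be listed too, so the result needs a manual look.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype


def integer_columns(obs):
    """Return names of columns with an integer dtype (hypothetical helper)."""
    return [name for name in obs.columns if is_integer_dtype(obs[name])]


# Stand-in for adata.obs; with real data, pass adata.obs directly.
obs = pd.DataFrame(
    {
        "orig.celltype": [0, 2, 14],               # labels lost: integer codes
        "fixed.celltype": ["0", "10", "Neuron4"],  # labels preserved as strings
        "nCount_RNA": [1200.0, 950.0, 3100.0],     # float column, not flagged
    }
)
print(integer_columns(obs))  # ['orig.celltype']
```

If a cell type column still shows up in that list, it was most likely still a factor when Convert ran.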

Kur1sutaru commented 1 year ago

I will try this, thanks a lot