theislab / scvelo

RNA Velocity generalized through dynamical modeling
https://scvelo.org
BSD 3-Clause "New" or "Revised" License
414 stars, 102 forks

write() function error: 'reserved name for dataframe columns' #255

Closed: rfenouil closed this issue 3 years ago

rfenouil commented 4 years ago

Hello, I get an error message when trying to save intermediate results as a binary file with the adata.write() function. The error seems to occur only when using the Seurat wrapper found here, not when following the tutorial with the 'pancreas' dataset.

See below for R and Python code to reproduce. First, in R:

library(Seurat)
library(SeuratDisk)
library(SeuratWrappers)

curl::curl_download(url = 'http://pklab.med.harvard.edu/velocyto/mouseBM/SCG71.loom', destfile = "/data.loom")

ldat <- ReadVelocity(file = "/data.loom")
bm <- as.Seurat(x = ldat) 
bm[["RNA"]] <- bm[["spliced"]]
bm <- SCTransform(bm)
bm <- RunPCA(bm)
bm <- RunUMAP(bm, dims = 1:20)
bm <- FindNeighbors(bm, dims = 1:20)
bm <- FindClusters(bm)
DefaultAssay(bm) <- "RNA"
SaveH5Seurat(bm, filename = "/mouseBM.h5Seurat")
Convert("/mouseBM.h5Seurat", dest = "h5ad")

Then, in Python:

import scvelo as scv

scv.settings.verbosity = 3  # show errors(0), warnings(1), info(2), hints(3)
scv.settings.presenter_view = True  # set max width size for presenter view
scv.settings.set_figure_params('scvelo')  # for beautified visualization

adata = scv.read("/mouseBM.h5ad")

adata.write("/mouseBM_processed.h5ad")
Error:

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/anndata/_io/utils.py", line 188, in func_wrapper
    return func(elem, key, val, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/anndata/_io/h5ad.py", line 241, in write_dataframe
    raise ValueError(f"{reserved!r} is a reserved name for dataframe columns.")
ValueError: '_index' is a reserved name for dataframe columns.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "", line 1, in
  File "/usr/local/lib/python3.7/dist-packages/anndata/_core/anndata.py", line 1852, in write_h5ad
    as_dense=as_dense,
  File "/usr/local/lib/python3.7/dist-packages/anndata/_io/h5ad.py", line 104, in write_h5ad
    write_attribute(f, "raw", adata.raw, dataset_kwargs=dataset_kwargs)
  File "/usr/lib/python3.7/functools.py", line 827, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/usr/local/lib/python3.7/dist-packages/anndata/_io/h5ad.py", line 126, in write_attribute_h5ad
    _write_method(type(value))(f, key, value, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/anndata/_io/h5ad.py", line 135, in write_raw
    write_attribute(f, "raw/var", value.var, dataset_kwargs=dataset_kwargs)
  File "/usr/lib/python3.7/functools.py", line 827, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/usr/local/lib/python3.7/dist-packages/anndata/_io/h5ad.py", line 126, in write_attribute_h5ad
    _write_method(type(value))(f, key, value, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/anndata/_io/utils.py", line 195, in func_wrapper
    ) from e
ValueError: '_index' is a reserved name for dataframe columns.

Above error raised while writing key 'raw/var' of from /.
```

Versions:

scvelo==0.2.1 scanpy==1.5.1 anndata==0.7.4 loompy==3.0.6 numpy==1.19.0 scipy==1.5.1 matplotlib==3.2.2 sklearn==0.23.1 pandas==1.0.5

Thank you for the great work and your help.

VolkerBergen commented 4 years ago

Running it directly in scvelo works fine:

adata = scv.read('data/SCG71.loom',  backup_url='http://pklab.med.harvard.edu/velocyto/mouseBM/SCG71.loom')
adata.write('data/SCG71.h5ad')

Hence, something included by the Seurat conversion triggers that error. Could you please print adata and check whether any entry is named '_index'?
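A minimal sketch of that check, assuming an AnnData-like object whose obs, var and (when set) raw.var expose pandas-style .columns. The helper find_reserved_columns is hypothetical, not part of scvelo or anndata; it only reports where a column literally named '_index' (the name in the error above) sits, including inside adata.raw, which print(adata) does not list.

```python
def find_reserved_columns(adata, reserved=("_index",)):
    """Return 'table: column' strings for every reserved column name found.

    Hypothetical helper: checks adata.obs, adata.var and adata.raw.var
    (the last one only when adata.raw is not None).
    """
    tables = {"obs": adata.obs, "var": adata.var}
    raw = getattr(adata, "raw", None)
    if raw is not None:
        tables["raw.var"] = raw.var
    hits = []
    for name, df in tables.items():
        for col in df.columns:
            if col in reserved:
                hits.append(f"{name}: {col}")
    return hits

# Usage with an object loaded via scv.read(...):
# print(find_reserved_columns(adata))
```

The traceback above points at the key 'raw/var', so that is the first table worth checking.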

davisidarta commented 3 years ago

Any updates on this? I'm also hitting this issue when saving .h5ad files that were originally created with SeuratDisk, but only after running scv.pp.moments(adata). The error does not occur when saving the same .h5ad file after further analysis in scanpy; it appears only after computing moments in scvelo.

mihem commented 3 years ago

I'm also having the same issue when running adata.write(filename = "scvelo.h5ad").

For me it fails even before running any scvelo function:

adata = scv.read("SeuratObject.h5ad")
adata.write(filename = "scvelo.h5ad")

raises:

ValueError: '_index' is a reserved name for dataframe columns.

It works fine, though, with the dataset that VolkerBergen suggested.

@VolkerBergen could you maybe specify where you would expect the "_index" entry to be?

My AnnData object summary looks like this:

    obs: 'orig.ident', 'nCount_spliced', 'nFeature_spliced', 'nCount_unspliced', 'nFeature_unspliced', 'nCount_ambiguous', 'nFeature_ambiguous', 'nCount_RNA', 'nFeature_RNA', 'library', 'tissue', 'percent_mt', 'seurat_clusters', 'spliced_snn_res.0.3', 'label_new'
    var: 'features', 'ambiguous_features', 'spliced_features', 'unspliced_features'
    obsm: 'X_umap'
    layers: 'ambiguous', 'spliced', 'unspliced'
mariafiruleva commented 3 years ago

I guess the source of the problem is the content of adata.__dict__['_raw'].__dict__. Specifically, adata.__dict__['_raw'].__dict__['_var'] contains a dataframe with all features as rows and _index as a column name. Renaming that column resolves the issue:

adata.__dict__['_raw'].__dict__['_var'] = adata.__dict__['_raw'].__dict__['_var'].rename(columns={'_index': 'features'})
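One caveat with renaming to 'features': if the table already has a column with that name (mihem's adata.var above does), pandas' rename produces two columns called 'features'. A minimal sketch, assuming pandas DataFrames, that picks a free name instead. pick_free_name and rename_reserved are hypothetical helpers, and adata._raw._var is the same private attribute the one-liner above relies on, not a public anndata API.

```python
def pick_free_name(columns, preferred="features"):
    """Return `preferred`, or `preferred_1`, `preferred_2`, ... if that name is taken."""
    taken = set(columns)
    name, i = preferred, 1
    while name in taken:
        name = f"{preferred}_{i}"
        i += 1
    return name


def rename_reserved(df, reserved="_index", preferred="features"):
    """Rename column `reserved` of a pandas DataFrame in place.

    Returns the new column name, or None if `reserved` was not present.
    """
    if reserved not in df.columns:
        return None
    # Exclude the column being renamed so its old name does not count as "taken".
    new_name = pick_free_name([c for c in df.columns if c != reserved], preferred)
    df.rename(columns={reserved: new_name}, inplace=True)
    return new_name

# Usage on the object from this thread (private attribute, may change between anndata versions):
# rename_reserved(adata._raw._var)
```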
zehualilab commented 3 years ago

> (Quoting @mariafiruleva's _index rename fix above.)

OMG!!!!OMG!!!!!OMG!!!!OMG!!!!!PROBLEM SOLVED!!!!!!!PROBLEM SOLVED!!!!!!!THX!!!!!!THX!!!!!!!!!!!

genecell commented 3 years ago

> (Quoting @mariafiruleva's _index rename fix above.)

This works for saving the AnnData .h5ad file, but I got the following error when plotting a dotplot:

f"Could not find keys '{not_found}' in columns of `adata.{dim}` or in"
KeyError: "Could not find keys '['AC004791.2', 'ALKBH5', 'APOBEC3A', 'ATHL1', 'BANK1', 'BCL9L', 'BST1', 'C1QA', 'C1QC', 'C1QTNF4', 'CALB2', 'CCR8', 'CD1C', 'CD8B', 'CDK15', 'CLEC10A', 'CMTM8', 'CXCL13', 'CYB561', 'DERL3', 'EOMES', 'FCER1A', 'FCGR3A', 'FGFBP2', 'FOXP3', 'FSCN1', 'GALNT2', 'GNG4', 'GZMK', 'HOXC6', 'HSPA6', 'IDO1', 'IFIT1', 'IFIT3', 'IGFL2', 'IGHG4', 'IL1B', 'IL1RN', 'IL7R', 'KLRF1', 'KRT5', 'KRT86', 'LAD1', 'LEF1', 'LINC00926', 'METRNL', 'MKI67', 'MS4A1', 'MTRNR2L8', 'MZB1', 'NR4A2', 'P2RY6', 'PASK', 'PEMT', 'PTGS2', 'PTPN13', 'PTPRS', 'RNASE1', 'ROR1.AS1', 'RP11.138A9.1', 'RP11.354E11.2', 'RP11.89C3.4', 'RPL34', 'RPL36A', 'RRM2', 'RSAD2', 'RTKN2', 'TLDC2', 'TLR8', 'TOR4A', 'TUBA4A', 'UBE2C', 'ZNF331']' in columns of `adata.obs` or in adata.raw.var_names."

I then tried deleting adata.raw:

del adata.raw

Now I can save the AnnData file, and the dotplot works too.
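Why deleting adata.raw helped: when adata.raw is set, scanpy's plotting functions such as dotplot look genes up in adata.raw.var_names by default (their use_raw behaviour), which is exactly what the KeyError reports; once raw is gone they use adata.var_names instead. A minimal diagnostic sketch, assuming you pass the name lists in as plain Python lists; locate_genes is a hypothetical helper, not a scanpy function.

```python
def locate_genes(genes, var_names, raw_var_names=None):
    """Report requested genes that are absent from one or both name lists.

    Hypothetical diagnostic: pass list(adata.var_names) and, if adata.raw is set,
    list(adata.raw.var_names). Genes present in both lists are not reported.
    """
    var_set = set(var_names)
    raw_set = set(raw_var_names) if raw_var_names is not None else set()
    return {
        "only_in_var": [g for g in genes if g in var_set and g not in raw_set],
        "only_in_raw": [g for g in genes if g in raw_set and g not in var_set],
        "missing": [g for g in genes if g not in var_set and g not in raw_set],
    }

# Usage:
# report = locate_genes(markers, list(adata.var_names), list(adata.raw.var_names))
# A long "only_in_var" list means plotting with use_raw=False (or without raw) can find them.
```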

paulitikka commented 2 years ago

If someone is still experiencing this issue when saving, also execute the following after applying the adata.__dict__['_raw'].__dict__['_var'] = adata.__dict__['_raw'].__dict__['_var'].rename(columns={'_index': 'features'}); del(adata.raw) solution:

del(adata.var['_index'])

YY-SONG0718 commented 2 years ago

del(adata.var['_index'])

Recently I encountered this error again after using the original solution for a while; this solved the issue, thanks!

paulitikka commented 2 years ago

You are welcome Yuyao!

Mayank0512 commented 1 year ago

> (Quoting @mariafiruleva's _index rename fix above.)

Damn man that works....thanks so much....u are a true savior!!! Thank youuuuuuuuu again

weir12 commented 1 year ago

> (Quoting @mariafiruleva's _index rename fix above.)

BRAVO ! !!

maximilianh commented 1 year ago

Oh boy, @mariafiruleva so many thanks!!

This command is a little easier to read, for me at least, and seems to do the same thing:

adata._raw._var.rename(columns={'_index': 'features'}, inplace=True)
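Why the two commands agree: for ordinary instance attributes, obj.__dict__['_raw'] and obj._raw refer to the very same object, so renaming in place through the short path changes what the __dict__ path sees, and vice versa. A stdlib-only sketch with stand-in classes; FakeRaw and FakeAnnData are hypothetical, and the sketch assumes anndata keeps _raw and _var as plain instance attributes, which is a private detail consistent with both one-liners working.

```python
class FakeRaw:
    """Stand-in for anndata's Raw: keeps its table in a private attribute."""
    def __init__(self, var_columns):
        self._var = {"columns": list(var_columns)}


class FakeAnnData:
    """Stand-in for AnnData: keeps the Raw object in a private attribute."""
    def __init__(self, raw):
        self._raw = raw


a = FakeAnnData(FakeRaw(["_index"]))

# Attribute access and __dict__ lookup return the same objects.
assert a.__dict__["_raw"] is a._raw
assert a._raw.__dict__["_var"] is a._raw._var

# Mutating through the short path is visible through the long one.
cols = a._raw._var["columns"]
a._raw._var["columns"] = ["features" if c == "_index" else c for c in cols]
print(a.__dict__["_raw"].__dict__["_var"]["columns"])  # ['features']
```

The one practical difference: rename(..., inplace=True) mutates the existing DataFrame, while the __dict__ version builds a renamed copy and assigns it back; either way the attribute ends up holding a table without '_index'.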

Tianran1998 commented 9 months ago

> (Quoting @mariafiruleva's _index rename fix above.)

It works! Thank you very much!

GiveHeartToU commented 2 months ago

The issue was indeed caused by the h5ad data generated by SeuratDisk. SeuratDisk may have created a column named _index in adata.raw.var during the conversion, and anndata refuses to write a dataframe column with that reserved name when saving.

Summarizing the solutions given above, any of the following three methods solves the problem:

1. Rename the _index column in adata.raw.var

This will rename the _index column in adata.raw.var to features or another non-conflicting name.

adata._raw._var.rename(columns={'_index': 'features'}, inplace=True)

2. Directly operate on the underlying dictionary

Rename the _index column by operating on the underlying dictionary of the adata object.

adata.__dict__['_raw'].__dict__['_var'] = adata.__dict__['_raw'].__dict__['_var'].rename(columns={'_index': 'features'})

3. Delete adata.raw

If you don't need the original data in adata.raw, you can simply delete it.

adata.raw = None