theislab / single-cell-tutorial

Single-cell current best practices tutorial case study for the paper: Luecken and Theis, "Current best practices in single-cell RNA-seq analysis: a tutorial"

ValueError: Unknown dtype dtype('uint16') cannot be converted to ?gRMatrix. #111

Open nroak opened 1 year ago

nroak commented 1 year ago

I'm encountering the error below when assigning the AnnData object to the R environment in this step of the tutorial.

> ro.globalenv["adata"] = adata

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[58], line 1
----> 1 ro.globalenv["adata"] = adata

File ~/opt/anaconda3/envs/velocyto/lib/python3.10/site-packages/rpy2/robjects/environments.py:35, in Environment.__setitem__(self, item, value)
     34 def __setitem__(self, item: str, value: typing.Any) -> None:
---> 35     robj = conversion.get_conversion().py2rpy(value)
     36     super(Environment, self).__setitem__(item, robj)

File ~/opt/anaconda3/envs/velocyto/lib/python3.10/functools.py:889, in singledispatch.<locals>.wrapper(*args, **kw)
    885 if not args:
    886     raise TypeError(f'{funcname} requires at least '
    887                     '1 positional argument')
--> 889 return dispatch(args[0].__class__)(*args, **kw)

File ~/opt/anaconda3/envs/velocyto/lib/python3.10/site-packages/anndata2ri/py2r.py:56, in py2rpy_anndata(obj)
     54 # TODO: sparse
     55 x = {} if obj.X is None else dict(X=mat_converter.py2rpy(obj.X.T))
---> 56 layers = {k: mat_converter.py2rpy(v.T) for k, v in obj.layers.items()}
     57 assays = ListVector({**x, **layers})
     59 row_args = {k: pandas2ri.py2rpy(v) for k, v in obj.var.items()}

File ~/opt/anaconda3/envs/velocyto/lib/python3.10/site-packages/anndata2ri/py2r.py:56, in <dictcomp>(.0)
     54 # TODO: sparse
     55 x = {} if obj.X is None else dict(X=mat_converter.py2rpy(obj.X.T))
---> 56 layers = {k: mat_converter.py2rpy(v.T) for k, v in obj.layers.items()}
     57 assays = ListVector({**x, **layers})
     59 row_args = {k: pandas2ri.py2rpy(v) for k, v in obj.var.items()}

File ~/opt/anaconda3/envs/velocyto/lib/python3.10/functools.py:889, in singledispatch.<locals>.wrapper(*args, **kw)
    885 if not args:
    886     raise TypeError(f'{funcname} requires at least '
    887                     '1 positional argument')
--> 889 return dispatch(args[0].__class__)(*args, **kw)

File ~/opt/anaconda3/envs/velocyto/lib/python3.10/site-packages/anndata2ri/scipy2ri/py2r.py:88, in py2r_context.<locals>.wrapper(obj)
     36     importr('Matrix')  # make class available
     37     matrix = SignatureTranslatedAnonymousPackage(
     38         """
     39         sparse_matrix <- function(x, conv_data, dims, ...) {
   (...)
     85         'matrix',
     86     )
---> 88 return f(obj)

File ~/opt/anaconda3/envs/velocyto/lib/python3.10/site-packages/anndata2ri/scipy2ri/py2r.py:97, in csc_to_rmat(csc)
     93 @converter.py2rpy.register(sparse.csc_matrix)
     94 @py2r_context
     95 def csc_to_rmat(csc: sparse.csc_matrix):
     96     csc.sort_indices()
---> 97     conv_data = get_type_conv(csc.dtype)
     98     with localconverter(default_converter + numpy2ri.converter):
     99         return matrix.from_csc(i=csc.indices, p=csc.indptr, x=csc.data, dims=list(csc.shape), conv_data=conv_data)

File ~/opt/anaconda3/envs/velocyto/lib/python3.10/site-packages/anndata2ri/scipy2ri/py2r.py:28, in get_type_conv(dtype)
     26     return base.as_logical
     27 else:
---> 28     raise ValueError(f'Unknown dtype {dtype!r} cannot be converted to ?gRMatrix.')

ValueError: Unknown dtype dtype('uint16') cannot be converted to ?gRMatrix.

LuckyMD commented 1 year ago

Hey!

R does not have unsigned integers. It seems that you have one in your adata object though. Please check adata.obs.dtypes and adata.var.dtypes to see if any of them come up as unsigned integer. Then you can either remove this column or convert it. Then the conversion should work again!
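
As a minimal sketch of that check, assuming `.obs` and `.var` are ordinary pandas DataFrames: the helper name `unsigned_columns` and the toy `obs` frame below are made up for illustration, and with a real object you would pass `adata.obs` or `adata.var` instead.

```python
import numpy as np
import pandas as pd


def unsigned_columns(df: pd.DataFrame) -> list[str]:
    """Return the names of columns stored with an unsigned integer dtype."""
    return [
        col
        for col, dtype in df.dtypes.items()
        if pd.api.types.is_unsigned_integer_dtype(dtype)
    ]


# Toy stand-in for adata.obs (hypothetical data, not from the tutorial).
obs = pd.DataFrame(
    {
        "n_genes_by_counts": np.array([1200, 3400], dtype=np.uint16),
        "pct_counts_mt": np.array([2.5, 7.1], dtype=np.float32),
    }
)

bad = unsigned_columns(obs)

# Cast the offending columns to a signed integer type.
# Caveat: uint64 values above the int64 maximum would not survive this cast.
for col in bad:
    obs[col] = obs[col].astype(np.int64)
```

Dropping the column instead (`obs.drop(columns=bad)`) is the other option mentioned above, if the column is not needed on the R side.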

nroak commented 1 year ago

Here's the output for these. It doesn't look like there are any unsigned integers?

adata.var.dtypes

Accession                  object
Chromosome               category
End                         int64
Start                       int64
Strand                   category
mt                           bool
ribo                         bool
hb                           bool
n_cells_by_counts           int64
mean_counts               float32
log1p_mean_counts         float32
pct_dropout_by_counts     float64
total_counts              float32
log1p_total_counts        float32
n_cells                     int64
dtype: object

adata.obs.dtypes

genotype                      category
replicate                     category
batch                         category
scDblFinder_class                int32
n_genes_by_counts                int32
log1p_n_genes_by_counts        float64
total_counts                   float32
log1p_total_counts             float32
pct_counts_in_top_20_genes     float64
total_counts_mt                float32
log1p_total_counts_mt          float32
pct_counts_mt                  float32
total_counts_ribo              float32
log1p_total_counts_ribo        float32
pct_counts_ribo                float32
total_counts_hb                float32
log1p_total_counts_hb          float32
pct_counts_hb                  float32
outlier                           bool
size_factors                   float64
dtype: object

nroak commented 1 year ago

I re-ran the tutorial from the beginning and am now getting a different error, but at the same step:

if issparse(adata.X):
    if not adata.X.has_sorted_indices:
        adata.X.sort_indices()
ro.globalenv["adata"] = adata

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File ~/opt/anaconda3/envs/velocyto/lib/python3.10/site-packages/rpy2/rinterface_lib/conversion.py:179, in _get_cdata(obj)
    178 try:
--> 179     cdata = obj.__sexp__._cdata
    180 except AttributeError:

AttributeError: 'AnnData' object has no attribute '__sexp__'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[68], line 4
      2     if not adata.X.has_sorted_indices:
      3         adata.X.sort_indices()
----> 4 ro.globalenv["adata"] = adata

File ~/opt/anaconda3/envs/velocyto/lib/python3.10/site-packages/rpy2/robjects/environments.py:36, in Environment.__setitem__(self, item, value)
     34 def __setitem__(self, item: str, value: typing.Any) -> None:
     35     robj = conversion.get_conversion().py2rpy(value)
---> 36     super(Environment, self).__setitem__(item, robj)

File ~/opt/anaconda3/envs/velocyto/lib/python3.10/site-packages/rpy2/rinterface_lib/sexp.py:404, in SexpEnvironment.__setitem__(self, key, value)
    400 key_cchar = conversion._str_to_cchar(key)
    401 symbol = rmemory.protect(
    402     openrlib.rlib.Rf_install(key_cchar)
    403 )
--> 404 cdata = rmemory.protect(conversion._get_cdata(value))
    405 cdata_copy = rmemory.protect(
    406     openrlib.rlib.Rf_duplicate(cdata)
    407 )
    408 openrlib.rlib.Rf_defineVar(symbol,
    409                            cdata_copy,
    410                            self.__sexp__._cdata)

File ~/opt/anaconda3/envs/velocyto/lib/python3.10/site-packages/rpy2/rinterface_lib/conversion.py:181, in _get_cdata(obj)
    179         cdata = obj.__sexp__._cdata
    180     except AttributeError:
--> 181         raise ValueError('Not an rpy2 R object and unable '
    182                          'to map it to one: %s' % repr(obj))
    183 else:
    184     cdata = cls(obj)

ValueError: Not an rpy2 R object and unable to map it to one: AnnData object with n_obs × n_vars = 23018 × 18216
    obs: 'genotype', 'replicate', 'batch', 'scDblFinder_class', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_20_genes', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'log1p_total_counts_ribo', 'pct_counts_ribo', 'total_counts_hb', 'log1p_total_counts_hb', 'pct_counts_hb', 'outlier', 'mt_outlier', 'size_factors'
    var: 'gene_ids', 'feature_types', 'genome', 'mt', 'ribo', 'hb', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'n_cells'
    layers: 'counts', 'soupX_counts', 'log1pPF_normalization', 'PFlog1pPF_normalization', 'scran_normalization'

scanpy==1.9.1 anndata==0.8.0 umap==0.5.3 numpy==1.23.5 scipy==1.10.0 pandas==1.5.3 scikit-learn==1.2.0 statsmodels==0.13.5 python-igraph==0.10.4 louvain==0.8.0 pynndescent==0.5.8

LuckyMD commented 1 year ago

For the first error, you could also check .obsm, .obsp, .varm, .varp and .uns. You also have quite a few layers where unsigned ints may be hiding (not sure)... For the second error, I'm not sure what's going on there. You may be better off reporting this at the anndata2ri repo though as the error is in that step.
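
The first traceback fails inside the `layers` dict comprehension, so the layers are a likely place to look. A rough sketch of scanning the matrix-like slots and casting any unsigned layer to float64 (which R stores natively as double) could look like the following. The function names are hypothetical, and a `SimpleNamespace` stands in for a real AnnData object so the example runs without anndata installed; with a real object you would pass `adata` directly. `.uns` is not covered here because it can hold arbitrarily nested values.

```python
from types import SimpleNamespace

import numpy as np
from scipy import sparse


def find_unsigned_matrices(adata):
    """List (slot, key, dtype) for every matrix with an unsigned integer dtype."""
    found = []
    x = getattr(adata, "X", None)
    if x is not None and getattr(x, "dtype", None) is not None and x.dtype.kind == "u":
        found.append(("X", None, x.dtype))
    for slot in ("layers", "obsm", "varm", "obsp", "varp"):
        mapping = getattr(adata, slot, None)
        if mapping is None:
            continue
        for key, value in mapping.items():
            # Entries without a single dtype (e.g. DataFrames) are skipped.
            dtype = getattr(value, "dtype", None)
            if dtype is not None and dtype.kind == "u":
                found.append((slot, key, dtype))
    return found


def cast_unsigned_layers(adata, target=np.float64):
    """Replace unsigned-integer layers with copies cast to `target`."""
    for key in list(adata.layers.keys()):
        value = adata.layers[key]
        dtype = getattr(value, "dtype", None)
        if dtype is not None and dtype.kind == "u":
            adata.layers[key] = value.astype(target)


# Toy stand-in for an AnnData object (hypothetical data).
toy = SimpleNamespace(
    X=sparse.csr_matrix(np.ones((2, 3), dtype=np.float32)),
    layers={
        "counts": sparse.csr_matrix(np.ones((2, 3), dtype=np.uint16)),
        "scran_normalization": sparse.csr_matrix(np.ones((2, 3), dtype=np.float32)),
    },
    obsm={},
    varm={},
    obsp={},
    varp={},
)

before = find_unsigned_matrices(toy)  # what would trip the conversion
cast_unsigned_layers(toy)
after = find_unsigned_matrices(toy)   # should be empty once the cast is done
```

Casting to float rather than a signed integer is a deliberate choice in this sketch: count matrices are usually small non-negative integers, which float64 represents exactly.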