rpy2 / rpy2-Matrix

Mapping of the R package Matrix for rpy2

Sparse data frame conversion fails #2

Open krassowski opened 2 years ago

krassowski commented 2 years ago

Is your feature request related to a problem? Please describe.

import pandas as pd
from scipy import sparse
mat = sparse.eye(3)
df = pd.DataFrame.sparse.from_spmatrix(mat, columns=['A', 'B', 'C'])
%R -i df
pandas2ri.py: Error while trying to convert the column "A". Fall back to string conversion. The error is: 'SparseDtype' object has no attribute 'isnative'
```python
AttributeError                            Traceback (most recent call last)
/lib/python3.9/site-packages/rpy2/robjects/pandas2ri.py:57, in py2rpy_pandasdataframe(obj)
     56 try:
---> 57 od[name] = conversion.py2rpy(values)
     58 except Exception as e:

File ~/.pyenv/versions/3.9.5/lib/python3.9/functools.py:877, in singledispatch.<locals>.wrapper(*args, **kw)
    874 raise TypeError(f'{funcname} requires at least '
    875 '1 positional argument')
--> 877 return dispatch(args[0].__class__)(*args, **kw)

/lib/python3.9/site-packages/rpy2/robjects/pandas2ri.py:191, in py2rpy_pandasseries(obj)
    189 # current conversion as performed by numpy
--> 191 res = func(obj.values)
    192 if len(obj.shape) == 1:

/lib/python3.9/site-packages/rpy2/robjects/numpy2ri.py:84, in numpy2rpy(o)
     82 """ Augmented conversion function, converting numpy arrays into
     83 rpy2.rinterface-level R structures. """
---> 84 if not o.dtype.isnative:
     85 raise ValueError('Cannot pass numpy arrays with non-native '
     86 'byte orders at the moment.')

AttributeError: 'SparseDtype' object has no attribute 'isnative'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
/lib/python3.9/site-packages/rpy2/rinterface_lib/sexp.py:610, in SexpVector.from_object(cls, obj)
    609 try:
--> 610 mv = memoryview(obj)
    611 res = cls.from_memoryview(mv)

TypeError: memoryview: a bytes-like object is required, not 'Series'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
Input In [280], in ()
      2 mat = sparse.eye(3)
      3 df = pd.DataFrame.sparse.from_spmatrix(mat, columns=['A', 'B', 'C'])
----> 4 get_ipython().run_line_magic('R', '-i df')

/lib/python3.9/site-packages/IPython/core/interactiveshell.py:2305, in InteractiveShell.run_line_magic(self, magic_name, line, _stack_depth)
   2303 kwargs['local_ns'] = self.get_local_scope(stack_depth)
   2304 with self.builtin_trap:
-> 2305 result = fn(*args, **kwargs)
   2306 return result

/lib/python3.9/site-packages/rpy2/ipython/rmagic.py:737, in RMagics.R(self, line, cell, local_ns)
    735 raise NameError("name '%s' is not defined" % input)
    736 with localconverter(converter) as cv:
--> 737 ro.r.assign(input, val)
    739 if args.display:
    740 try:

/lib/python3.9/site-packages/rpy2/robjects/functions.py:198, in SignatureTranslatedFunction.__call__(self, *args, **kwargs)
    196 v = kwargs.pop(k)
    197 kwargs[r_k] = v
--> 198 return (super(SignatureTranslatedFunction, self)
    199 .__call__(*args, **kwargs))

/lib/python3.9/site-packages/rpy2/robjects/functions.py:117, in Function.__call__(self, *args, **kwargs)
    116 def __call__(self, *args, **kwargs):
--> 117 new_args = [conversion.py2rpy(a) for a in args]
    118 new_kwargs = {}
    119 for k, v in kwargs.items():
    120 # TODO: shouldn't this be handled by the conversion itself ?

/lib/python3.9/site-packages/rpy2/robjects/functions.py:117, in <listcomp>(.0)
    116 def __call__(self, *args, **kwargs):
--> 117 new_args = [conversion.py2rpy(a) for a in args]
    118 new_kwargs = {}
    119 for k, v in kwargs.items():
    120 # TODO: shouldn't this be handled by the conversion itself ?

File ~/.pyenv/versions/3.9.5/lib/python3.9/functools.py:877, in singledispatch.<locals>.wrapper(*args, **kw)
    873 if not args:
    874 raise TypeError(f'{funcname} requires at least '
    875 '1 positional argument')
--> 877 return dispatch(args[0].__class__)(*args, **kw)

/lib/python3.9/site-packages/rpy2/robjects/pandas2ri.py:63, in py2rpy_pandasdataframe(obj)
     58 except Exception as e:
     59 warnings.warn('Error while trying to convert '
     60 'the column "%s". Fall back to string conversion. '
     61 'The error is: %s'
     62 % (name, str(e)))
---> 63 od[name] = StrVector(values)
     65 return DataFrame(od)

/lib/python3.9/site-packages/rpy2/robjects/vectors.py:385, in StrVector.__init__(self, obj)
    384 def __init__(self, obj):
--> 385 super().__init__(obj)
    386 self._add_rops()

/lib/python3.9/site-packages/rpy2/rinterface_lib/sexp.py:523, in SexpVector.__init__(self, obj)
    521 super().__init__(obj)
    522 elif isinstance(obj, collections.abc.Sized):
--> 523 super().__init__(self.from_object(obj).__sexp__)
    524 else:
    525 raise TypeError('The constructor must be called '
    526 'with an instance of '
    527 'rpy2.rinterface.Sexp '
    528 'or an instance of '
    529 'rpy2.rinterface._rinterface.SexpCapsule')

/lib/python3.9/site-packages/rpy2/rinterface_lib/sexp.py:614, in SexpVector.from_object(cls, obj)
    612 except (TypeError, ValueError):
    613 try:
--> 614 res = cls.from_iterable(obj)
    615 except ValueError:
    616 msg = ('The class methods from_memoryview() and '
    617 'from_iterable() both failed to make a {} '
    618 'from an object of class {}'
    619 .format(cls, type(obj)))

/lib/python3.9/site-packages/rpy2/rinterface_lib/conversion.py:45, in _cdata_res_to_rinterface.<locals>._(*args, **kwargs)
     44 def _(*args, **kwargs):
---> 45 cdata = function(*args, **kwargs)
     46 # TODO: test cdata is of the expected CType
     47 return _cdata_to_rinterface(cdata)

/lib/python3.9/site-packages/rpy2/rinterface_lib/sexp.py:552, in SexpVector.from_iterable(cls, iterable, populate_func, set_elt, cast_value)
    547 with memorymanagement.rmemory() as rmemory:
    548 r_vector = rmemory.protect(
    549 openrlib.rlib.Rf_allocVector(
    550 cls._R_TYPE, n)
    551 )
--> 552 populate_func(iterable, r_vector, set_elt, cast_value)
    553 return r_vector

/lib/python3.9/site-packages/rpy2/rinterface_lib/sexp.py:474, in _populate_r_vector(iterable, r_vector, set_elt, cast_value)
    472 def _populate_r_vector(iterable, r_vector, set_elt, cast_value):
    473 for i, v in enumerate(iterable):
--> 474 set_elt(r_vector, i, cast_value(v))

/lib/python3.9/site-packages/rpy2/rinterface_lib/sexp.py:677, in _as_charsxp_cdata(x)
    675 return x.__sexp__._cdata
    676 else:
--> 677 return conversion._str_to_charsxp(x)

/lib/python3.9/site-packages/rpy2/rinterface_lib/conversion.py:142, in _str_to_charsxp(val)
    140 s = rlib.R_NaString
    141 else:
--> 142 cchar = _str_to_cchar(val, encoding='utf-8')
    143 s = rlib.Rf_mkCharCE(cchar, openrlib.rlib.CE_UTF8)
    144 return s

/lib/python3.9/site-packages/rpy2/rinterface_lib/conversion.py:121, in _str_to_cchar(s, encoding)
    119 def _str_to_cchar(s: str, encoding: str = 'utf-8'):
    120 # TODO: use isString and installTrChar
--> 121 b = s.encode(encoding)
    122 return ffi.new('char[]', b)

AttributeError: 'numpy.float64' object has no attribute 'encode'
```
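For readers hitting the same traceback: the first exception comes from the columns carrying a pandas `SparseDtype` instead of a numpy dtype. The sketch below inspects the dtypes and shows one possible stopgap, densifying the sparse columns before handing the frame to R. The helper name `densify_sparse_columns` is hypothetical (it is not part of rpy2 or pandas), and the approach assumes the data is small enough to hold in dense form.

```python
import pandas as pd
from scipy import sparse

# Same data as in the report above.
mat = sparse.eye(3)
df = pd.DataFrame.sparse.from_spmatrix(mat, columns=['A', 'B', 'C'])

# Each column carries a pandas SparseDtype rather than a numpy dtype,
# which is what the `o.dtype.isnative` check in the traceback trips over.
print(df.dtypes)


def densify_sparse_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `frame` with every sparse column made dense.

    Hypothetical helper for illustration; not part of rpy2 or pandas.
    """
    out = frame.copy()
    for name in out.columns:
        if isinstance(out[name].dtype, pd.SparseDtype):
            # Series.sparse.to_dense() materializes the fill values.
            out[name] = out[name].sparse.to_dense()
    return out


dense_df = densify_sparse_columns(df)
print(dense_df.dtypes)
```

The resulting `dense_df` holds ordinary numpy-backed columns, so `%R -i dense_df` should go through the usual conversion path, at the cost of materializing every zero.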

Describe the solution you'd like

Support for converting sparse data frames in rpy2. I know there is https://github.com/rpy2/rpy2-Matrix but it does not cover data frames (and is not published).
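To make the request more concrete, here is a minimal sketch of the Python-side data such a converter might extract, assuming the target is a sparse matrix built from (row, column, value) triplets, as R's `Matrix::sparseMatrix(i, j, x, dims)` accepts. The function name `sparse_frame_to_triplets` is hypothetical, the frame is assumed to consist entirely of sparse columns, and this is not a description of what rpy2-Matrix currently does.

```python
import pandas as pd
from scipy import sparse


def sparse_frame_to_triplets(frame: pd.DataFrame):
    """Return (i, j, x, dims, colnames) for a fully sparse DataFrame.

    Hypothetical helper for illustration. `i` and `j` are 1-based,
    since R indexes from 1.
    """
    coo = frame.sparse.to_coo()      # scipy COO matrix of the stored values
    i = (coo.row + 1).tolist()       # 1-based row indices
    j = (coo.col + 1).tolist()       # 1-based column indices
    x = coo.data.tolist()            # stored values
    return i, j, x, coo.shape, list(frame.columns)


mat = sparse.eye(3)
df = pd.DataFrame.sparse.from_spmatrix(mat, columns=['A', 'B', 'C'])
i, j, x, dims, names = sparse_frame_to_triplets(df)
print(i, j, x, dims, names)
```

Only the stored entries cross the Python/R boundary this way; the column names would still need to be attached on the R side (for example as `dimnames`).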

Describe alternatives you've considered

Having another package handle this would not be nice for interactive usage.

Additional context

None

lgautier commented 2 years ago

Sparse data array handling is not part of R's standard library. The Matrix package does handle them though.

The most natural way to address this ticket seems to be:

What do you think? If you agree, I'd transfer this issue to the rpy2-Matrix repository.

krassowski commented 2 years ago

Sounds good.

lgautier commented 2 years ago

I have updated the code to match the latest release of the R package Matrix. I'll let you test the viability of a converter.

Two notes: