rpy2 / rpy2

Interface to use R from Python
https://rpy2.github.io
GNU General Public License v2.0

Errors and inconsistencies handling `POSIXct` from pandas to R #712

Status: Open. spluque opened this issue 4 years ago.

spluque commented 4 years ago

With rpy2 version 3.3.4 and pandas 1.0.5, the following:

import pandas as pd
import rpy2.robjects as robjs
from rpy2.robjects import pandas2ri

pandas2ri.activate()

dti = pd.date_range('2018-01-01', periods=3, freq='H')
robjs.r.summary(dti)

fails with `ValueError: Unknown numpy array type "datetime64[ns]"`. If one encloses the pandas.Series in a pandas.DataFrame, it works:

dti_df = pd.DataFrame({"DateTimeIndex": dti})
robjs.r.summary(dti_df)
lgautier commented 4 years ago

Does pd.date_range() return a DatetimeIndex instead of a Series?

isinstance(dti, pd.Series)
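A quick pandas-only check (no R or rpy2 needed) makes the question concrete: `pd.date_range()` returns a `DatetimeIndex`, and wrapping it in a Series or DataFrame yields the container types that the pandas converter handles. This sketch uses `freq=pd.offsets.Hour()`, which is assumed equivalent to `freq='H'` from the report and avoids the hourly alias spelling that changed across pandas versions.

```python
import pandas as pd

# Same range as in the report: three hourly timestamps.
dti = pd.date_range('2018-01-01', periods=3, freq=pd.offsets.Hour())

print(type(dti).__name__)          # DatetimeIndex
print(isinstance(dti, pd.Series))  # False
print(isinstance(dti, pd.Index))   # True

# Wrapping produces a Series, or a DataFrame column, with a datetime64 dtype
# (datetime64[ns] with the pandas version used in the report).
s = pd.Series(dti)
df = pd.DataFrame({"DateTimeIndex": dti})
print(type(s).__name__, s.dtype)
print(type(df).__name__, df["DateTimeIndex"].dtype)
```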
lgautier commented 4 years ago

The converter for Series is looking for dtype datetime64 (https://github.com/rpy2/rpy2/blob/master/rpy2/robjects/pandas2ri.py#L149), but this does not seem to work as expected here:

>>> robjs.r.summary(pd.Series(dti))
array([1.5147828e+09, 1.5147846e+09, 1.5147864e+09, 1.5147864e+09,
       1.5147882e+09, 1.5147900e+09])
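The six numbers above look like R's `summary()` statistics (Min, 1st Qu., Median, Mean, 3rd Qu., Max) for plain numbers of order 1.5e9. This is consistent with seconds since 1970-01-01, the representation underlying R's POSIXct, once the date-time class is lost. The sketch below computes those seconds in pandas, assuming naive timestamps are read as UTC. The values in the output above appear shifted by a constant 18000 s (5 h) relative to these UTC numbers. That may be a time-zone effect, but this is a guess.

```python
import pandas as pd

dti = pd.date_range('2018-01-01', periods=3, freq=pd.offsets.Hour())
s = pd.Series(dti)

# Seconds since the Unix epoch, treating the naive timestamps as UTC.
epoch = pd.Timestamp('1970-01-01')
seconds = (s - epoch) // pd.Timedelta(seconds=1)
print(seconds.tolist())  # [1514764800, 1514768400, 1514772000]
```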
lgautier commented 4 years ago

This is working though:

>>> import rpy2.robjects as ro
>>> s = pd.Series(dti)
>>> ro.conversion.py2rpy(s)
R object with classes: ('POSIXct', 'POSIXt') mapped to:
[2018-01-0..., 2018-01-0..., 2018-01-0...]
lgautier commented 4 years ago

This is likely one of the buglets with pandas2ri.activate(). Its use is discouraged.

Use a local converter instead (as pointed out in the doc: https://rpy2.github.io/doc/v3.3.x/html/robjects_convert.html#local-conversion-rules):

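import pandas as pd
import rpy2.robjects as ro
import rpy2.robjects.conversion as cv
from rpy2.robjects import pandas2ri

dti = pd.date_range('2018-01-01', periods=3, freq='H')

# The pandas conversion rules only apply inside the `with` block,
# unlike pandas2ri.activate(), which changes them globally.
with cv.localconverter(ro.default_converter + pandas2ri.converter):
    # Wrap the DatetimeIndex in a Series so the pandas converter handles it
    res = ro.r.summary(pd.Series(dti))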
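The difference between `activate()` and a local converter is scope: one changes the conversion rules for the whole session, while the other installs them only for the duration of a `with` block. The following is a toy, pure-Python analog of that scoping using `contextvars`. It is not rpy2's implementation, and the names `local_rules` and `current_rules` are hypothetical.

```python
from contextlib import contextmanager
from contextvars import ContextVar

# Hypothetical stand-in for "the conversion rules currently in effect".
_active_rules = ContextVar("active_rules", default="default")


def current_rules():
    """Return the rules a conversion would use right now."""
    return _active_rules.get()


@contextmanager
def local_rules(rules):
    """Install `rules` for the body of a `with` block, then restore."""
    token = _active_rules.set(rules)
    try:
        yield rules
    finally:
        # Restored even if the body raises, so nothing leaks globally.
        _active_rules.reset(token)


print(current_rules())          # default
with local_rules("default+pandas"):
    print(current_rules())      # default+pandas
print(current_rules())          # default
```

Because the previous rules are always restored, code outside the block keeps its old behavior. A global `activate()` call cannot offer this guarantee.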
spluque commented 4 years ago

Thanks! I'll avoid using pandas2ri.activate() and use the local converter in a context manager, but how come dti needs to be enclosed in pd.Series when it's already a Series?

lgautier commented 4 years ago

dti doesn't seem to be a Series object:

>>> isinstance(dti, pd.Series)
False

pd.date_range() returns a DatetimeIndex, and the pandas converter in rpy2 is currently mostly only able to convert pandas DataFrame and Series objects.
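One way to picture why a `DatetimeIndex` slips through is type-based dispatch. Rules are registered for `Series` and `DataFrame`, and since a `DatetimeIndex` subclasses neither, it falls to the fallback. This sketch mimics that with `functools.singledispatch`. It only illustrates the idea and is not rpy2's converter code. The function `to_r_sketch`, its string results, and the helper `as_convertible` are hypothetical. In the report, the unmatched object raised a `ValueError` instead.

```python
from functools import singledispatch

import pandas as pd


@singledispatch
def to_r_sketch(obj):
    # Fallback for types with no registered rule (a DatetimeIndex lands here).
    raise NotImplementedError(f"no conversion rule for {type(obj).__name__}")


@to_r_sketch.register
def _(obj: pd.Series):
    kind = "POSIXct" if pd.api.types.is_datetime64_dtype(obj.dtype) else "vector"
    return f"{kind} of length {len(obj)}"


@to_r_sketch.register
def _(obj: pd.DataFrame):
    return f"data.frame with {obj.shape[1]} column(s)"


def as_convertible(obj):
    """Hypothetical helper: wrap a bare pandas Index in a Series first."""
    return pd.Series(obj) if isinstance(obj, pd.Index) else obj


dti = pd.date_range('2018-01-01', periods=3, freq=pd.offsets.Hour())
print(to_r_sketch(pd.Series(dti)))       # POSIXct of length 3
print(to_r_sketch(as_convertible(dti)))  # POSIXct of length 3
try:
    to_r_sketch(dti)
except NotImplementedError as err:
    print(err)                           # no conversion rule for DatetimeIndex
```

Wrapping at the call site, as in the thread, is the simplest fix. A helper like `as_convertible` just makes that wrapping explicit for other Index types too.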