judithabk6 / med_bench

BSD 3-Clause "New" or "Revised" License
8 stars 3 forks source link

rpy2 conversion not working #62

Closed judithabk6 closed 9 months ago

judithabk6 commented 9 months ago

I have 2 different issues The first one is a change of behavior of the utils function https://github.com/judithabk6/med_bench/blob/2cd241ccd6880baa4ebd815336d7bb8dc48363df/src/med_bench/utils/utils.py#L52

is different in my new environment (more recent rpy2)

from med_bench.get_simulated_data import simulate_data
from numpy.random import default_rng
import numpy as np
from med_bench.mediation import r_mediation_DML
from med_bench.utils.utils import get_interactions, _convert_array_to_R
import rpy2

SAMPLE_SIZE = 1000
CONFIG = 1

data = simulate_data(
    n=SAMPLE_SIZE,
    rg=default_rng(45),
    mis_spec_m=False,
    mis_spec_y=False,
    dim_x=5,
    dim_m=1,
    seed=9,
    type_m="binary",
    sigma_y=0.5,
    sigma_m=0.5,
    beta_t_factor=0.5,
    beta_m_factor=0.5,
)

x, t, m, y = data[0], data[1], data[2], data[3]

In [7]: _convert_array_to_R(x)
Out[7]: 
R object with classes: ('matrix',) mapped to:
<Matrix - Python:0x7ff6232d2820 / R:0x5591098e48b0>
[-0.457865, 0.262484, -0.017788, -1.458903, ..., 1.252166, 0.111703, -0.301062, -0.008739]

In [10]: rpy2.__version__
Out[10]: '2.9.4'

and with the more recent rpy2

In [9]: _convert_array_to_R(x)
Out[9]: 
array([[-0.45786521, -0.62285868,  0.13529003,  0.84390259, -0.14849869],
       [ 0.26248378,  0.53166804,  0.6898073 , -0.89189867,  0.46882042],
       [-0.01778763, -0.6014297 , -0.67763502, -0.22721533, -0.9935554 ],
       ...,
       [ 1.75438122, -0.53414821, -1.40090863, -1.16002101,  0.11170349],
       [-1.52540083, -1.28988308,  0.70203506,  1.70493594, -0.30106179],
       [ 0.12054093,  1.18968031,  1.41277898, -0.59853182, -0.00873877]])

In [11]: rpy2.__version__
Out[11]: '3.5.11'

and this breaks some estimators of course.

And even in the env with the old rpy2, there is a weird behavior.

In [2]: r_mediation_DML(x, t, m, y, trim=0.0, order=1)
/data/parietal/store/work/jabecass/miniconda3/envs/causal2023/lib/python3.7/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Error in glmnet::glmnet(x, y, family = c("binomial"), alpha = 1, lambda = lambda[1],  : 
  x should be a matrix with 2 or more columns

  warnings.warn(x, RRuntimeWarning)
/data/parietal/store/work/jabecass/miniconda3/envs/causal2023/lib/python3.7/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: In addition: 
  warnings.warn(x, RRuntimeWarning)
/data/parietal/store/work/jabecass/miniconda3/envs/causal2023/lib/python3.7/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: There were 25 warnings (use warnings() to see them)
  warnings.warn(x, RRuntimeWarning)
/data/parietal/store/work/jabecass/miniconda3/envs/causal2023/lib/python3.7/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: 

  warnings.warn(x, RRuntimeWarning)
---------------------------------------------------------------------------
RRuntimeError                             Traceback (most recent call last)
<ipython-input-2-917368025e72> in <module>
----> 1 r_mediation_DML(x, t, m, y, trim=0.0, order=1)

/data/parietal/store/work/jabecass/med_bench/src/med_bench/mediation.py in r_mediation_DML(y, t, m, x, trim, order)
    960     x_r, t_r, m_r, y_r = [base.as_matrix(_convert_array_to_R(uu)) for uu in
    961                           (x, t, m, y)]
--> 962     res = causalweight.medDML(y_r, t_r, m_r, x_r, trim=trim, order=order)
    963     raw_res_R = np.array(res.rx2('results'))
    964     ntrimmed = res.rx2('ntrimmed')[0]

/data/parietal/store/work/jabecass/miniconda3/envs/causal2023/lib/python3.7/site-packages/rpy2/robjects/functions.py in __call__(self, *args, **kwargs)
    176                 v = kwargs.pop(k)
    177                 kwargs[r_k] = v
--> 178         return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
    179 
    180 pattern_link = re.compile(r'\\link\{(.+?)\}')

/data/parietal/store/work/jabecass/miniconda3/envs/causal2023/lib/python3.7/site-packages/rpy2/robjects/functions.py in __call__(self, *args, **kwargs)
    104         for k, v in kwargs.items():
    105             new_kwargs[k] = conversion.py2ri(v)
--> 106         res = super(Function, self).__call__(*new_args, **new_kwargs)
    107         res = conversion.ri2ro(res)
    108         return res

RRuntimeError: Error in glmnet::glmnet(x, y, family = c("binomial"), alpha = 1, lambda = lambda[1],  : 
  x should be a matrix with 2 or more columns

but executing directly what is in the function works

causalweight = rpackages.importr('causalweight')
trim = 0.0

order = 1

x_r, t_r, m_r, y_r = [base.as_matrix(_convert_array_to_R(uu)) for uu in
                      (x, t, m, y)]
res = causalweight.medDML(y_r, t_r, m_r, x_r, trim=trim, order=order)
raw_res_R = np.array(res.rx2('results'))
  ntrimmed = res.rx2('ntrimmed')[0]

I think it's related to the new structure introduced by #58 (import has not the same behavior)

I propose to fix everything for the more recent rpy2 (the old rpy2 also breaks the import of RRuntimeError from rpy2, which is why I changed in the first place)

judithabk6 commented 9 months ago

I think I have fixed the first issue (see branch fix_conversion_r_matrix), but I have the second issue in the new version. There is some problem if you import the function from somewhere else than where rpy2 is imported. Do you have any idea @bthirion? I am really blocked.

here is the exact code

In [1]: from med_bench.get_simulated_data import simulate_data
   ...: from numpy.random import default_rng
   ...: import numpy as np
   ...: from med_bench.mediation import r_mediation_DML

In [2]: SAMPLE_SIZE = 1000
   ...: CONFIG = 1
   ...: 
   ...: data = simulate_data(
   ...:     n=SAMPLE_SIZE,
   ...:     rg=default_rng(45),
   ...:     mis_spec_m=False,
   ...:     mis_spec_y=False,
   ...:     dim_x=5,
   ...:     dim_m=1,
   ...:     seed=9,
   ...:     type_m="binary",
   ...:     sigma_y=0.5,
   ...:     sigma_m=0.5,
   ...:     beta_t_factor=0.5,
   ...:     beta_m_factor=0.5,
   ...: )
   ...: 
   ...: x, t, m, y = data[0], data[1], data[2], data[3]
   ...: 
   ...: r_mediation_DML(x, t, m, y, trim=0.0, order=1)
R[write to console]: Error in glmnet::glmnet(x, y, family = c("binomial"), alpha = 1, lambda = lambda[1],  : 
  x should be a matrix with 2 or more columns

---------------------------------------------------------------------------
RRuntimeError                             Traceback (most recent call last)
Cell In[2], line 21
      4 data = simulate_data(
      5     n=SAMPLE_SIZE,
      6     rg=default_rng(45),
   (...)
     16     beta_m_factor=0.5,
     17 )
     19 x, t, m, y = data[0], data[1], data[2], data[3]
---> 21 r_mediation_DML(x, t, m, y, trim=0.0, order=1)

File /data/parietal/store/work/jabecass/med_bench/src/med_bench/mediation.py:962, in r_mediation_DML(y, t, m, x, trim, order)
    930 """
    931 y       array-like, shape (n_samples)
    932         outcome value for each unit, continuous
   (...)
    958         Powers command of the LARF package.
    959 """
    960 x_r, t_r, m_r, y_r = [_convert_array_to_R(uu) for uu in
    961                       (x, t, m, y)]
--> 962 res = causalweight.medDML(y_r, t_r, m_r, x_r, trim=trim, order=order)
    963 raw_res_R = np.array(res.rx2('results'))
    964 ntrimmed = res.rx2('ntrimmed')[0]

File /data/parietal/store/work/jabecass/miniconda3/envs/causal2024/lib/python3.10/site-packages/rpy2/robjects/functions.py:208, in SignatureTranslatedFunction.__call__(self, *args, **kwargs)
    206         v = kwargs.pop(k)
    207         kwargs[r_k] = v
--> 208 return (super(SignatureTranslatedFunction, self)
    209         .__call__(*args, **kwargs))

File /data/parietal/store/work/jabecass/miniconda3/envs/causal2024/lib/python3.10/site-packages/rpy2/robjects/functions.py:131, in Function.__call__(self, *args, **kwargs)
    129     else:
    130         new_kwargs[k] = cv.py2rpy(v)
--> 131 res = super(Function, self).__call__(*new_args, **new_kwargs)
    132 res = cv.rpy2py(res)
    133 return res

File /data/parietal/store/work/jabecass/miniconda3/envs/causal2024/lib/python3.10/site-packages/rpy2/rinterface_lib/conversion.py:45, in _cdata_res_to_rinterface.<locals>._(*args, **kwargs)
     44 def _(*args, **kwargs):
---> 45     cdata = function(*args, **kwargs)
     46     # TODO: test cdata is of the expected CType
     47     return _cdata_to_rinterface(cdata)

File /data/parietal/store/work/jabecass/miniconda3/envs/causal2024/lib/python3.10/site-packages/rpy2/rinterface.py:817, in SexpClosure.__call__(self, *args, **kwargs)
    810     res = rmemory.protect(
    811         openrlib.rlib.R_tryEval(
    812             call_r,
    813             call_context.__sexp__._cdata,
    814             error_occured)
    815     )
    816     if error_occured[0]:
--> 817         raise embedded.RRuntimeError(_rinterface._geterrmessage())
    818 return res

RRuntimeError: Error in glmnet::glmnet(x, y, family = c("binomial"), alpha = 1, lambda = lambda[1],  : 
  x should be a matrix with 2 or more columns

but then if you add

In [3]: from med_bench.utils.utils import get_interactions, _convert_array_to_R

In [4]: _convert_array_to_R(x)
Out[4]: <rpy2.rinterface.FloatSexpVector object at 0x7f920ce52840> [RTYPES.REALSXP]

In [5]: import rpy2.robjects.packages as rpackages

In [6]: causalweight = rpackages.importr('causalweight')

In [7]: x_r, t_r, m_r, y_r = [_convert_array_to_R(uu) for uu in (x, t, m, y)]

In [8]: res = causalweight.medDML(y_r, t_r, m_r, x_r, trim=0.0, order=1)

In [9]: 

it works

bthirion commented 9 months ago

Yes, the best thing is to make everything work with the latest rpy2. (I won't be able to help on this ; just notice that I also got the glmnet error. It is unclear to me why it should eb related to #58)

judithabk6 commented 9 months ago

I have the feeling it's related to the fact that the function is not in the same module, there are those particular rpy2 things, like ri.initr().... but maybe not.

I will go back to it shortly

Le jeu. 21 déc. 2023 à 22:25, bthirion @.***> a écrit :

Yes, the best thing is to make everything work with the latest rpy2. (I won't be able to help on this ; just notice that I also got the glmnet error. It is unclear to me why it should eb related to #58 https://github.com/judithabk6/med_bench/pull/58)

— Reply to this email directly, view it on GitHub https://github.com/judithabk6/med_bench/issues/62#issuecomment-1866947153, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACKUVEJ5WQ7FPJ3BFWVNRVLYKSSMRAVCNFSM6AAAAABA6F2K52VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNRWHE2DOMJVGM . You are receiving this because you authored the thread.Message ID: @.***>

judithabk6 commented 9 months ago

ok. So my fix to _convert_array_to_R works, and my issue was related to #63.... so everything is ok.

judithabk6 commented 9 months ago

actually the fix was not needed... it was working before, I was just calling r_mediation_DML with a bad order of arguments