rstudio / reticulate

R Interface to Python
https://rstudio.github.io/reticulate
Apache License 2.0
1.67k stars 327 forks source link

py_to_r() fails when when importing a python module with an alias #319

Closed hitfuture closed 6 years ago

hitfuture commented 6 years ago

I am using the reticulate package to integrate Python into an R package I'm building (hitfuture/Neo4jDriveR). One of the capabilities I need is to return R data.frames from a method in the R6 based object model. I utilize Python Pandas package to create a DataFrame in the reticulate Python environment. My objective is to return an R data.frame. The issue I'm seeing is that when I used reticulate::py_to_r(df) it does not convert to R and instead it returns a python DataFrame object. The problem is based on the way reticulate::import() is utilized.

I had forked reticulate into my github repository so I am using the latest version (1.9.0.9001). I see that there are well defined S3 methods to handle pandas DataFrame conversion in the reticulate py_to_r() S3 class (e.g. py_to_r.pandas.core.frame.DataFrame). The following test executes correctly in a new R session.

library(reticulate)
library(testthat)

pd <- import("pandas")
py_config()
## python:         /usr/local/bin/python3
## libpython:      /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/config-3.6m-darwin/libpython3.6.dylib
## pythonhome:     /Library/Frameworks/Python.framework/Versions/3.6:/Library/Frameworks/Python.framework/Versions/3.6
## version:        3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 05:52:31)  [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
## numpy:          /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/numpy
## numpy_version:  1.14.5
## pandas:         /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas
## 
## NOTE: Python version was forced by RETICULATE_PYTHON
before <- iris
expect_is(before,class = "data.frame")
convert <- r_to_py(before)
expect_is(convert,class = "pandas.core.frame.DataFrame")
print(class(convert))
## [1] "pandas.core.frame.DataFrame"       
## [2] "pandas.core.generic.NDFrame"       
## [3] "pandas.core.base.PandasObject"     
## [4] "pandas.core.base.StringMixin"      
## [5] "pandas.core.accessor.DirNamesMixin"
## [6] "pandas.core.base.SelectionMixin"   
## [7] "python.builtin.object"
after  <- py_to_r(convert)

expect_is(after,class = "data.frame")

The failure occurs when I utilize the function 'reticulate::import("pandas", as="pd")' with the as parameter. You can see below that the pandas.DataFrame is not converted into an R data.frame. The problem is due to the S3 method for the pandas DataFrame not matching based on the name of the python module.

pd <- import("pandas",as = "pd")
py_config()
## python:         /usr/local/bin/python3
## libpython:      /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/config-3.6m-darwin/libpython3.6.dylib
## pythonhome:     /Library/Frameworks/Python.framework/Versions/3.6:/Library/Frameworks/Python.framework/Versions/3.6
## version:        3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 05:52:31)  [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
## numpy:          /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/numpy
## numpy_version:  1.14.5
## pandas:         /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas
## 
## NOTE: Python version was forced by RETICULATE_PYTHON
before <- iris 
expect_is(before,class = "data.frame")

convert <- r_to_py(before)
expect_is(convert,class = "pd.core.frame.DataFrame")

after  <- py_to_r(convert)
print(class(after))
## [1] "pd.core.frame.DataFrame"        "pd.core.generic.NDFrame"       
## [3] "pd.core.base.PandasObject"      "pd.core.base.StringMixin"      
## [5] "pd.core.accessor.DirNamesMixin" "pd.core.base.SelectionMixin"   
## [7] "python.builtin.object"
#expect_is(after,class = "data.frame")

I have tested this on two different Docker containers, and on my MacBook Pro where the same error occurs. The problem is based on the fact that if you assign an Alias in the import() function, the S3 methods will not be correctly selected. This will apply to all py_to_r conversions if the alias is not equivalent to the default name. There is not an easy solution to solve this with the S3 methodology. It would be good if the S3 methods are exported in the reticulate package so this can be handled manually. This should also be documented.

Thanks, Brett

jjallaire commented 6 years ago

The as parameter is there for normalizing Python module names if they could be implemented by multiple sources (specifically, we use this to have a shared set of R S3 methods for the base keras implementation and the implementation built into tensorflow). I would say don't use as at all if it's causing you problems, and if you do need to use it and there are further issues, just define multiple S3 methods that handle all the cases you need.

If I'm misunderstanding the issue and you can see your way to another resolution do let me know as I'm not sure I follow the issue you are seeing fully!

hitfuture commented 6 years ago

I fixed the problem by removing the as parameter within the package I'm building once I discovered this issue. The main problem I see is that most Python users import these libraries with an alias and they need to understand what you just stated. The py_to_r function will not apply if you use an alias. This applies to any S3 based methods in your reticulate. I think adding this information into your vignettes or other documentation would be useful. This package is incredibly useful and I am very happy with it overall. You can see my opinion here Python in R Thanks for your quick response.

jjallaire commented 6 years ago

The docs on the as parameter is as follows: "Alias for module name (affects names of R classes)"

This certainly does allude to the fact (albeit somewhat obliquely) that S3 method dispatch will be affected.

Nevertheless, I added some additional docs cautioning users against using this parameter outside of package development.

kevinushey commented 6 years ago

I suspect naming the parameter as might give Python users the impression that it operates the same as one might write in Python with e.g.

import pandas as pd

whereas that construct is more accurately reflected on the R side with a plain old

pd <- import("panadas")

It might be more appropriate to call that parameter something like alias just to make it clear it's not intended to be some R analogue to existing Python behavior (of course it might be too late for that now)

jjallaire commented 6 years ago

Yeah, if I rename it alias then it will break existing clients :-(

hitfuture commented 6 years ago

FYI - made a change to reticulate::import() function which renames the S3 methods associated with the module that is being renamed to the alias. This allowed all related methods to function. There is much more testing and I'm sure a better way of doing this that would have to occur yet I think the idea is correct. Take a look if you want at the reticulate::import method and the additional function add_functions_by_alias() I added in.

https://raw.githubusercontent.com/hitfuture/reticulate/fix/alias_functions/R/python.R

jjallaire commented 6 years ago

I don't think I want to be dynamically renaming methods though.

The real question is what does the R user that specifies as hope to accomplish by doing so? It doesn't have the equivalent semantics as Python does (as that is handled by the assignment as @kevinushey points out). We explicitly say in the docs that you are changing S3 dispatch when you use the parameter and that it's an advanced parameter that should only be used in packages. I think this should suffice!

hitfuture commented 6 years ago

Your answer makes sense. I'm not having any problems with reticulate in the package I'm building since I realized what was happening. It is interesting having been trained in Python over the last 3 months where I was required to use pd for pandas and np for numpy which I also used in reticulate because of all of the online education from DataCamp. Thanks again for the feedback and documentation is all that matters here. Reticulate is a huge win for the language R!

jjallaire commented 6 years ago

Okay, great. Glad it's working so well for you!

alyamahmoud commented 4 years ago

I run into the same error even though I import the module without the use of 'as' or 'alias'

reticulate::import("pandas") dat <- py_load_object('dat.p') # pickle object dat2List <- reshape2::melt(dat)

Error in as.data.frame.default(x[[i]], optional = TRUE) : cannot coerce class ‘c("python.builtin.set", "python.builtin.object")’ to a data.frame

any suggestions ? reticulate is absolutely awesome !

kevinushey commented 4 years ago

What kind of Python object is returned by:

dat <- py_load_object('dat.p') # pickle object

? I suspect you'll need to use py_to_r() and then possibly transform it to ensure it is indeed an R data.frame.

skeydan commented 4 years ago

(((Was just about to answer as well :-)))

I recently used py_load_object "in the wild" and the returned object was a list (it couldn't even have been a data.frame because the object contained columns of different length...