rstudio / reticulate

R Interface to Python
https://rstudio.github.io/reticulate
Apache License 2.0

ImportError when loading Python package after R package on Windows #1541

Open: amoeba opened this issue 4 months ago

amoeba commented 4 months ago

This isn't necessarily a bug report for reticulate, and the fix may end up being a change to arrow's build system, but I thought I'd write this up in case you have any pointers and to serve as documentation in case anyone else runs into it.

This was initially filed as https://github.com/apache/arrow/issues/40073: a user running reticulate on Windows ran into an import error when they loaded the arrow R package before importing the arrow Python package (PyArrow), but not the other way around. I think it's relevant that the R and Python packages both link to a DLL named "arrow.dll".
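To make the suspected name clash concrete, here is a minimal Python sketch that compares the DLLs already loaded in a process against the DLLs a package is about to load, and flags any that share a base name but live at a different path. The function name `find_dll_name_clashes` and both example paths are hypothetical and only for illustration; the sketch also assumes the list of loaded module paths has been collected some other way (for example with a process inspection tool). It does not reproduce the Windows loader, it only spots the condition described above.

```python
import ntpath  # parses Windows-style paths on any platform


def find_dll_name_clashes(loaded_paths, candidate_paths):
    """Return (loaded, candidate) pairs that share a base name but differ in path."""
    # Index the already-loaded modules by case-insensitive base name,
    # since Windows module names are not case-sensitive.
    loaded_by_name = {}
    for path in loaded_paths:
        name = ntpath.basename(path).lower()
        loaded_by_name.setdefault(name, []).append(path)

    clashes = []
    for candidate in candidate_paths:
        name = ntpath.basename(candidate).lower()
        for loaded in loaded_by_name.get(name, []):
            # Same file loaded twice is not a clash; a different file is.
            if ntpath.normcase(loaded) != ntpath.normcase(candidate):
                clashes.append((loaded, candidate))
    return clashes


# Hypothetical paths, for illustration only.
loaded = [r"C:\R\library\arrow\libs\x64\arrow.dll"]
shipped = [
    r"C:\venv\Lib\site-packages\pyarrow\arrow.dll",
    r"C:\venv\Lib\site-packages\pyarrow\arrow_python.dll",
]
clashes = find_dll_name_clashes(loaded, shipped)
for loaded_path, candidate_path in clashes:
    print(f"name clash: {candidate_path} vs already loaded {loaded_path}")
```

With the hypothetical inputs above, only the two `arrow.dll` entries are reported, which matches the failing order: R's `arrow.dll` is already in the process when PyArrow asks for its own.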

ImportError when loading R package first:

> library(arrow)
> library(reticulate)
> pa <- import("pyarrow")
Error in py_module_import(module, convert = convert) : 
  ImportError: DLL load failed while importing lib: The specified procedure could not be found.
Run `reticulate::py_last_error()` for details.
> reticulate::py_last_error()

── Python Exception Message ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Traceback (most recent call last):
  File "C:\Users\Bryce\AppData\Local\R\win-library\4.3\reticulate\python\rpytools\loader.py", line 119, in _find_and_load_hook
    return _run_hook(name, _hook)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Bryce\AppData\Local\R\win-library\4.3\reticulate\python\rpytools\loader.py", line 93, in _run_hook
    module = hook()
             ^^^^^^
  File "C:\Users\Bryce\AppData\Local\R\win-library\4.3\reticulate\python\rpytools\loader.py", line 117, in _hook
    return _find_and_load(name, import_)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Bryce\DOCUME~1\VIRTUA~1\R-RETI~1\Lib\site-packages\pyarrow\__init__.py", line 65, in <module>
    import pyarrow.lib as _lib
  File "C:\Users\Bryce\AppData\Local\R\win-library\4.3\reticulate\python\rpytools\loader.py", line 119, in _find_and_load_hook
    return _run_hook(name, _hook)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Bryce\AppData\Local\R\win-library\4.3\reticulate\python\rpytools\loader.py", line 93, in _run_hook
    module = hook()
             ^^^^^^
  File "C:\Users\Bryce\AppData\Local\R\win-library\4.3\reticulate\python\rpytools\loader.py", line 117, in _hook
    return _find_and_load(name, import_)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ImportError: DLL load failed while importing lib: The specified procedure could not be found.

── R Traceback ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
▆
1. └─reticulate::import("pyarrow")
2.   └─reticulate:::py_module_import(module, convert = convert)

No error when they load the R package second:

> library(reticulate)
> pa <- import("pyarrow")
> library(arrow)

The user who reported the issue found they could fix it by patching PyArrow's DLLs with mt.exe, and I think I've more or less confirmed this works. From my initial research, it looks like Arrow may be able to incorporate this into its build system, but it's not clear whether this is the best approach.

So I have a few questions I'd appreciate any help answering:

PS: The nearest reticulate issue I can find is https://github.com/rstudio/reticulate/issues/1357 though I'm not sure how related it is.

amoeba commented 4 months ago

Small update here: I found out we dealt with this in another context (conda) a while back and handled it by renaming R's arrow.dll at build time. If a rename is all it takes to fix this issue, would there be a good place in the reticulate docs for a note about it?

t-kalinowski commented 4 months ago

Thanks for reporting. Fortunately, this is the first time I'm seeing an issue like this. I say fortunately because I don't think there is much we can do in reticulate to guard users against DLL name clashes like this. If a particular build of arrow.dll is indeed R- or Python-specific, then probably the cleanest fix would be to include that in the name, like r-arrow.dll and py-arrow.dll.

amoeba commented 4 months ago

I think a simple rename on the R side may be the solution we end up with. I'll share what we end up doing here in this issue so folks can find it.