Open Garbaz opened 1 month ago
To be clear, sudo apt remove libnvjitlink12:amd64 is not really a solution to this problem.
Thanks for reporting!
Are you using the RStudio IDE? Does this happen only in the RStudio IDE, or outside the IDE too?
Ah, I should have added that I'm using RStudio Server. And I should have tested running the repro code directly in R.
I don't have access to a machine at the moment where I can test running the code in normal RStudio Desktop, so I can't check whether it's an RStudio Server-specific issue. But running source("repro.R"), where repro.R contains the repro code:
library(reticulate)
venv_name <- "deleteme_5267"
virtualenv_create(venv_name)
use_virtualenv(venv_name)
py_install("torch", pip = TRUE)
pytorch <- import("torch")
I do not get the error. And running e.g. pytorch$cuda$is_available() works as expected.
So it appears to be an interaction between RStudio (Server) and reticulate that is the issue.
Wait, scratch that, I forgot I uninstalled libnvjitlink12 to temporarily fix the issue. Reinstalling it, I get the same error in plain R!
So it has nothing to do with RStudio (Server) in particular.
I don't think reticulate is modifying the order of loaded libs.
If this occurs with reticulate::import("torch") in R, but not in a terminal with ~/.virtualenvs/r-torch/bin/python -c 'import torch', then it's likely that something in the R session is loading the system-wide libnvjitlink12 for some reason. Can you please double-check the value of Sys.getenv("LD_LIBRARY_PATH") in R, and also inspect other R startup files for code that might be causing this (.Rprofile, .Renviron, etc.)?
Both Sys.getenv("LD_LIBRARY_PATH") and os <- import("os"); os$environ["LD_LIBRARY_PATH"] give:
"/usr/lib/R/lib:/usr/lib/x86_64-linux-gnu:/usr/lib/jvm/default-java/lib/server"
What I do find weird is that there is no mention of the virtualenv, even in os$environ["LD_LIBRARY_PATH"], even though, evidently, the libraries from the virtualenv are found.
Okay, it appears Python does not simply use the LD_LIBRARY_PATH environment variable. At least when I run os.environ["LD_LIBRARY_PATH"] in the normal Python REPL (from the virtualenv), I get a KeyError.
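A side note on checking this directly: LD_LIBRARY_PATH only tells the dynamic linker where to look, not what it ended up loading. On Linux, the files a process has actually mapped are listed in /proc/<pid>/maps. Below is a minimal Python sketch that lists the mapped files matching a name. The function name mapped_libraries is made up for illustration and is not part of reticulate or torch.

```python
from pathlib import Path


def mapped_libraries(maps_text, needle):
    """Return the sorted, unique file paths in /proc/<pid>/maps text
    whose file name contains `needle`."""
    paths = set()
    for line in maps_text.splitlines():
        # Each line has five fixed fields (address range, permissions,
        # offset, device, inode) and an optional sixth: the pathname.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        if needle in path.rsplit("/", 1)[-1]:
            paths.add(path)
    return sorted(paths)


if __name__ == "__main__":
    maps = Path("/proc/self/maps")
    if maps.exists():  # /proc is Linux-specific
        for path in mapped_libraries(maps.read_text(), "libnvJitLink"):
            print(path)
```

Run inside the Python session of interest (for example after a successful import torch), this prints the full path of each mapped libnvJitLink copy, which shows whether the virtualenv copy or the system-wide one was picked.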
However, Python does use an environment variable PYTHONPATH. Running os <- import("os"); os$environ["PYTHONPATH"] in R I get:
"/usr/local/lib/R/site-library/reticulate/config:/usr/lib/python312.zip:/usr/lib/python3.12:/usr/lib/python3.12/lib-dynload:/home/tobi/.virtualenvs/deleteme_5267/lib/python3.12/site-packages:/usr/local/lib/R/site-library/reticulate/python"
I will investigate whether I can fix the issue by messing with PYTHONPATH.
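For context, PYTHONPATH only feeds Python's module search path (sys.path). The dynamic linker does not consult it when a compiled extension pulls in shared libraries such as libnvJitLink.so.12. Here is a small sketch that starts a child interpreter with a given PYTHONPATH and returns its sys.path. The helper name child_sys_path is hypothetical.

```python
import os
import subprocess
import sys


def child_sys_path(pythonpath):
    """Start a fresh interpreter with PYTHONPATH set to `pythonpath`
    and return the entries of its sys.path."""
    env = dict(os.environ, PYTHONPATH=pythonpath)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

The directories passed in show up in the child's sys.path, so changing PYTHONPATH changes which torch package is imported, but not which libnvJitLink.so.12 the linker resolves for it.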
Update: I have experimented with both LD_LIBRARY_PATH and PYTHONPATH and could not get the issue to go away. I will continue trying to figure this out later this week.
By the way, py_last_error() gives:
--- Python Exception Message
Traceback (most recent call last):
File "/usr/local/lib/R/site-library/reticulate/python/rpytools/loader.py", line 122, in _find_and_load_hook
return _run_hook(name, _hook)
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/R/site-library/reticulate/python/rpytools/loader.py", line 96, in _run_hook
module = hook()
^^^^^^
File "/usr/local/lib/R/site-library/reticulate/python/rpytools/loader.py", line 120, in _hook
return _find_and_load(name, import_)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tobi/.virtualenvs/deleteme_5267/lib/python3.12/site-packages/torch/__init__.py", line 290, in <module>
from torch._C import * # noqa: F403
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/R/site-library/reticulate/python/rpytools/loader.py", line 122, in _find_and_load_hook
return _run_hook(name, _hook)
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/R/site-library/reticulate/python/rpytools/loader.py", line 96, in _run_hook
module = hook()
^^^^^^
File "/usr/local/lib/R/site-library/reticulate/python/rpytools/loader.py", line 120, in _hook
return _find_and_load(name, import_)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ImportError: /home/tobi/.virtualenvs/deleteme_5267/lib/python3.12/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12: undefined symbol: __nvJitLinkAddData_12_1, version libnvJitLink.so.12
--- R Traceback
▆
1. └─reticulate::import("torch")
2. └─reticulate:::py_module_import(module, convert = convert)
See `reticulate::py_last_error()$r_trace$full_call` for more details.
In case that's of any help.
I am unable to reproduce locally.
Note that PyTorch can be installed a few different ways, depending on your environment. You may want to consult https://pytorch.org/get-started/locally/ and see if there is something that will work better for you than a bare pip install torch (e.g., pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124).
I do not think this has anything to do with torch in particular. The reason torch recommends using their bespoke pypi repo has to do with driver/CUDA version incompatibilities and shouldn't change anything about the issue here. I will however try this for completeness.
To reproduce (assuming you have libnvjitlink12 installed system-wide, and in a different version):
The final line gives me this error:
Checking nm -gDC ~/.virtualenvs/deleteme_5267/lib/python3.12/site-packages/nvidia/nvjitlink/lib/libnvJitLink.so.12 | grep nvJitLinkAddData, I get:
So the version of libnvJitLink.so.12 in the virtualenv has the symbol. And if I activate the virtualenv normally in a shell and import torch from inside a normal Python REPL, I don't get any errors. So it's not the fault of libcusparse.so.12.
The thing is though, the library libnvJitLink.so.12 is also installed system-wide, but in a different version. Checking there with nm -gDC /usr/lib/x86_64-linux-gnu/libnvJitLink.so.12 | grep nvJitLinkAddData, I get only:
And when I remove the system-wide version of the library with sudo apt remove libnvjitlink12:amd64, the error no longer occurs.
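The nm checks above can also be approximated from Python with ctypes, which may be handy when nm is not installed. This is only a sketch: the helper name resolves_symbol is made up, and unlike nm it actually loads the library into the current process, so the library's own dependencies must be loadable too.

```python
import ctypes


def resolves_symbol(library_path, symbol):
    """Return True if `symbol` can be looked up through the library at
    `library_path`. Passing None uses the current process instead.

    Note: the lookup also searches the library's dependencies, so this
    is slightly broader than grepping the output of `nm -gD`.
    """
    lib = ctypes.CDLL(library_path)  # raises OSError if loading fails
    try:
        getattr(lib, symbol)  # ctypes looks the name up on attribute access
    except AttributeError:
        return False
    return True
```

For example, resolves_symbol("/usr/lib/x86_64-linux-gnu/libnvJitLink.so.12", "__nvJitLinkAddData_12_1") would be the counterpart of the second nm check, using the symbol name from the ImportError.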
It appears that if there is a system-wide version of a shared library, it is preferred over the local version in the virtualenv. This is not how things should be!
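One possible explanation, offered as a hypothesis rather than a diagnosis: according to the ld.so(8) manual page, the glibc dynamic linker searches DT_RPATH (only if the object has no DT_RUNPATH), then LD_LIBRARY_PATH, then DT_RUNPATH, then the ld.so cache and the default directories. The R session reports an LD_LIBRARY_PATH containing /usr/lib/x86_64-linux-gnu, while the plain Python REPL has none. The sketch below simulates that order. It assumes that libcusparse.so.12 locates libnvJitLink.so.12 through a DT_RUNPATH pointing into the virtualenv, which has not been verified here. The function resolve is illustrative, and the ld.so cache step is folded into the default directories for brevity.

```python
def resolve(soname, existing_files, rpath=(), runpath=(),
            ld_library_path=(), defaults=("/lib", "/usr/lib")):
    """Return the first path where `soname` would be found, or None.

    Simplified model of the ld.so search order:
    DT_RPATH (ignored when DT_RUNPATH is present), LD_LIBRARY_PATH,
    DT_RUNPATH, then the default directories.
    """
    search = []
    if not runpath:
        search.extend(rpath)
    search.extend(ld_library_path)
    search.extend(runpath)
    search.extend(defaults)
    for directory in search:
        candidate = directory.rstrip("/") + "/" + soname
        if candidate in existing_files:
            return candidate
    return None


SONAME = "libnvJitLink.so.12"
# Directories taken from the paths reported in this thread.
VENV_DIR = ("/home/tobi/.virtualenvs/deleteme_5267/lib/python3.12/"
            "site-packages/nvidia/nvjitlink/lib")
SYSTEM_DIR = "/usr/lib/x86_64-linux-gnu"
FILES = {VENV_DIR + "/" + SONAME, SYSTEM_DIR + "/" + SONAME}
R_LD_LIBRARY_PATH = ("/usr/lib/R/lib:/usr/lib/x86_64-linux-gnu:"
                     "/usr/lib/jvm/default-java/lib/server").split(":")

# Assumption: the dependency is located via DT_RUNPATH (not verified).
in_r = resolve(SONAME, FILES, runpath=[VENV_DIR],
               ld_library_path=R_LD_LIBRARY_PATH)
in_plain_python = resolve(SONAME, FILES, runpath=[VENV_DIR])
print("R session:   ", in_r)
print("plain Python:", in_plain_python)
```

Under that assumption the model picks the system-wide copy inside R and the virtualenv copy in a plain Python REPL, which matches what is observed. If the library used DT_RPATH instead, the virtualenv copy would win in both cases, so checking the library's dynamic section (for example with readelf -d) would confirm or rule this out.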
R version is 4.4.1 (2024-06-14) and reticulate version is reticulate_1.38.0.