rstudio / reticulate

R Interface to Python
https://rstudio.github.io/reticulate
Apache License 2.0

`caught segfault, cause 'memory not mapped'` after calling reticulate functions on NixOS #487

Open aswan89 opened 5 years ago

aswan89 commented 5 years ago

I'm attempting to use reticulate (via keras/tensorflow) on NixOS and running into a segfault.

Nix has its own way of doing things. Binaries live in a centralized store and are referenced from user environments via symlinks, so many tools used to construct environments, like virtualenv, fail because the binaries are not where those tools expect them. In this case I am able to drop into a nix-shell, as outlined in heading 9.14.1.1.3.1, that has python, keras, and tensorflow available. I can successfully launch a python REPL from this shell and import/assign keras objects.

I can also launch an R REPL and load reticulate, but attempting to use any reticulate functions causes the following segfault and traceback:

*** caught segfault ***
address 0x8, cause 'memory not mapped'

Traceback:
 1: py_run_string_impl(paste0("import sys; sys.path.append('", system.file("python",     package = "reticulate"), "')"))
 2: initialize_python(required_module, use_environment)
 3: ensure_python_initialized()
 4: import_builtins(convert = FALSE)
 5: repl_python()

R is aware of python's and reticulate's location in the nix-store:

> Sys.which('python')
                                                                    python 
"/nix/store/m2br92794yw09hjwh3vzadbphckagmzv-python3-3.7.3-env/bin/python" 

> paste0("import sys; sys.path.append('", system.file("python",     package = "reticulate"), "')")
[1] "import sys; sys.path.append('/nix/store/v64aywkbbr4qj92xv19fxmhdv66icl3q-r-reticulate-1.10/library/reticulate/python')"

This appears to be some kind of error with how reticulate is calling py_run_string_impl. It may be that I have more work to do with pulling in the correct dependencies in a Nix-friendly way but I don't think a segfault is the appropriate way for this to fail.

deliciouslytyped commented 5 years ago

I was trying to help when this got posted. I got as far as getting gdb to tell me the crash is at 0x00007fffdae5b46b in PyModule_GetDict () from /nix/store/0n8slcq8p5x31kc9hncabsqq9y3fpkzp-python3-3.7.3/lib/libpython3.7m.so, but that's as far as I got. I think that corresponds to the following line: https://github.com/rstudio/reticulate/blob/7be2d8462801c271f12577dd65334ed029cf0053/src/python.cpp#L2055

Side note: a lot of crashes seem to come from initialization code; it would be nice if it could be made more robust somehow.
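One way to picture the robustness idea above: probe a candidate interpreter in a child process before embedding it, so a bad Python home shows up as an error message instead of a segfault in the host. This is only a sketch; `probe_interpreter` is a hypothetical helper, not a reticulate function, and it is not how reticulate actually initializes Python.

```python
import os
import subprocess
import sys


# Hypothetical helper (not part of reticulate): start the interpreter in a
# separate process, optionally overriding PYTHONHOME, and check that it can
# import the given modules. A broken configuration then yields a non-zero
# exit status and a stderr message rather than crashing the calling process.
def probe_interpreter(executable, pythonhome=None, modules=("encodings",)):
    env = dict(os.environ)
    if pythonhome is not None:
        env["PYTHONHOME"] = pythonhome
    code = "import " + ", ".join(modules) + "; import sys; print(sys.prefix)"
    result = subprocess.run(
        [executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
    )
    ok = result.returncode == 0
    # On success report the interpreter's sys.prefix, otherwise its error output.
    return ok, (result.stdout.strip() if ok else result.stderr.strip())


if __name__ == "__main__":
    ok, detail = probe_interpreter(sys.executable)
    print("ok" if ok else "failed", detail)
```

Pointing `pythonhome` at a directory without the standard library makes the child exit with a fatal error, which the caller can report cleanly.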

ecoughlan commented 5 years ago

The above PR fixes that crash (for me). As a workaround, you can also set PYTHONPATH in the environment, which prevents the numpy import error, like so: export PYTHONPATH=${pyEnv}/lib/python3.7:${pyEnv}/lib/python3.7/site-packages
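As a sketch of what that export expands to: `pythonpath_for_env` below is a hypothetical helper, and `env_prefix` stands in for `${pyEnv}`, the store path of the python-with-packages environment. It assumes the `lib/pythonX.Y` layout used in the command above.

```python
import posixpath
import sys


# Hypothetical helper: build the PYTHONPATH value from the workaround above.
# Nix store paths are POSIX paths, so posixpath and ":" are used explicitly.
def pythonpath_for_env(env_prefix, version=None):
    if version is None:
        # Default to the running interpreter's major.minor version, e.g. "3.7".
        version = "{}.{}".format(*sys.version_info[:2])
    lib_dir = posixpath.join(env_prefix, "lib", "python" + version)
    site_packages = posixpath.join(lib_dir, "site-packages")
    return ":".join([lib_dir, site_packages])


if __name__ == "__main__":
    # Placeholder store path, not a real derivation.
    print("export PYTHONPATH=" + pythonpath_for_env("/nix/store/example-python3-3.7.3-env", "3.7"))
```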

The defaults don't work on NixOS because reticulate's config.py reads the Python sysconfig prefix, which is then passed to libpython as the Python home. So when you build a python-with-packages environment, the prefix/exec-prefix point to the actual python derivation instead of the environment containing all the packages.
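To check for the mismatch described above, one can print the prefix recorded in sysconfig next to the runtime `sys` prefixes, using the python from the withPackages environment. This is a diagnostic sketch only: it does not reproduce reticulate's config.py, and it assumes, per the comment above, that the sysconfig prefix is the value that ends up as the Python home.

```python
import sys
import sysconfig


# Collect the prefixes that could plausibly be used as the Python home.
def prefix_report():
    return {
        "sysconfig prefix": sysconfig.get_config_var("prefix"),
        "sys.prefix": sys.prefix,
        "sys.exec_prefix": sys.exec_prefix,
        "sys.base_prefix": sys.base_prefix,
    }


if __name__ == "__main__":
    for name, value in prefix_report().items():
        print(f"{name:18} {value}")
```

If the sysconfig prefix names the bare python3 store path while the packages live under the `-env` path, that matches the failure mode described.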

deliciouslytyped commented 5 years ago

Hmm, I don't have the mental space to look into it more right now, but that would explain why I got a numpy import error when I compiled python in debug mode. I'm just not sure why the debug build printed a short import error instead of the same segfault.

deliciouslytyped commented 5 years ago

@ecoughlan I'd like to be able to make sense of stuff like this later; how did you find which part of the code was the culprit? I didn't see anything suggesting to look anywhere other than https://github.com/rstudio/reticulate/blob/7be2d8462801c271f12577dd65334ed029cf0053/src/python.cpp#L2055

ecoughlan commented 5 years ago

Unfortunately I don't have any good techniques. I just happened to have a working reticulate with python 3.6 in place, plus this in my .zshrc: export PYTHONPATH=${$(readlink $(which python))%bin/python}lib/python3.3/site-packages. That broke on upgrading (Python 3.6 doesn't find the lib either, but it doesn't segfault, so the problem is more obvious). My remaining debugging was a lot less sophisticated than your nix-shell snippet above, sorry :/ (basically I patched out swathes of code to get a minimal crashing test case).
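For readers less familiar with zsh, the `${$(readlink $(which python))%bin/python}` part can be sketched in Python as follows. The helper names are hypothetical. Like plain `readlink`, `os.readlink` resolves only one level and may return a relative target; unlike the shell one-liner, this version falls back to the unresolved path when `python` is not a symlink.

```python
import os
import shutil


# Mirrors zsh ${var%suffix} for a literal suffix: drop it if present.
def strip_suffix(path, suffix="bin/python"):
    return path[: -len(suffix)] if path.endswith(suffix) else path


# Hypothetical helper: locate the interpreter on PATH, follow one symlink
# level (as `readlink` does), and strip the trailing "bin/python".
def interpreter_prefix(name="python"):
    exe = shutil.which(name)
    if exe is None:
        return None
    target = os.readlink(exe) if os.path.islink(exe) else exe
    return strip_suffix(target)


if __name__ == "__main__":
    prefix = interpreter_prefix()
    if prefix is not None:
        # The .zshrc line then appends the library directory to this prefix.
        print(prefix + "lib/python3.3/site-packages")
```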

aswan89 commented 5 years ago

@ecoughlan Thanks for the pointers on setting PYTHONPATH; that seems to have helped a great deal. The next question is how to set it via nix-shell, since copying and pasting the nix-store location of the python environment into a bash export command seems like bad practice.

Right now my shell.nix looks like this:

with import <nixpkgs> {};

(python3.withPackages (ps:
  [
    ps.numpy
    ps.Keras
    ps.tensorflow
  ]
 )
).env

What is the best practice for exporting the nix-store location where the result of this derivation will be stored? I assume it involves shellHook or an override of buildEnv, but I can't find anywhere to set an arbitrary environment variable that contains $out.

deliciouslytyped commented 5 years ago

Just as a note, you usually can't self-reference like that (unless you have access to "$out", which of course works because it's at "run time" and not build time). However, in this case you don't need to; you just need to refer to the numpy derivation, I think? That said, I would have thought PYTHONPATH would be covered by .withPackages, so I suspect something is off here.

In any case, for the sake of example, I'd use stdenv.mkDerivation or mkShell combined with a shellHook containing something like the following snippet, and put your python stuff in propagatedBuildInputs. I didn't test this, but it should be roughly correct based on the previous information. Alternatively, you could use makeWrapper and put a launcher in your PATH.

{
# ...
  shellHook =  ''
    export PYTHONPATH="${stuffGoesHere}:$PYTHONPATH" # TODO: double-check the escaping
    '';
# ...
}

I also recommend getting your R environment from the nix shell too, as I did in https://github.com/NixOS/nixpkgs/issues/60941 (minus the debug mess), unless you have some reason not to.

aswan89 commented 5 years ago

I think I've got it. I'll paste my shell.nix for any other poor soul who runs into this issue. I'll leave this open until @ecoughlan's pull request is merged, but I think there might be an issue in Nix with how PYTHONPATH is or isn't set within python.withPackages.

let
  pkgs = import <nixpkgs> {};
  # Python environment with the packages keras/tensorflow need
  tarPyPacks = pkgs.python3.withPackages (p: [ p.Keras p.numpy p.tensorflow ]);
  # R with reticulate and the keras/tensorflow R packages
  tarRPacks = pkgs.rWrapper.override {
    packages = with pkgs.rPackages; [ reticulate keras tensorflow ];
  };
in pkgs.mkShell {
  name = "forwardProgress";
  propagatedBuildInputs = [
    tarPyPacks
    tarRPacks
  ];
  # Point Python at the environment's stdlib and site-packages explicitly
  shellHook = ''
    export PYTHONPATH="${tarPyPacks}/lib/python3.7:${tarPyPacks}/lib/python3.7/site-packages"
  '';
}

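A quick sanity check one might run with the python inside this nix-shell before starting R: confirm that every PYTHONPATH entry exists and that the modules keras needs are findable. Both helpers are hypothetical, and the default module list is an assumption based on the packages in the shell.nix above.

```python
import importlib.util
import os


# Hypothetical helper: PYTHONPATH entries that are not existing directories.
def missing_pythonpath_entries(pythonpath=None):
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    return [p for p in pythonpath.split(os.pathsep) if p and not os.path.isdir(p)]


# Hypothetical helper: top-level modules the import system cannot locate.
# find_spec only searches; it does not import (or initialize) the module.
def missing_modules(names=("numpy", "keras", "tensorflow")):
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    print("missing PYTHONPATH entries:", missing_pythonpath_entries())
    print("missing modules:", missing_modules())
```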
deliciouslytyped commented 5 years ago

@aswan89 you won't believe it, but I ended up using your workaround today (not necessarily needing it; I haven't tested whether it's broken without your snippet). :P So it was helpful.