ninia / jep

Embed Python in Java

Freeze when using SubInterpreter's shared modules and doing from scipy import ndimage with scipy 1.9+ #487

Open ndjensen opened 1 year ago

ndjensen commented 1 year ago

Describe the bug

We have encountered a freeze of a thread that is running with Jep's SubInterpreter and using shared modules for numpy and scipy. The Python code

    import scipy
    from scipy import ndimage

freezes on the import of ndimage. It works fine with scipy 1.8 but freezes with scipy 1.9 and 1.10. Importing directly with import scipy.ndimage works fine; only the from ... import form has the issue. Interestingly, scipy 1.9 reworked the import of submodules in https://github.com/scipy/scipy/pull/15230, so that is probably related somehow.

To Reproduce

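    import jep.Jep;
    import jep.JepConfig;
    import jep.JepException;
    import jep.SubInterpreter;

    // The class name is arbitrary; it only makes the snippet compilable.
    public class ScipyImportFreeze {
        public static void main(String[] args) throws JepException {
            JepConfig jepConfig = new JepConfig();
            // Share numpy and scipy so they are imported on the main interpreter.
            jepConfig.addSharedModules("numpy", "scipy");
            try (Jep jep = new SubInterpreter(jepConfig)) {
                jep.exec("import numpy");
                jep.exec("import scipy");
                System.out.println("Start ndimage import");
                // With scipy 1.9+ this call never returns.
                jep.exec("from scipy import ndimage");
                System.out.println("End ndimage import");
            }
        }
    }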

Expected behavior

The import does not freeze and scipy works ok.

bsteffensmeier commented 1 year ago

Here is what we want to happen when a submodule like scipy.ndimage is imported in a sub-interpreter:

  1. Python breaks apart scipy.ndimage and imports scipy first.
  2. The import hits the jep shared_modules_hook, which sees that scipy is shared.
  3. The shared_modules_hook sends the import to the main interpreter thread.
  4. The main interpreter imports scipy and returns.
  5. The shared_modules_hook returns the scipy module from the main interpreter.
  6. Python will then attempt to import scipy.ndimage.
  7. The import hits the jep shared_modules_hook, which sees that scipy is shared.
  8. The shared_modules_hook sends the import to the main interpreter thread.
  9. The main interpreter imports scipy.ndimage and returns.
  10. The shared_modules_hook returns the scipy.ndimage module from the main interpreter.
  11. Python returns scipy.ndimage from the shared_modules_hook.

This process actually works fine if you do import scipy.ndimage on a sub-interpreter. When you do from scipy import ndimage, things start to fall apart at step 6. In this case Python first checks the scipy module for an existing attribute named ndimage. That check ends up in the new scipy __getattr__ function, which uses importlib to import scipy.ndimage.
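The difference between the two import forms can be seen without Jep or scipy. The sketch below is only an illustration: it writes a throwaway package (the name lazypkg_demo, the calls list, and the simplified __getattr__ are all assumptions made for the demo, loosely modelled on the lazy-loading pattern described above rather than copied from scipy) and records which names reach the module-level __getattr__.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

PKG = "lazypkg_demo"  # hypothetical package name, used only for this demo

# Simplified stand-in for a lazy-loading package __init__.py (not scipy's source).
INIT_SOURCE = textwrap.dedent('''
    import importlib as _importlib   # bound once, when the package is created

    submodules = ["sub"]
    calls = []                       # every name passed to __getattr__

    def __getattr__(name):
        calls.append(name)
        if name in submodules:
            return _importlib.import_module(f"{__name__}.{name}")
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
''')


def _forget():
    """Drop the demo package from sys.modules so the next import starts fresh."""
    for name in [n for n in sys.modules if n == PKG or n.startswith(PKG + ".")]:
        del sys.modules[name]


def run_demo():
    """Return the __getattr__ calls seen for each of the two import forms."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / PKG
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text(INIT_SOURCE)
        (pkg_dir / "sub.py").write_text("VALUE = 1\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            _forget()
            exec(f"import {PKG}.sub", {})        # regular submodule import
            plain = list(sys.modules[PKG].calls)
            _forget()
            exec(f"from {PKG} import sub", {})   # attribute check comes first
            lazy = list(sys.modules[PKG].calls)
        finally:
            sys.path.remove(tmp)
            _forget()
    return plain, lazy


if __name__ == "__main__":
    plain, lazy = run_demo()
    print("import pkg.sub      ->", plain)
    print("from pkg import sub ->", lazy)
```

In this sketch the plain submodule import never reaches __getattr__, while the from form does, so the actual submodule import is started by whatever importlib object the package captured when it was created.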

The problem is that importlib was imported when the scipy module was created, and since the module was created on the main interpreter, it is the importlib module from the main interpreter. The main interpreter doesn't have the shared_modules_hook, so step 7 just doesn't happen. Instead the import proceeds normally on the sub-interpreter, using the importlib from the main interpreter. This doesn't actually run into problems until the scipy.ndimage module tries to import another module. At that point the import of scipy.ndimage._filters goes through the importlib for the sub-interpreter and finds the shared_modules_hook, which transfers the import to the main interpreter, which freezes. I believe the freeze happens because there are locks in importlib that prevent multiple threads from doing the same import. Since the sub-interpreter is using the importlib from the main interpreter, it holds those locks, so when an import is transferred to the actual main interpreter thread, that thread cannot get the locks and freezes. I am not sure which locks are actually causing the problem.
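The general locking behaviour suspected here can be observed in a single ordinary interpreter. The following sketch does not involve Jep or sub-interpreters and does not identify which lock causes the freeze in this issue; it only shows that while one thread is still executing a module's import, a second thread importing the same module waits. The module name lockdemo_mod and the 0.5 second timeout are assumptions chosen for the demo.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MOD = "lockdemo_mod"  # hypothetical module name, used only for this demo

# The module body starts a thread that imports the same module again and
# waits for it with a timeout, so the demo reports the block instead of hanging.
MODULE_SOURCE = f'''
import threading

def _reimport():
    import {MOD}             # waits until the first import has finished

worker = threading.Thread(target=_reimport)
worker.start()
worker.join(timeout=0.5)     # give up waiting rather than deadlock
worker_was_blocked = worker.is_alive()
'''


def second_import_blocks():
    """Return True if the second thread was still waiting after the timeout."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / f"{MOD}.py").write_text(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            module = importlib.import_module(MOD)
            module.worker.join()   # the import is done, so the worker can finish
            return module.worker_was_blocked
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(MOD, None)


if __name__ == "__main__":
    print("second import blocked:", second_import_blocks())
```

Without the timeout on join, the first thread would wait for the worker while the worker waits for the first thread's import to finish, which is the same shape of freeze as the one described above.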

As far as fixing the problem, I am not sure we can actually fix it in Jep. My strongest recommendation is to switch to SharedInterpreter instead of SubInterpreter if you are using Python modules which are incompatible with sub-interpreters. Shared modules can provide a nice workaround in some cases, but I do not think we can smooth out all the odd behavior in cases like this. If you control the Python code executing in Jep, you could also just use import scipy.ndimage instead of from scipy import ndimage.

It might be possible to install an import hook on the main interpreter that would detect if it is running in a different interpreter, but I am not sure we can do much after detecting a potential problem. If we could do a normal shared import from a hook on the main interpreter, it would potentially fix the problem, but I suspect trying to do that would lead to freezes, like it does now, because the sub-interpreter would already hold the import locks for the main interpreter. It may be worth testing since I don't fully understand the locking mechanisms in importlib. If we could check the main interpreter's import lock from the sub-interpreter, we could definitely prevent freezing, but all we could really do is throw an exception. Again, more research into the locking would be needed to see if that is even possible, and I do not think it will lead to a full solution, just an error instead of a freeze.

bsteffensmeier commented 1 year ago

If you don't mind modifying scipy code, another workaround is to move the import of importlib into the __getattr__ function. That way it would use the importlib from the sub-interpreter rather than the importlib from the main interpreter.
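A minimal sketch of what that change could look like, written as a generic helper rather than as a patch to scipy. The helper name make_lazy_getattr and its parameters are hypothetical; the only point is that importlib is imported inside the function body instead of at module creation time.

```python
def make_lazy_getattr(package_name, submodules):
    """Build a module-level __getattr__ that lazily imports known submodules."""

    def __getattr__(name):
        # Deferred import: importlib is looked up on every call rather than
        # captured once when the package module was first created.
        import importlib

        if name in submodules:
            return importlib.import_module(f"{package_name}.{name}")
        raise AttributeError(
            f"module {package_name!r} has no attribute {name!r}"
        )

    return __getattr__


if __name__ == "__main__":
    # The stdlib json package stands in for a package with lazy submodules.
    lazy = make_lazy_getattr("json", ["tool"])
    print(lazy("tool").__name__)
```

In a package's __init__.py this would be used as __getattr__ = make_lazy_getattr(__name__, [...]). Whether the deferred import actually resolves to the sub-interpreter's importlib under Jep is the behaviour suggested in the comment above and has not been verified here.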

ndjensen commented 1 year ago

Thank you for the analysis. On the system in question we could change this particular import to not be a from scipy import submodule import, but ultimately that isn't a safe solution as the system can be extended by others with Python code that we don't have control over, and therefore it could easily be reintroduced.

ndjensen commented 1 year ago

For future reference, I have emailed the scipy-dev mailing list and posted to the CPython discussion forum about this issue.