ninia / jep

Embed Python in Java

SubInterpreter: importing numpy without necessity? #418

Open Daniel-Alievsky opened 2 years ago

Daniel-Alievsky commented 2 years ago

Please look at the following simple test:

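import jep.Interpreter;
import jep.SharedInterpreter;
import jep.SubInterpreter;
// PyCallable and PyObject are only needed by the commented-out sections below.
import jep.python.PyCallable;
import jep.python.PyObject;

public class SimpleJepForSubInterpreterAndNumpy {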
    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(() -> {
            try (Interpreter interp = new SubInterpreter()) {
                Object result = null;
                System.out.printf("%nInterpreter: %s%n", interp);
// Numpy 1
//                interp.exec("import numpy\n");
                interp.exec("class myClass():\n    pass\n");
                interp.exec("def createMyClass():\n    return myClass()\n");
                interp.exec("def myTestString():\n    return '123'\n");
                interp.exec("def myTestArray():\n    return [1,2,3]\n");
                interp.exec("def myTestNumber():\n    return 123\n");
                interp.exec("print(myTestNumber())");

// Numpy 2
//                System.out.println("Getting PyCallable");
//                final PyCallable callable = interp.getValue("myClass", PyCallable.class);
//                System.out.println("Calling PyCallable");
//                result = callable.call();
//                System.out.printf("call result: %s%n", result);

// Numpy 3
//                System.out.println("Calling constructor");
//                final PyObject myClass = (PyObject) interp.invoke("myClass");
//                System.out.printf("invoke result: %s%n", myClass);

                System.out.println("Calling function");
                result = interp.invoke("myTestString");
                System.out.printf("invoke result: %s%n", result);
            }
        });
        t.start();
        t.join();

        t = new Thread(() -> {
            try (Interpreter interp = new SharedInterpreter()) {
                System.out.printf("%nInterpreter: %s%n", interp);
                interp.exec("import numpy\n");
                interp.exec("def myTest():\n    return ['1','2']\n");
                interp.exec("print(myTest())");
                Object result = interp.invoke("myTest");
                System.out.printf("invoke result: %s%n", result);
            }
        });
        t.start();
        t.join();
    }
}

(It is published also at https://bitbucket.org/DanielAlievsky/stare-python-experiments/src/master/jep-java-tests/src/test/java/com/siams/stare/extensions/python/tests/ )

I use SubInterpreter in a very simple way, in its own separate thread. Then, in another thread, I use SharedInterpreter. As written, this test works fine.

PROBLEM 1) Let's uncomment "Numpy 1": interp.exec("import numpy\n");

We see a warning:

    :1: UserWarning: NumPy was imported from a Python sub-interpreter but NumPy does not properly support sub-interpreters. This will likely work for most users but might cause hard to track down issues or subtle bugs. A common user of the rare sub-interpreter feature is wsgi which also allows single-interpreter mode. Improvements in the case of bugs are welcome, but is not on the NumPy roadmap, and full support may require significant effort to achieve.

OK. But after this, the second block, which uses SharedInterpreter, crashes the JVM on my computer!

    A fatal error has been detected by the Java Runtime Environment:
    EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ff8eab877f9, pid=17084, tid=14452
    ....

Is it really a bug? In another configuration, with my single-thread execution system, I see a different message instead of a crash:

    : Interpreter change detected - this module can only be loaded into one interpreter per process.
    C:\Users\Daniel\AppData\Local\Programs\Python\Python310\lib\site-packages\numpy\random\_pickle.(_pickle.py:1)
    C:\Users\Daniel\AppData\Local\Programs\Python\Python310\lib\site-packages\numpy\random\__init__.(__init__.py:180)
    C:\Users\Daniel\AppData\Local\Programs\Python\Python310\lib\site-packages\numpy\__init__.(__init__.py:155)
    .(:2)
    jep.Jep.exec(Native Method)
    jep.Jep.exec(Jep.java:339)

This is not good either: from this moment I cannot use Jep with numpy at all until the JVM is fully restarted. I know that SubInterpreter is incompatible with numpy, but is it possible to fix this behavior? It would be much better to throw an exception instead of printing a "UserWarning", because AFTER this warning there is a risk of a system crash or of all SharedInterpreter instances ceasing to work normally.

PROBLEM 2) OK, I will not try to import numpy in SubInterpreters. Instead of the previous section, please uncomment the section "Numpy 2": an attempt to create an instance of myClass. The results will be exactly the same! Moreover, the results will be the same if I do not create an object but just call interp.getValue("myTestString", PyCallable.class); It seems that any access to PyCallable leads to an invisible import of numpy. In other words, PyCallable cannot be used with SubInterpreter at all. Why?

PROBLEM 3) OK, maybe we can avoid PyCallable. Please uncomment the section "Numpy 3". I try to instantiate the class "myClass" with a simple call interp.invoke("myClass"); The results are the same: UserWarning + a system crash later. Interestingly, interp.invoke("myTestString") works fine afterwards, but I sometimes need to create class instances, not just call functions... Even a function "createMyClass" does not help!

PROBLEM 4) OK: no classes, no instances, no use of numpy at all. We don't uncomment anything and work with pure Python: functions and arguments. My test works normally as long as we call "myTestString". But try replacing the call to myTestString with a call to myTestArray. It is not numpy, it is an ordinary Python list. But the results are the same! Moreover, myTestNumber also leads to the same problem: UserWarning + system crash. Though it is just a _number_, not an array. I cannot write anything useful if ordinary numbers are not allowed as inputs or outputs.

It seems your system tries to import numpy in a lot of situations where it is not actually necessary. As a result, SubInterpreter is practically non-functional and cannot be used: even the simplest tests can lead to fatal problems. My suggestion is to avoid numpy (for SubInterpreter) unless it is really needed. And even if the user tries to use it, it seems better to strictly prohibit this and throw an exception instead of just a "warning". At least it should be controllable via JepConfig: if the programmer really wants to use numpy inside SubInterpreter (why?), they must request it explicitly via a special JepConfig key. Or, as an alternative, just remove SubInterpreter from your library or mark it as strongly deprecated. What do you think?

ndjensen commented 2 years ago

The warning you saw comes from numpy, not Jep.

Please read the wiki before you open tickets about problems like this. Specifically, read the page Workarounds for CPython Extensions, but given your level of interest in Jep it would be good to read the entire wiki. You can add numpy to the JepConfig's sharedModules to achieve better stability with SubInterpreters and numpy.

Daniel-Alievsky commented 2 years ago

It seems you didn't understand me. I've already done almost everything you recommended and even more: please read my JepSingleThreadInterpreter and other classes. The problem is that I cannot use SubInterpreter at all, even when I don't need numpy.

OK, please just uninstall numpy on your computer and try uncommenting section 2 or section 3 in my test. Remove the last block with SharedInterpreter entirely. Now there are no attempts to use numpy at all:

    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(() -> {
            try (Interpreter interp = new SubInterpreter()) {
                Object result = null;
                System.out.printf("%nInterpreter: %s%n", interp);
                interp.exec("class myClass():\n    pass\n");
                interp.exec("def myTestString():\n    return '123'\n");
                interp.exec("def myTestArray():\n    return [1,2,3]\n");
                interp.exec("def myTestNumber():\n    return 123\n");

                System.out.println("Calling constructor");
                final PyObject myClass = (PyObject) interp.invoke("myClass");
                System.out.printf("invoke result: %s%n", myClass);

                System.out.println("Calling function");
                result = interp.invoke("myTestString");
                System.out.printf("invoke result: %s%n", result);
            }
        });
        t.start();
        t.join();
    }

But JEP prints "ModuleNotFoundError: No module named 'numpy'" twice. The same occurs when I call myTestNumber instead of myTestString.

It is obvious that JEP tries to import numpy without any need. If numpy is actually installed, this leads to catastrophic results for every other client of the JEP library inside the same JVM. Conclusion: in its current implementation, SubInterpreter is practically unusable in professional projects, apart from very simple one-page demo tests. If so, why do you offer it?

Of course, in Python outside JEP there are no problems with such code. For example, the following test works regardless of whether numpy is installed:

def myTestString():
    return "123"

def myTestArray():
    return [1,2,3]

def myTestNumber():
    return 123

if __name__ == '__main__':
    print(myTestArray())

This is a problem with JEP, not CPython.

Daniel-Alievsky commented 2 years ago

It seems that numpy is activated every time anyone creates an instance of any class, no matter whether via PyCallable or from Python code. Example:

                interp.exec("class myClass():\n    pass\n");
...
                interp.exec("_myClass = myClass()");

The last call leads to "sys:1: UserWarning..." and a later system crash (when SharedInterpreter is used) if numpy is installed, or to the warning "ModuleNotFoundError: No module named 'numpy'" when it is not installed.

Daniel-Alievsky commented 2 years ago

I haven't tried to debug your C code yet, but it seems that one of the problems is the following:

jobject PyObject_As_jobject(JNIEnv *env, PyObject *pyobject,
                            jclass expectedType)

#if JEP_NUMPY_ENABLED
    } else if (npy_scalar_check(pyobject)) {
        jobject result = convert_npy_scalar_jobject(env, pyobject, expectedType);
        if (result != NULL || PyErr_Occurred()) {
            return result;
        } else if ((*env)->IsAssignableFrom(env, JPYOBJECT_TYPE, expectedType)) {
            return PyObject_As_JPyObject(env, pyobject);
        }
#endif

Here you call npy_scalar_check, which starts with "if (!init_numpy())". And this function, I think, imports the numpy module, right? With all the dramatic results I described. For SubInterpreter (unlike SharedInterpreter), all numpy operations should be disabled unless the user explicitly enables them via JepConfig or in some other way (a system property?). Otherwise this "useful" code makes SubInterpreter impossible to use at all, as I explained. It seems that creating objects goes through something similar, with the same results.

ndjensen commented 2 years ago

Please check the top of the output of python setup.py build or python setup.py install. It should say either, "numpy include found at __" or "numpy not found, running without numpy support". When setup.py is run, it attempts to detect numpy and if found that determines whether jep is built with JEP_NUMPY_SUPPORT turned on or off.

You are correct that if built with numpy support, Jep may auto-initialize numpy in some code blocks to make use of the numpy API. However, that code is not built when JEP_NUMPY_SUPPORT is set to 0. You appear to somehow be building with numpy support when you were not expecting it to. When it's building with numpy support, if the arrayobject.h file wasn't there it wouldn't even be able to compile. Check where setup.py says the numpy include file is found. Perhaps it's finding the file somewhere where you are not expecting. Are you using virtualenvs?

Daniel-Alievsky commented 2 years ago

I don't build JEP at all. I'm using it via Maven, the standard way of adding external libraries in Java:

        <dependency>
            <groupId>black.ninia</groupId>
            <artifactId>jep</artifactId>
            <version>4.0.3</version>
        </dependency>

And, of course, I need numpy, as well as all the other native libraries. My solutions are based on SharedInterpreter and work well.

But I tried to offer my users, as an option, the ability to run their code via SubInterpreter - it is just a flag in my user interface ("isolated" mode). Why not? If you offer this mode of usage, why shouldn't I offer it to users? It can be suitable for very simple usage, for example while learning Python: a user can be sure that their experiments will not affect other users. However, the results were terrible: any attempt to do this made it impossible for all other Python solutions running in the same JVM via SharedInterpreter to work, or even crashed the system (in the Java world this is a catastrophe, especially for application servers).

It seems that I must remove this flag and never use SubInterpreter. But if so, why do you offer it? I still hope that you will change this behaviour.

In other words, if using numpy from SubInterpreter is so dangerous, why do you still allow it, and why don't you even offer a way to disable numpy entirely via JepConfig? Let it be not a "warning" but an exception! Of course, for SubInterpreter only - let SharedInterpreter use numpy. But SubInterpreter should not try to import numpy. What is the point of enabling it if you know that it will not work properly and can lead to a catastrophe? It seems to be a simple correction: you just need to replace the compile-time check with a run-time check.

ndjensen commented 2 years ago

You did build the jep native library even if you didn't realize it, otherwise it wouldn't work. Perhaps you did it with pip, which also builds it from source. The jar from maven does not include the native library or the jep python modules. It looks like you built the Jep native library when numpy was installed, then uninstalled numpy but did not rebuild Jep, which you should do to avoid the attempt of the automatic initialization of numpy.

I don't want to replace the compile time check for numpy. If we did that, we'd have to compile the numpy code every time and that would require numpy installed when building/installing Jep, and I'm willing to bet there are plenty of Jep users who don't need numpy and shouldn't be required to install it just to install Jep.

SubInterpreter is stable when used without CPython extensions. Furthermore, SubInterpreters introduced shared modules to work around CPython extension issues, and so if you declare the CPython extension modules you want to use as shared modules, then that has also proven stable. We keep them around because that is all Jep originally had, SharedInterpreter was only introduced in Jep 3.8 and before that we only supported SubInterpreters. Applications that were developed with earlier versions of Jep may require SubInterpreters to maintain their existing behavior. If you don't like SubInterpreters, don't use them.

Daniel-Alievsky commented 2 years ago

> You did build the jep native library even if you didn't realize it, otherwise it wouldn't work. Perhaps you did it with pip, which also builds it from source. The jar from maven does not include the native library or the jep python modules. It looks like you built the Jep native library when numpy was installed, then uninstalled numpy but did not rebuild Jep, which you should do to avoid the attempt of the automatic initialization of numpy.

You are absolutely right. I used pip to install Jep, and did this after installing numpy. Yes, if I uninstall numpy, uninstall Jep, then install Jep without using the cache, the problem does not appear, even if I install numpy again: SharedInterpreter can use it, SubInterpreter cannot.

But... this is even worse behavior. It means that all our code, which is designed for normal use of numpy+JEP under SharedInterpreter, will stop working if the administrator of the user's computer installed JEP BEFORE numpy. The reason is simple: yes, professional Python code works, successfully imports numpy and returns np.array or an OpenCV Mat from a function, with no errors (numpy is installed!). But the Java side expects to receive an NDArray. Instead, it receives a Java int[] or byte[] array. All this only because JEP was installed before numpy.

Thank you for this information, it is important: I've added a corresponding warning to the exception text for this case.
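
A check of this kind can be done on the Java side without a compile-time dependency on Jep. The following is only a minimal sketch, not code from this thread: the class `NumpyResultCheck` and its methods are hypothetical names, and the only assumption about Jep is that its numpy wrapper class is named `jep.NDArray`. It reports when a call that should have produced an NDArray returned a primitive array instead, which is the symptom described above for a Jep native library built without numpy support.

```java
/** Hypothetical helper: inspects what the Java side received from a Python call. */
final class NumpyResultCheck {
    // Jep's numpy wrapper class, compared by name so that this sketch
    // compiles without Jep on the classpath.
    private static final String NDARRAY_CLASS = "jep.NDArray";

    private NumpyResultCheck() {
    }

    /** Returns true for Java primitive arrays such as int[] or byte[]. */
    public static boolean isPrimitiveArray(Object value) {
        return value != null
                && value.getClass().isArray()
                && value.getClass().getComponentType().isPrimitive();
    }

    /**
     * Returns null when the result is a jep.NDArray, otherwise a message
     * that can be appended to the text of an exception.
     */
    public static String diagnose(Object result) {
        if (result == null) {
            return "expected jep.NDArray but got null";
        }
        if (NDARRAY_CLASS.equals(result.getClass().getName())) {
            return null;
        }
        String actual = result.getClass().getTypeName();
        if (isPrimitiveArray(result)) {
            // Symptom reported in this thread when Jep was installed before numpy.
            return "expected jep.NDArray but got " + actual
                    + "; the Jep native library may have been built without numpy support,"
                    + " try reinstalling Jep after installing numpy";
        }
        return "expected jep.NDArray but got " + actual;
    }

    public static void main(String[] args) {
        System.out.println(diagnose(new int[] {1, 2, 3}));
        System.out.println(diagnose("123"));
    }
}
```

In an application, `diagnose` would be called on the value returned by `interp.invoke(...)` before casting it, and a non-null message would be attached to the exception thrown to the caller.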

It seems you have not chosen the best solution. An end user, or a technician who installs all purchased software on the user's server, should not have to think about the order in which Python packages are installed. It means that we must distribute such software only with a pre-installed virtualenv. Yes, it is possible, but it restricts our distribution options.

Are you sure that it is impossible to make JEP's behavior independent of the installation order? What is so bad about requiring numpy before installing JEP? numpy is relatively lightweight and installs very quickly. Of course, it is not necessary for users of "pure Python" in SubInterpreter, but it will not create problems either if you add the dynamic check that I suggest below.

> SubInterpreter is stable when used without CPython extensions. Furthermore, SubInterpreters introduced shared modules to work around CPython extension issues, and so if you declare the CPython extension modules you want to use as shared modules, then that has also proven stable. We keep them around because that is all Jep originally had, SharedInterpreter was only introduced in Jep 3.8 and before that we only supported SubInterpreters. Applications that were developed with earlier versions of Jep may require SubInterpreters to maintain their existing behavior. If you don't like SubInterpreters, don't use them.

As you see, SubInterpreter is stable ONLY if the user installed JEP without numpy. In all other cases, SubInterpreter turns into a "virus" that destroys the JVM and blocks any possibility of using SharedInterpreter. Like a virus, it does not destroy everything every time, but only in some "bad" situations, such as an attempt to return a numeric array from a function, and even then the problem appears not immediately but on the first attempt to use SharedInterpreter afterwards. It is very good that I tested our system thoroughly enough; otherwise a catastrophe (a server crash) could have occurred in commercial use, when the first user chose SubInterpreter for a very simple test returning a number instead of a text string...

I suggest making at least one very simple change. Inside the C code, you know which interpreter is in use, SubInterpreter or SharedInterpreter, right? Please don't try to import and use numpy in the case of SubInterpreter, regardless of the JEP_NUMPY_ENABLED setting. It is not only a bad idea (the numpy warning is absolutely right) but also very dangerous if someone tries to use SharedInterpreter in the same JVM. Moreover, if there is a way to detect an attempt to import numpy from the user's code under SubInterpreter, please block it with a corresponding exception instead of the built-in numpy warning.
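
The rule proposed here can be written down as a small decision function. This is only an illustrative model in Java, not Jep code: the real check would have to live in Jep's native layer, and `NumpyPolicy`, `InterpreterKind` and the `explicitlyEnabled` flag (standing in for the suggested JepConfig opt-in, which does not exist in Jep) are hypothetical names.

```java
/** Hypothetical model of the proposed run-time rule; not part of Jep. */
final class NumpyPolicy {
    enum InterpreterKind { SHARED, SUB }

    private NumpyPolicy() {
    }

    /**
     * Decides whether the native layer may initialize numpy on its own.
     * builtWithNumpy models the compile-time JEP_NUMPY_ENABLED setting;
     * explicitlyEnabled models a hypothetical opt-in for sub-interpreters.
     */
    public static boolean mayInitNumpy(InterpreterKind kind,
                                       boolean builtWithNumpy,
                                       boolean explicitlyEnabled) {
        if (!builtWithNumpy) {
            return false;
        }
        return kind == InterpreterKind.SHARED || explicitlyEnabled;
    }

    /** True when user code in a sub-interpreter imports numpy without the opt-in. */
    public static boolean isBlockedImport(InterpreterKind kind,
                                          String module,
                                          boolean explicitlyEnabled) {
        boolean isNumpy = module.equals("numpy") || module.startsWith("numpy.");
        return isNumpy && kind == InterpreterKind.SUB && !explicitlyEnabled;
    }

    /** Fails with an exception instead of relying on numpy's own warning. */
    public static void checkImport(InterpreterKind kind,
                                   String module,
                                   boolean explicitlyEnabled) {
        if (isBlockedImport(kind, module, explicitlyEnabled)) {
            throw new IllegalStateException(
                    "import of '" + module + "' is not allowed in a sub-interpreter");
        }
    }

    public static void main(String[] args) {
        System.out.println(mayInitNumpy(InterpreterKind.SHARED, true, false));
        System.out.println(mayInitNumpy(InterpreterKind.SUB, true, false));
        checkImport(InterpreterKind.SHARED, "numpy", false);
    }
}
```

Under this model SharedInterpreter keeps its current behavior, while a sub-interpreter touches numpy only after an explicit opt-in.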

I do like SubInterpreter as a possible option, but I cannot use it at all without such corrections: I cannot prevent all other users from working with numpy, OpenCV and other important libraries. And nobody can use it without the risk of a catastrophe.