ninia / jep

Embed Python in Java

Cannot run tensorflow model training from JEP when using tensorflow-java already in Java Application #378

Open priyadarshi9 opened 2 years ago

priyadarshi9 commented 2 years ago

Describe the problem

I am trying to run a simple Python script that uses TensorFlow to train a model and save it. I call this script through JEP from a Java Spring Boot application. The Java project also depends on the TensorFlow Java artifact from Maven (https://mvnrepository.com/artifact/org.tensorflow/tensorflow/1.15.0).

The Java application crashes with the following error when I try to execute the Python script:

A fatal error has been detected by the Java Runtime Environment:

SIGSEGV (0xb) at pc=0x0000000145e640f5, pid=22230, tid=5891
JRE version: OpenJDK Runtime Environment Corretto-11.0.10.9.1 (11.0.10+9) (build 11.0.10+9-LTS)
Java VM: OpenJDK 64-Bit Server VM Corretto-11.0.10.9.1 (11.0.10+9-LTS, mixed mode, tiered, compressed oops, g1 gc, bsd-amd64)
Problematic frame:
C [libtensorflow_framework.1.dylib+0x5c10f5] _ZNK10tensorflow25FunctionLibraryDefinition11GetAttrImplERKNS_7NodeDefE+0x15

No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

An error report file with more information is saved as: /Users/kpriyadarsh/Desktop/misc/Maven Albert Codes/roberta-inference/hs_err_pid22230.log

If you would like to submit a bug report, please visit: https://github.com/corretto/corretto-11/issues/
The crash happened outside the Java Virtual Machine in native code. See problematic frame for where to report the bug.

Also, when I remove this Maven dependency from my Java application, the script runs without crashing. Could you please help me resolve this issue? I need to use the Java dependency in my application.

Additional Information from the error log:

* As the call stack dumped below shows, the call chain somehow reaches the libtensorflow_framework.1.dylib library. This should not happen, because that file belongs to the Java dependency and not to Python's TensorFlow installation. The calls should never reach this file.

Stack: [0x000070000b0f4000,0x000070000b1f4000], sp=0x000070000b1ed700, free space=997k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [libtensorflow_framework.1.dylib+0x5c10f5] _ZNK10tensorflow25FunctionLibraryDefinition11GetAttrImplERKNS_7NodeDefE+0x15
C [libtensorflow_framework.1.dylib+0x5c1355] _ZNK10tensorflow25FunctionLibraryDefinition7GetAttrIbEENS_6StatusERKNS_7NodeDefERKNSt3112basic_stringIcNS6_11char_traitsIcEENS69allocatorIcEEEEPT+0x25
C [libtensorflow_framework.1.dylib+0x5c1323] _ZNK10tensorflow25FunctionLibraryDefinition7GetAttrIbEENS_6StatusERKNS_4NodeERKNSt3112basic_stringIcNS6_11char_traitsIcEENS69allocatorIcEEEEPT+0x33
C [_pywrap_tensorflow_internal.so+0x56aa611] _ZN10tensorflow12_GLOBALN_126MarkForCompilationPassImpl25FindCompilationCandidatesEv+0x1171
C [_pywrap_tensorflow_internal.so+0x56a39e0] _ZN10tensorflow12_GLOBALN_126MarkForCompilationPassImpl3RunEv+0xd0
C [_pywrap_tensorflow_internal.so+0x569f325] _ZN10tensorflow12_GLOBALN_118MarkForCompilationERKNS_28GraphOptimizationPassOptionsERKNS0_26MarkForCompilationPassImpl12DebugOptionsE+0x2f5
C [_pywrap_tensorflow_internal.so+0x569efc7] _ZN10tensorflow22MarkForCompilationPass3RunERKNS_28GraphOptimizationPassOptionsE+0x67
C [libtensorflow_framework.2.dylib+0x30a427] _ZN10tensorflow24OptimizationPassRegistry11RunGroupingENS0_8GroupingERKNS_28GraphOptimizationPassOptionsE+0x697
C [libtensorflow_framework.2.dylib+0x2d84b9] _ZN10tensorflow29ProcessFunctionLibraryRuntime22InstantiateMultiDeviceERKNSt3112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEENS_9AttrSliceERKNS_22FunctionLibraryRuntime18InstantiateOptionsEPy+0x18c9
C [libtensorflow_framework.2.dylib+0x2de2a1] _ZN10tensorflow29ProcessFunctionLibraryRuntime11InstantiateERKNSt3112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEENS_9AttrSliceERKNS_22FunctionLibraryRuntime18InstantiateOptionsEPy+0x51
C [_pywrap_tensorflow_internal.so+0x5ae72bb] _ZN10tensorflow19KernelAndDeviceFunc15InstantiateFuncEbRKNS_7NodeDefEPNS_14GraphCollectorE+0x74b
C [_pywrap_tensorflow_internal.so+0x5ae7392] _ZN10tensorflow19KernelAndDeviceFunc4InitEbRKNS_7NodeDefEPNS_14GraphCollectorE+0x12
C [_pywrap_tensorflow_internal.so+0x5aad189] _ZN10tensorflow12_GLOBALN_126GetOrCreateKernelAndDeviceEPNS_14EagerOperationEPPNS_12TensorHandleEPiPNSt3110unique_ptrINS_15KernelAndDeviceENS_4core15RefCountDeleterEEE+0x1d39
C [_pywrap_tensorflow_internal.so+0x5aa8271] _ZN10tensorflow12_GLOBALN_117EagerLocalExecuteEPNS_14EagerOperationEPPNS_12TensorHandleEPi+0x101
C [_pywrap_tensorflow_internal.so+0x5aa6244] _ZN10tensorflow12EagerExecuteEPNS_14EagerOperationEPPNS_12TensorHandleEPi+0x1d4
C [_pywrap_tensorflow_internal.so+0x562fb17] _ZN10tensorflow14EagerOperation7ExecuteEN4absl12lts_202103244SpanIPNS_20AbstractTensorHandleEEEPi+0xb7
C [_pywrap_tensorflow_internal.so+0x5aefa79] _ZN10tensorflow21CustomDeviceOpHandler7ExecuteEPNS_27ImmediateExecutionOperationEPPNS_30ImmediateExecutionTensorHandleEPi+0x239
C [_pywrap_tensorflow_internal.so+0xe97245] TFE_Execute+0x45
C [_pywrap_tensorflow_internal.so+0x961c57] _Z24TFE_Py_ExecuteCancelableP11TFE_ContextPKcS2_PN4absl12lts_2021032413InlinedVectorIP16TFE_TensorHandleLm4ENSt3__19allocatorIS7_EEEEP7_objectP23TFE_CancellationManagerPNS5_IS7_Lm2ESA_EEP9TF_Status+0x267
C [_pywrap_tfe.so+0x4588] _ZN10tensorflow32TFE_Py_ExecuteCancelable_wrapperERKN8pybind116handleEPKcS5_S3_S3_PNS19CancellationManagerES3+0x98
C [_pywrap_tfe.so+0x38c7f] _ZZN8pybind1112cpp_function10initializeIZL25pybind11_initpywrap_tfeRNS_7module_EE4$_51NS_6objectEJRKNS_6handleEPKcSA_S8_S8_S8_EJNS_4nameENS_5scopeENS_7siblingEEEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE_8invokeESR_+0xcf
C [_pywrap_tfe.so+0x19b4e] _ZN8pybind1112cpp_function10dispatcherEP7_objectS2S2+0xc5e
C [Python+0x225ed] _PyMethodDef_RawFastCallKeywords+0x2ad
C [Python+0x21a5a] _PyCFunction_FastCallKeywords+0x2a
C [Python+0xe05a4] call_function+0x2d4
C [Python+0xdd576] _PyEval_EvalFrameDefault+0x6266
C [Python+0xe10d6] _PyEval_EvalCodeWithName+0x976
C [Python+0x21a21] _PyFunction_FastCallKeywords+0x101
C [Python+0xe05b2] call_function+0x2e2
C [Python+0xdd6bd] _PyEval_EvalFrameDefault+0x63ad
C [Python+0xe10d6] _PyEval_EvalCodeWithName+0x976
C [Python+0x21a21] _PyFunction_FastCallKeywords+0x101
C [Python+0xe05b2] call_function+0x2e2
C [Python+0xdd6bd] _PyEval_EvalFrameDefault+0x63ad
C [Python+0xe10d6] _PyEval_EvalCodeWithName+0x976
C [Python+0x21a21] _PyFunction_FastCallKeywords+0x101
C [Python+0xe05b2] call_function+0x2e2
C [Python+0xdd6bd] _PyEval_EvalFrameDefault+0x63ad
C [Python+0xe10d6] _PyEval_EvalCodeWithName+0x976
C [Python+0x215fb] _PyFunction_FastCallDict+0x20b
C [Python+0x228cf] _PyObject_Call_Prepend+0x8f
C [Python+0x6f916] slot_tp_call+0x96
C [Python+0x21b87] PyObject_Call+0x87
C [Python+0xdd887] _PyEval_EvalFrameDefault+0x6577
C [Python+0xe10d6] _PyEval_EvalCodeWithName+0x976
C [Python+0x215fb] _PyFunction_FastCallDict+0x20b
C [Python+0x228cf] _PyObject_Call_Prepend+0x8f
C [Python+0x21b87] PyObject_Call+0x87
C [Python+0xdd887] _PyEval_EvalFrameDefault+0x6577
C [Python+0xe10d6] _PyEval_EvalCodeWithName+0x976
C [Python+0x215fb] _PyFunction_FastCallDict+0x20b
C [Python+0xdd887] _PyEval_EvalFrameDefault+0x6577
C [Python+0xe10d6] _PyEval_EvalCodeWithName+0x976
C [Python+0x215fb] _PyFunction_FastCallDict+0x20b
C [Python+0x228cf] _PyObject_Call_Prepend+0x8f
C [Python+0x6f916] slot_tp_call+0x96
C [Python+0x21871] _PyObject_FastCallKeywords+0x1b1
C [Python+0xe0474] call_function+0x1a4
C [Python+0xdd576] _PyEval_EvalFrameDefault+0x6266
C [Python+0xe10d6] _PyEval_EvalCodeWithName+0x976
C [Python+0x215fb] _PyFunction_FastCallDict+0x20b
C [Python+0xdd887] _PyEval_EvalFrameDefault+0x6577
C [Python+0xe10d6] _PyEval_EvalCodeWithName+0x976
C [Python+0x21a21] _PyFunction_FastCallKeywords+0x101
C [Python+0xe05b2] call_function+0x2e2
C [Python+0xdd6bd] _PyEval_EvalFrameDefault+0x63ad
C [Python+0x21e90] function_code_fastcall+0x80
C [Python+0xe05b2] call_function+0x2e2
C [Python+0xdd617] _PyEval_EvalFrameDefault+0x6307
C [Python+0xe10d6] _PyEval_EvalCodeWithName+0x976
C [Python+0x21a21] _PyFunction_FastCallKeywords+0x101
C [Python+0xe05b2] call_function+0x2e2
C [Python+0xdd576] _PyEval_EvalFrameDefault+0x6266
C [Python+0xe10d6] _PyEval_EvalCodeWithName+0x976
C [Python+0xd7234] PyEval_EvalCode+0x64
C [Python+0x114a57] PyRun_StringFlags+0x97
C [libjep.jnilib+0x11bb5] pyembed_getvalue+0x75
C [libjep.jnilib+0x132a6] Java_jep_Jep_getValue+0x36
j jep.Jep.getValue(JLjava/lang/String;Ljava/lang/Class;)Ljava/lang/Object;+0
j jep.Jep.getValue(Ljava/lang/String;)Ljava/lang/Object;+12
j com.roberta.BertTrainer.train(Ljava/util/List;Ljava/util/List;Ljava/util/Map;Ljava/util/List;Ljava/util/List;)V+114
j com.roberta.TrainerRunner.run([Ljava/lang/String;)V+394
j com.roberta.DemoApplication.run([Ljava/lang/String;)V+4
j org.springframework.boot.SpringApplication.callRunner(Lorg/springframework/boot/CommandLineRunner;Lorg/springframework/boot/ApplicationArguments;)V+7
j org.springframework.boot.SpringApplication.callRunners(Lorg/springframework/context/ApplicationContext;Lorg/springframework/boot/ApplicationArguments;)V+119
j org.springframework.boot.SpringApplication.run([Ljava/lang/String;)Lorg/springframework/context/ConfigurableApplicationContext;+164
j org.springframework.boot.SpringApplication.run([Ljava/lang/Class;[Ljava/lang/String;)Lorg/springframework/context/ConfigurableApplicationContext;+9
j org.springframework.boot.SpringApplication.run(Ljava/lang/Class;[Ljava/lang/String;)Lorg/springframework/context/ConfigurableApplicationContext;+9
j com.roberta.DemoApplication.main([Ljava/lang/String;)V+3
v ~StubRoutines::call_stub
V [libjvm.dylib+0x3bf81a] _ZN9JavaCalls11call_helperEP9JavaValueRK12methodHandleP17JavaCallArgumentsP6Thread+0x220
V [libjvm.dylib+0x40341c] _ZL17jni_invoke_staticP7JNIEnv_P9JavaValueP8_jobject11JNICallTypeP10_jmethodIDP18JNI_ArgumentPusherP6Thread+0x122
V [libjvm.dylib+0x40623b] jni_CallStaticVoidMethod+0x17f
C [libjli.dylib+0x4831] JavaMain+0xad1
C [libjli.dylib+0x6be4] ThreadJavaMain+0x9
C [libsystem_pthread.dylib+0x6950] _pthread_start+0xe0
C [libsystem_pthread.dylib+0x247b] thread_start+0xf

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j jep.Jep.getValue(JLjava/lang/String;Ljava/lang/Class;)Ljava/lang/Object;+0
j jep.Jep.getValue(Ljava/lang/String;)Ljava/lang/Object;+12
j com.roberta.BertTrainer.train(Ljava/util/List;Ljava/util/List;Ljava/util/Map;Ljava/util/List;Ljava/util/List;)V+114
j com.roberta.TrainerRunner.run([Ljava/lang/String;)V+394
j com.roberta.DemoApplication.run([Ljava/lang/String;)V+4
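
One way to check the observation about libtensorflow_framework.1.dylib is to list which shared libraries appear in the native frames and flag any that show up in more than one version. The following is a minimal Python sketch and is not part of JEP or TensorFlow. The function names (`split_version`, `libraries_in_frames`, `conflicting_versions`) are hypothetical, and the sketch assumes macOS-style `name.N.dylib` or Linux-style `name.so.N` file names. The sample frames are copied from the stack above.

```python
import re

# Matches a native frame such as "[libfoo.1.dylib+0x5c10f5]" and captures the
# library file name. Java frames and the "Stack: [...]" header do not match.
FRAME_RE = re.compile(r"\[([^\]\s+]+)\+0x[0-9a-fA-F]+\]")

# A few frames copied from the crash report in this issue.
SAMPLE_FRAMES = """
C [libtensorflow_framework.1.dylib+0x5c10f5] _ZNK10tensorflow25FunctionLibraryDefinition11GetAttrImplERKNS_7NodeDefE+0x15
C [_pywrap_tensorflow_internal.so+0x569efc7] _ZN10tensorflow22MarkForCompilationPass3RunERKNS_28GraphOptimizationPassOptionsE+0x67
C [libtensorflow_framework.2.dylib+0x30a427] _ZN10tensorflow24OptimizationPassRegistry11RunGroupingENS0_8GroupingERKNS_28GraphOptimizationPassOptionsE+0x697
C [Python+0x225ed] _PyMethodDef_RawFastCallKeywords+0x2ad
C [libjep.jnilib+0x11bb5] pyembed_getvalue+0x75
"""


def split_version(lib):
    """Split 'name.N.dylib' or 'name.so.N' into (name, N); version is None if absent."""
    match = re.match(r"^(.+)\.(\d+)\.dylib$", lib) or re.match(r"^(.+)\.so\.(\d+)$", lib)
    if match:
        return match.group(1), match.group(2)
    return lib, None


def libraries_in_frames(text):
    """Return the library names found in the frames, in order of first appearance."""
    seen = []
    for lib in FRAME_RE.findall(text):
        if lib not in seen:
            seen.append(lib)
    return seen


def conflicting_versions(text):
    """Return {base name: sorted versions} for libraries seen in more than one version."""
    versions = {}
    for lib in libraries_in_frames(text):
        base, version = split_version(lib)
        if version is not None:
            versions.setdefault(base, set()).add(version)
    return {base: sorted(found) for base, found in versions.items() if len(found) > 1}


if __name__ == "__main__":
    print(libraries_in_frames(SAMPLE_FRAMES))
    print(conflicting_versions(SAMPLE_FRAMES))
```

On the frames in this report, the sketch flags libtensorflow_framework with versions 1 and 2. That is consistent with the native libraries of the Java and Python TensorFlow installations both being loaded into the same process, though it does not prove that this is the cause of the crash.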

Environment: