pytorch / executorch

On-device AI across mobile, embedded and edge for PyTorch
https://pytorch.org/executorch/

Android app - Loading model - Failed parsing tensor at index 0: 0x12 #2163

Closed adonnini closed 1 month ago

adonnini commented 3 months ago

When my Android app tries to load the model file produced by my training module (using the executorch code), execution fails. Below, you will find the relevant portion of the error log.

Below, you will also find the code I use to produce the lowered module.

Please let me know if you need additional information, and what I should do to resolve this issue.

Thanks

ERROR LOG ATTEMPTING TO LOAD LOWERED MODEL

02-28 16:07:12.089: I/ETLOG(8314): Model file /data/user/0/com.android.contextq/files/locationInformation/tpt_delegate.pte is loaded.
02-28 16:07:12.089: I/ETLOG(8314): Setting up planned buffer 0, size 31460272.
02-28 16:07:12.101: I/ETLOG(8314): Constant buffer 1 out of program buffer range 0
02-28 16:07:12.101: I/ETLOG(8314): getTensorDataPtr() failed: 0x12
02-28 16:07:12.101: I/ETLOG(8314): Failed parsing tensor at index 0: 0x12
02-28 16:07:12.101: I/ETLOG(8314): In function CheckOk(), assert failed: hasValue_

CODE USED TO PRODUCE LOWERED MODULE

# Imports needed by this snippet (module paths as of early 2024)
from torch._export import capture_pre_autograd_graph
from torch.export import ExportedProgram, export
from executorch.exir import EdgeProgramManager, to_edge
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

example_inputs = (enc_input, dec_input, dec_source_mask, dec_target_mask)

# Capture the model, export it to the ATen dialect, then convert it to the Edge dialect
pre_autograd_aten_dialect = capture_pre_autograd_graph(m, example_inputs)
aten_dialect: ExportedProgram = export(pre_autograd_aten_dialect, example_inputs, strict=False)
edge_program: EdgeProgramManager = to_edge(aten_dialect)

# Delegate the supported subgraphs to XNNPACK
lowered_module = edge_program.to_backend(XnnpackPartitioner())

# Serialize the program to a .pte file for the on-device runtime
save_path = "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/models/tpt_delegate.pte"
with open(save_path, "wb") as f:
    f.write(lowered_module.to_executorch().buffer)

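A quick way to rule out a stale or truncated model on the device is to fingerprint the exported file on the host and compare it with the copy the app loads. The sketch below uses hypothetical helpers (write_and_fingerprint and sha256_of_file are illustrative names, not ExecuTorch APIs); it writes the serialized buffer and returns its SHA-256 digest.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_and_fingerprint(buffer: bytes, save_path: str) -> str:
    """Write a serialized program to disk and return its SHA-256 digest."""
    Path(save_path).write_bytes(buffer)
    on_disk = sha256_of_file(save_path)
    # Guard against a partial write: the file must match the in-memory buffer.
    if on_disk != hashlib.sha256(buffer).hexdigest():
        raise IOError(f"{save_path} does not match the exported buffer")
    return on_disk
```

For example, call write_and_fingerprint(lowered_module.to_executorch().buffer, save_path) and compare the result with something like adb shell run-as com.android.contextq sha256sum files/locationInformation/tpt_delegate.pte (this assumes a debuggable build and a device shell that provides sha256sum). Matching digests point away from the model file and toward the runtime.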
kirklandsign commented 3 months ago

Hi @adonnini,

This seems to come from https://github.com/pytorch/executorch/blob/main/runtime/executor/program.cpp#L316, but I can't find the exact log line there. Could you please try this from the main branch and run ./install_requirements.sh to make sure your PyTorch and ExecuTorch versions are consistent?

adonnini commented 3 months ago

@kirklandsign I cloned the main executorch branch and performed all the steps in https://pytorch.org/executorch/stable/getting-started-setup.html. Everything seemed to work as expected. Here is the output of ./install_requirements.sh:

Successfully installed MarkupSafe-2.1.3 filelock-3.13.1 fsspec-2024.2.0 jinja2-3.1.3 mpmath-1.2.1 networkx-3.2.1 sympy-1.12 torch-2.3.0.dev20240229+cpu torchaudio-2.2.0.dev20240229+cpu typing-extensions-4.8.0
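One way to double-check the "consistent versions" point is to confirm that the PyTorch nightlies pulled in by install_requirements.sh were built on the same date. The sketch below parses pip's "Successfully installed" line; parse_pip_installed, nightly_dates and nightlies_agree are hypothetical helpers, and the check says nothing about the ExecuTorch runtime library compiled into the Android app.

```python
import re
from typing import Dict


def parse_pip_installed(line: str) -> Dict[str, str]:
    """Turn pip's 'Successfully installed a-1.0 b-2.0' line into {name: version}."""
    prefix = "Successfully installed "
    if not line.startswith(prefix):
        raise ValueError("not a pip 'Successfully installed' line")
    packages = {}
    for token in line[len(prefix):].split():
        # Package names may themselves contain '-', so split on the last one only.
        name, version = token.rsplit("-", 1)
        packages[name.lower()] = version
    return packages


def nightly_dates(packages: Dict[str, str]) -> Dict[str, str]:
    """Return {name: YYYYMMDD} for every package installed from a .devYYYYMMDD nightly."""
    dates = {}
    for name, version in packages.items():
        match = re.search(r"\.dev(\d{8})", version)
        if match:
            dates[name] = match.group(1)
    return dates


def nightlies_agree(packages: Dict[str, str]) -> bool:
    """True if all nightly packages share one build date (or there are none)."""
    return len(set(nightly_dates(packages).values())) <= 1
```

Applied to the line above, nightly_dates returns {'torch': '20240229', 'torchaudio': '20240229'}, so the two nightlies agree.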

Next, I ran my training module, including the executorch-related code that builds the lowered model.

The model file was produced successfully.

However, when I attempted to load the lowered model and run it for inference from my Android application, execution failed with the same error. The entire error log is below.

Please let me know if you need additional information, and what I should do next. Thanks

ERROR LOG ATTEMPTING TO LOAD LOWERED MODEL

03-09 12:25:35.463: I/ETLOG(7403): Model file /data/user/0/com.android.contextq/files/locationInformation/tpt_delegate.pte is loaded.
03-09 12:25:35.463: I/ETLOG(7403): Setting up planned buffer 0, size 31460272.
03-09 12:25:35.475: I/ETLOG(7403): Constant buffer 1 out of program buffer range 0
03-09 12:25:35.475: I/ETLOG(7403): getTensorDataPtr() failed: 0x12
03-09 12:25:35.475: I/ETLOG(7403): Failed parsing tensor at index 0: 0x12
03-09 12:25:35.475: I/ETLOG(7403): In function CheckOk(), assert failed: hasValue_
03-09 12:25:35.476: A/libc(7403): Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 7432 (Thread-2), pid 7403 (lNetworkService)
03-09 12:25:35.528: I/crash_dump64(7914): obtaining output fd from tombstoned, type: kDebuggerdTombstoneProto
03-09 12:25:35.529: I/tombstoned(707): received crash request for pid 7432
03-09 12:25:35.530: I/crash_dump64(7914): performing dump of process 7403 (target tid = 7432)
03-09 12:25:35.762: A/DEBUG(7914): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
03-09 12:25:35.762: A/DEBUG(7914): Build fingerprint: 'Fairphone/FP4eea/FP4:13/TKQ1.230127.002/TP29:user/release-keys'
03-09 12:25:35.762: A/DEBUG(7914): Revision: '0'
03-09 12:25:35.762: A/DEBUG(7914): ABI: 'arm64'
03-09 12:25:35.762: A/DEBUG(7914): Timestamp: 2024-03-09 12:25:35.541386611+0100
03-09 12:25:35.762: A/DEBUG(7914): Process uptime: 366s
03-09 12:25:35.762: A/DEBUG(7914): Cmdline: com.android.contextq:ContextQNeuralNetworkService
03-09 12:25:35.762: A/DEBUG(7914): pid: 7403, tid: 7432, name: Thread-2  >>> com.android.contextq:ContextQNeuralNetworkService <<<
03-09 12:25:35.762: A/DEBUG(7914): uid: 10207
03-09 12:25:35.762: A/DEBUG(7914): signal 6 (SIGABRT), code -1 (SI_QUEUE), fault addr --------
03-09 12:25:35.762: A/DEBUG(7914):     x0  0000000000000000  x1  0000000000001d08  x2  0000000000000006  x3  0000006cf2f91320
03-09 12:25:35.762: A/DEBUG(7914):     x4  60651f7371647272  x5  60651f7371647272  x6  60651f7371647272  x7  7f7f7f7f7f7f7f7f
03-09 12:25:35.762: A/DEBUG(7914):     x8  00000000000000f0  x9  000000706db5cb28  x10 0000000000000001  x11 000000706db9c84c
03-09 12:25:35.762: A/DEBUG(7914):     x12 0000006cf2f8f8f0  x13 0000000000000030  x14 0000006cf2f90c38  x15 0000000034155555
03-09 12:25:35.762: A/DEBUG(7914):     x16 000000706dc04d68  x17 000000706dbe04e0  x18 0000006cf2448000  x19 0000000000001ceb
03-09 12:25:35.762: A/DEBUG(7914):     x20 0000000000001d08  x21 00000000ffffffff  x22 0000007080487870  x23 0000007080487870
03-09 12:25:35.762: A/DEBUG(7914):     x24 0000006cf2f91cc0  x25 b400006eb61db0d0  x26 0000000000002071  x27 0000007080487850
03-09 12:25:35.762: A/DEBUG(7914):     x28 0000006cf2f91b90  x29 0000006cf2f913a0
03-09 12:25:35.762: A/DEBUG(7914):     lr  000000706db8d788  sp  0000006cf2f91300  pc  000000706db8d7b4  pst 0000000000001000
03-09 12:25:35.762: A/DEBUG(7914): backtrace:
03-09 12:25:35.762: A/DEBUG(7914):       #00 pc 00000000000527b4  /apex/com.android.runtime/lib64/bionic/libc.so (abort+168) (BuildId: bf5f1ce73f89cca7d6a062eb7877e86a)
03-09 12:25:35.762: A/DEBUG(7914):       #01 pc 0000000000b95590  /data/app/~~nuZwb5GEVeE5LCK-o_vC0g==/com.android.contextq-VusjC44pPRp2s2DHEvQm0g==/base.apk!libexecutorchdemo.so (et_pal_abort+8) (BuildId: 8065dc692f8e345f80fe49a1f2162d7e784b3499)
03-09 12:25:35.762: A/DEBUG(7914):       #02 pc 0000000000b95398  /data/app/~~nuZwb5GEVeE5LCK-o_vC0g==/com.android.contextq-VusjC44pPRp2s2DHEvQm0g==/base.apk!libexecutorchdemo.so (torch::executor::runtime_abort()+8) (BuildId: 8065dc692f8e345f80fe49a1f2162d7e784b3499)
03-09 12:25:35.762: A/DEBUG(7914):       #03 pc 0000000000b71c5c  /data/app/~~nuZwb5GEVeE5LCK-o_vC0g==/com.android.contextq-VusjC44pPRp2s2DHEvQm0g==/base.apk!libexecutorchdemo.so (torch::executor::Result<torch::executor::Method>::CheckOk() const+152) (BuildId: 8065dc692f8e345f80fe49a1f2162d7e784b3499)
03-09 12:25:35.762: A/DEBUG(7914):       #04 pc 0000000000b6e110  /data/app/~~nuZwb5GEVeE5LCK-o_vC0g==/com.android.contextq-VusjC44pPRp2s2DHEvQm0g==/base.apk!libexecutorchdemo.so (torch::executor::Result<torch::executor::Method>::get()+24) (BuildId: 8065dc692f8e345f80fe49a1f2162d7e784b3499)
03-09 12:25:35.762: A/DEBUG(7914):       #05 pc 0000000000b6bcf4  /data/app/~~nuZwb5GEVeE5LCK-o_vC0g==/com.android.contextq-VusjC44pPRp2s2DHEvQm0g==/base.apk!libexecutorchdemo.so (executorch_jni::ExecuTorchJni::ExecuTorchJni(facebook::jni::alias_ref<_jstring*>, facebook::jni::alias_ref<facebook::jni::JMap<facebook::jni::JString, facebook::jni::JString> >)+2000) (BuildId: 8065dc692f8e345f80fe49a1f2162d7e784b3499)
03-09 12:25:35.762: A/DEBUG(7914):       #06 pc 0000000000b6b28c  /data/app/~~nuZwb5GEVeE5LCK-o_vC0g==/com.android.contextq-VusjC44pPRp2s2DHEvQm0g==/base.apk!libexecutorchdemo.so (facebook::jni::basic_strong_ref<facebook::jni::detail::HybridData, facebook::jni::LocalReferenceAllocator> facebook::jni::HybridClass<executorch_jni::ExecuTorchJni, facebook::jni::detail::BaseHybridClass>::makeCxxInstance<facebook::jni::alias_ref<_jstring*>&, facebook::jni::alias_ref<facebook::jni::JMap<facebook::jni::JString, facebook::jni::JString> >&>(facebook::jni::alias_ref<_jstring*>&, facebook::jni::alias_ref<facebook::jni::JMap<facebook::jni::JString, facebook::jni::JString> >&)+128) (BuildId: 8065dc692f8e345f80fe49a1f2162d7e784b3499)
03-09 12:25:35.762: A/DEBUG(7914):       #07 pc 0000000000b6b07c  /data/app/~~nuZwb5GEVeE5LCK-o_vC0g==/com.android.contextq-VusjC44pPRp2s2DHEvQm0g==/base.apk!libexecutorchdemo.so (executorch_jni::ExecuTorchJni::initHybrid(facebook::jni::alias_ref<_jclass*>, facebook::jni::alias_ref<_jstring*>, facebook::jni::alias_ref<facebook::jni::JMap<facebook::jni::JString, facebook::jni::JString> >)+52) (BuildId: 8065dc692f8e345f80fe49a1f2162d7e784b3499)
03-09 12:25:35.762: A/DEBUG(7914):       #08 pc 0000000000b725a4  /data/app/~~nuZwb5GEVeE5LCK-o_vC0g==/com.android.contextq-VusjC44pPRp2s2DHEvQm0g==/base.apk!libexecutorchdemo.so (facebook::jni::detail::CallWithJniConversions<facebook::jni::basic_strong_ref<facebook::jni::detail::JTypeFor<facebook::jni::detail::HybridData, facebook::jni::JObject, void>::_javaobject*, facebook::jni::LocalReferenceAllocator> (*)(facebook::jni::alias_ref<_jclass*>, facebook::jni::alias_ref<_jstring*>, facebook::jni::alias_ref<facebook::jni::JMap<facebook::jni::JString, facebook::jni::JString> >), facebook::jni::basic_strong_ref<facebook::jni::detail::JTypeFor<facebook::jni::detail::HybridData, facebook::jni::JObject, void>::_javaobject*, facebook::jni::LocalReferenceAllocator>, _jclass*, facebook::jni::alias_ref<_jstring*>, facebook::jni::alias_ref<facebook::jni::JMap<facebook::jni::JString, facebook::jni::JString> > >::call(_jclass*, _jstring*, facebook::jni::detail::JTypeFor<facebook::jni::JMap<facebook::jni::JString, facebook::jni::JString>, facebook::jni::JObject, void>::_javaobject*, facebook::jni::basic_strong_ref<facebook::jni::detail::JTypeFor<facebook::jni::detail::HybridData, facebook::jni::JObject, void>::_javaobject*, facebook::jni::LocalReferenceAllocator> (*)(facebook::jni::alias_ref<_jclass*>, facebook::jni::alias_ref<_jstring*>, facebook::jni::alias_ref<facebook::jni::JMap<facebook::jni::JString, facebook::jni::JString> >))+136) (BuildId: 8065dc692f8e345f80fe49a1f2162d7e784b3499)
03-09 12:25:35.762: A/DEBUG(7914):       #09 pc 0000000000b6b128  /data/app/~~nuZwb5GEVeE5LCK-o_vC0g==/com.android.contextq-VusjC44pPRp2s2DHEvQm0g==/base.apk!libexecutorchdemo.so (facebook::jni::detail::FunctionWrapper<facebook::jni::basic_strong_ref<facebook::jni::detail::JTypeFor<facebook::jni::detail::HybridData, facebook::jni::JObject, void>::_javaobject*, facebook::jni::LocalReferenceAllocator> (*)(facebook::jni::alias_ref<_jclass*>, facebook::jni::alias_ref<_jstring*>, facebook::jni::alias_ref<facebook::jni::JMap<facebook::jni::JString, facebook::jni::JString> >), _jclass*, facebook::jni::basic_strong_ref<facebook::jni::detail::JTypeFor<facebook::jni::detail::HybridData, facebook::jni::JObject, void>::_javaobject*, facebook::jni::LocalReferenceAllocator>, facebook::jni::alias_ref<_jstring*>, facebook::jni::alias_ref<facebook::jni::JMap<facebook::jni::JString, facebook::jni::JString> > >::call(_JNIEnv*, _jobject*, _jstring*, facebook::jni::detail::JTypeFor<facebook::jni::JMap<facebook::jni::JString, facebook::jni::JString>, facebook::jni::JObject, void>::_javaobject*, facebook::jni::basic_strong_ref<facebook::jni::detail::JTypeFor<facebook::jni::detail::HybridData, facebook::jni::JObject, void>::_javaobject*, facebook::jni::LocalReferenceAllocator> (*)(facebook::jni::alias_ref<_jclass*>, facebook::jni::alias_ref<_jstring*>, facebook::jni::alias_ref<facebook::jni::JMap<facebook::jni::JString, facebook::jni::JString> >))+72) (BuildId: 8065dc692f8e345f80fe49a1f2162d7e784b3499)
03-09 12:25:35.762: A/DEBUG(7914):       #10 pc 0000000000b6a718  /data/app/~~nuZwb5GEVeE5LCK-o_vC0g==/com.android.contextq-VusjC44pPRp2s2DHEvQm0g==/base.apk!libexecutorchdemo.so (facebook::jni::detail::FunctionWrapperWithJniEntryPoint<facebook::jni::basic_strong_ref<facebook::jni::detail::JTypeFor<facebook::jni::detail::HybridData, facebook::jni::JObject, void>::_javaobject*, facebook::jni::LocalReferenceAllocator> (*)(facebook::jni::alias_ref<_jclass*>, facebook::jni::alias_ref<_jstring*>, facebook::jni::alias_ref<facebook::jni::JMap<facebook::jni::JString, facebook::jni::JString> >), &(executorch_jni::ExecuTorchJni::initHybrid(facebook::jni::alias_ref<_jclass*>, facebook::jni::alias_ref<_jstring*>, facebook::jni::alias_ref<facebook::jni::JMap<facebook::jni::JString, facebook::jni::JString> >)), _jclass*, facebook::jni::basic_strong_ref<facebook::jni::detail::JTypeFor<facebook::jni::detail::HybridData, facebook::jni::JObject, void>::_javaobject*, facebook::jni::LocalReferenceAllocator>, facebook::jni::alias_ref<_jstring*>, facebook::jni::alias_ref<facebook::jni::JMap<facebook::jni::JString, facebook::jni::JString> > >::call(_JNIEnv*, _jobject*, _jstring*, facebook::jni::detail::JTypeFor<facebook::jni::JMap<facebook::jni::JString, facebook::jni::JString>, facebook::jni::JObject, void>::_javaobject*)+52) (BuildId: 8065dc692f8e345f80fe49a1f2162d7e784b3499)
03-09 12:25:35.762: A/DEBUG(7914):       #11 pc 0000000000355630  /apex/com.android.art/lib64/libart.so (art_quick_generic_jni_trampoline+144) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #12 pc 000000000033ee80  /apex/com.android.art/lib64/libart.so (art_quick_invoke_static_stub+640) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #13 pc 0000000000512ba8  /apex/com.android.art/lib64/libart.so (bool art::interpreter::DoCall<false>(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, bool, art::JValue*)+2364) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #14 pc 00000000004961dc  /apex/com.android.art/lib64/libart.so (void art::interpreter::ExecuteSwitchImplCpp<false>(art::interpreter::SwitchImplContext*)+1892) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #15 pc 0000000000357dd8  /apex/com.android.art/lib64/libart.so (ExecuteSwitchImplAsm+8) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #16 pc 0000000000042d7c  [anon:dalvik-classes11.dex extracted in memory from /data/app/~~nuZwb5GEVeE5LCK-o_vC0g==/com.android.contextq-VusjC44pPRp2s2DHEvQm0g==/base.apk!classes11.dex] (com.example.executorchdemo.executor.NativePeer.<init>+0)
03-09 12:25:35.762: A/DEBUG(7914):       #17 pc 0000000000371b14  /apex/com.android.art/lib64/libart.so (art::interpreter::Execute(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool, bool) (.__uniq.112435418011751916792819755956732575238.llvm.8722505846101882172)+232) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #18 pc 00000000005136f0  /apex/com.android.art/lib64/libart.so (bool art::interpreter::DoCall<false>(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, bool, art::JValue*)+5252) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #19 pc 00000000004962a4  /apex/com.android.art/lib64/libart.so (void art::interpreter::ExecuteSwitchImplCpp<false>(art::interpreter::SwitchImplContext*)+2092) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #20 pc 0000000000357dd8  /apex/com.android.art/lib64/libart.so (ExecuteSwitchImplAsm+8) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #21 pc 0000000000042ce4  [anon:dalvik-classes11.dex extracted in memory from /data/app/~~nuZwb5GEVeE5LCK-o_vC0g==/com.android.contextq-VusjC44pPRp2s2DHEvQm0g==/base.apk!classes11.dex] (com.example.executorchdemo.executor.Module.load+0)
03-09 12:25:35.762: A/DEBUG(7914):       #22 pc 0000000000371b14  /apex/com.android.art/lib64/libart.so (art::interpreter::Execute(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool, bool) (.__uniq.112435418011751916792819755956732575238.llvm.8722505846101882172)+232) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #23 pc 00000000005136f0  /apex/com.android.art/lib64/libart.so (bool art::interpreter::DoCall<false>(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, bool, art::JValue*)+5252) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #24 pc 00000000004961dc  /apex/com.android.art/lib64/libart.so (void art::interpreter::ExecuteSwitchImplCpp<false>(art::interpreter::SwitchImplContext*)+1892) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #25 pc 0000000000357dd8  /apex/com.android.art/lib64/libart.so (ExecuteSwitchImplAsm+8) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #26 pc 0000000000042cc8  [anon:dalvik-classes11.dex extracted in memory from /data/app/~~nuZwb5GEVeE5LCK-o_vC0g==/com.android.contextq-VusjC44pPRp2s2DHEvQm0g==/base.apk!classes11.dex] (com.example.executorchdemo.executor.Module.load+0)
03-09 12:25:35.762: A/DEBUG(7914):       #27 pc 0000000000371b14  /apex/com.android.art/lib64/libart.so (art::interpreter::Execute(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool, bool) (.__uniq.112435418011751916792819755956732575238.llvm.8722505846101882172)+232) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #28 pc 00000000005136f0  /apex/com.android.art/lib64/libart.so (bool art::interpreter::DoCall<false>(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, bool, art::JValue*)+5252) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #29 pc 00000000004961dc  /apex/com.android.art/lib64/libart.so (void art::interpreter::ExecuteSwitchImplCpp<false>(art::interpreter::SwitchImplContext*)+1892) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #30 pc 0000000000357dd8  /apex/com.android.art/lib64/libart.so (ExecuteSwitchImplAsm+8) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #31 pc 000000000000d530  [anon:dalvik-classes15.dex extracted in memory from /data/app/~~nuZwb5GEVeE5LCK-o_vC0g==/com.android.contextq-VusjC44pPRp2s2DHEvQm0g==/base.apk!classes15.dex] (com.android.contextq.neuralnetwork.NeuralNetworkService.neuralNetworkloadAndRunPytorch+0)
03-09 12:25:35.762: A/DEBUG(7914):       #32 pc 0000000000371b14  /apex/com.android.art/lib64/libart.so (art::interpreter::Execute(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool, bool) (.__uniq.112435418011751916792819755956732575238.llvm.8722505846101882172)+232) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #33 pc 00000000005136f0  /apex/com.android.art/lib64/libart.so (bool art::interpreter::DoCall<false>(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, bool, art::JValue*)+5252) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #34 pc 00000000004961dc  /apex/com.android.art/lib64/libart.so (void art::interpreter::ExecuteSwitchImplCpp<false>(art::interpreter::SwitchImplContext*)+1892) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #35 pc 0000000000357dd8  /apex/com.android.art/lib64/libart.so (ExecuteSwitchImplAsm+8) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #36 pc 0000000000007d4c  [anon:dalvik-classes15.dex extracted in memory from /data/app/~~nuZwb5GEVeE5LCK-o_vC0g==/com.android.contextq-VusjC44pPRp2s2DHEvQm0g==/base.apk!classes15.dex] (com.android.contextq.neuralnetwork.NeuralNetworkService$NeuralNetworkServiceRunnable.run+0)
03-09 12:25:35.762: A/DEBUG(7914):       #37 pc 0000000000371b14  /apex/com.android.art/lib64/libart.so (art::interpreter::Execute(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool, bool) (.__uniq.112435418011751916792819755956732575238.llvm.8722505846101882172)+232) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #38 pc 00000000005136f0  /apex/com.android.art/lib64/libart.so (bool art::interpreter::DoCall<false>(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, bool, art::JValue*)+5252) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #39 pc 0000000000496d1c  /apex/com.android.art/lib64/libart.so (void art::interpreter::ExecuteSwitchImplCpp<false>(art::interpreter::SwitchImplContext*)+4772) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #40 pc 0000000000357dd8  /apex/com.android.art/lib64/libart.so (ExecuteSwitchImplAsm+8) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #41 pc 000000000000308c  [anon:dalvik-/apex/com.android.art/javalib/core-oj.jar-transformed] (java.lang.Thread.run+0)
03-09 12:25:35.762: A/DEBUG(7914):       #42 pc 0000000000371b14  /apex/com.android.art/lib64/libart.so (art::interpreter::Execute(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool, bool) (.__uniq.112435418011751916792819755956732575238.llvm.8722505846101882172)+232) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #43 pc 000000000037140c  /apex/com.android.art/lib64/libart.so (artQuickToInterpreterBridge+964) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #44 pc 0000000000355768  /apex/com.android.art/lib64/libart.so (art_quick_to_interpreter_bridge+88) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #45 pc 000000000033eba4  /apex/com.android.art/lib64/libart.so (art_quick_invoke_stub+612) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #46 pc 000000000023a9ac  /apex/com.android.art/lib64/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+144) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #47 pc 000000000053b96c  /apex/com.android.art/lib64/libart.so (art::Thread::CreateCallback(void*)+1600) (BuildId: 02bec5940be704b863f6514fc7d81c41)
03-09 12:25:35.762: A/DEBUG(7914):       #48 pc 00000000000ba650  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+208) (BuildId: bf5f1ce73f89cca7d6a062eb7877e86a)
03-09 12:25:35.762: A/DEBUG(7914):       #49 pc 0000000000053ffc  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+68) (BuildId: bf5f1ce73f89cca7d6a062eb7877e86a)
03-09 12:25:35.774: E/tombstoned(707): Tombstone written to: tombstone_12
kimishpatel commented 3 months ago

I have seen this error elsewhere as well. Maybe Chen? I don't know Chen's handle. @iseeyuan?

cccclai commented 3 months ago

What does the graph look like after you run to_backend?

adonnini commented 3 months ago

@cccclai Forgive my ignorance. What exactly do you mean by "what does the graph look like after you run to_backend?" Thanks

cccclai commented 3 months ago

What does your code look like around the to_backend call site?

adonnini commented 3 months ago

Here is my code:

# Imports needed by this snippet (module paths as of early 2024)
from torch._export import capture_pre_autograd_graph
from torch.export import ExportedProgram, export
from executorch.exir import EdgeProgramManager, to_edge
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

example_inputs = (enc_input, dec_input, dec_source_mask, dec_target_mask)

# Capture the model, export it to the ATen dialect, then convert it to the Edge dialect
pre_autograd_aten_dialect = capture_pre_autograd_graph(m, example_inputs)
aten_dialect: ExportedProgram = export(pre_autograd_aten_dialect, example_inputs, strict=False)
edge_program: EdgeProgramManager = to_edge(aten_dialect)

# Delegate the supported subgraphs to XNNPACK
lowered_module = edge_program.to_backend(XnnpackPartitioner())

# Serialize the program to a .pte file for the on-device runtime
save_path = "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/models/tpt_delegate.pte"
with open(save_path, "wb") as f:
    f.write(lowered_module.to_executorch().buffer)

adonnini commented 2 months ago

@cccclai After looking at the code I use to produce the lowered module (is this what you were looking for?), what do you think? Is there anything I can do next to help resolve this issue? Please let me know if you need any additional information. Thanks

adonnini commented 2 months ago

@cccclai Sorry to bug you again about this. Did you take a look at the code I use to produce the lowered model? What should I do next? Until this problem is resolved I am stuck. Please let me know. Thanks

adonnini commented 2 months ago

@cccclai I installed executorch from the main branch and produced the lowered model. The file size was reduced by about half. Unfortunately, the outcome was the same (see below).

I realize you must be pretty busy and have higher priorities. I hope you can take a look at this issue soon.

Thanks

ERROR LOG

04-07 16:21:42.642: I/ETLOG(25729): Model file /data/user/0/com.android.contextq/files/locationInformation/tpt_delegate.pte is loaded.
04-07 16:21:42.643: I/ETLOG(25729): Setting up planned buffer 0, size 29448544.
04-07 16:21:42.661: I/ETLOG(25729): Constant buffer 1 out of program buffer range 0
04-07 16:21:42.661: I/ETLOG(25729): getTensorDataPtr() failed: 0x12
04-07 16:21:42.661: I/ETLOG(25729): Failed parsing tensor at index 0: 0x12
04-07 16:21:42.661: I/ETLOG(25729): In function CheckOk(), assert failed: hasValue_
kimishpatel commented 1 month ago

@kirklandsign @cccclai any updates on this?

kirklandsign commented 1 month ago
getTensorDataPtr() failed: 0x12

InvalidArgument = 0x12

It seems to come from https://github.com/pytorch/executorch/blob/main/runtime/executor/tensor_parser_portable.cpp#L128 <- https://github.com/pytorch/executorch/blob/main/runtime/executor/tensor_parser_exec_aten.cpp#L57 <- https://github.com/pytorch/executorch/blob/main/runtime/executor/program.cpp#L326
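To decode the other status codes in these logs, a small lookup table helps. The values below are a subset of the Error enum in runtime/core/error.h as I understand it (for example 0x10 NotSupported, 0x12 InvalidArgument); treat the table as an assumption and check it against your checkout. decode_log_line is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Assumed subset of ExecuTorch's runtime Error enum (runtime/core/error.h);
# verify these values against the version you actually build.
ET_ERRORS = {
    0x00: "Ok",
    0x01: "Internal",
    0x02: "InvalidState",
    0x03: "EndOfMethod",
    0x10: "NotSupported",
    0x11: "NotImplemented",
    0x12: "InvalidArgument",
    0x13: "InvalidType",
    0x14: "OperatorMissing",
    0x20: "NotFound",
    0x21: "MemoryAllocationFailed",
    0x22: "AccessFailed",
    0x23: "InvalidProgram",
}

_STATUS = re.compile(r"\b0x([0-9a-fA-F]+)\b")


def decode_log_line(line: str) -> List[Tuple[str, str]]:
    """Return (hex code, enum name) for every 0x.. status code in a log line."""
    return [
        (m.group(0), ET_ERRORS.get(int(m.group(1), 16), "unknown"))
        for m in _STATUS.finditer(line)
    ]
```

For instance, decode_log_line("Error setting input 0: 0x10") returns [('0x10', 'NotSupported')].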

Constant buffer 1 out of program buffer range 0

@lucylq do you have any idea what happened?

adonnini commented 1 month ago

@lucylq I hope I am not bothering you. When do you think you could let me know about a potential resolution for this problem? I am stuck until it is resolved. Please let me know if there is anything I can do to help. Thanks

adonnini commented 1 month ago

@lucylq Please let me know if/when you will be able to spend time working on this issue so that I will stop bothering you and can plan accordingly. Please respond. Thanks

lucylq commented 1 month ago

Hey @adonnini, thanks for your patience. Could you try following the instructions here? https://pytorch.org/executorch/main/getting-started-setup.html These clone the main executorch branch. I wonder whether you cloned the preview branch earlier (from the stable page), because the error log you're seeing has since been replaced and there is new logic to handle constants now.

adonnini commented 1 month ago

@lucylq Thanks for getting back to me. I did as you asked and installed executorch (0.2.0a0+4f79832) following the instructions in https://pytorch.org/executorch/main/getting-started-setup.html. Unfortunately, execution failed, producing the error log reported below. Please let me know if you need any additional information, and what I should do next. Thanks

ERROR LOG

05-10 21:54:38.885: I/ETLOG(29318): Model file /data/user/0/com.android.contextq/files/locationInformation/tpt_delegate.pte is loaded.
05-10 21:54:38.885: I/ETLOG(29318): Setting up planned buffer 0, size 29448544.
05-10 21:54:38.898: I/ETLOG(29318): Constant buffer 1 out of program buffer range 0
05-10 21:54:38.898: I/ETLOG(29318): getTensorDataPtr() failed: 0x12
05-10 21:54:38.898: I/ETLOG(29318): Failed parsing tensor at index 0: 0x12
05-10 21:54:38.898: I/ETLOG(29318): In function CheckOk(), assert failed: hasValue_
kirklandsign commented 1 month ago

Hi @adonnini, I find this a bit strange, because that log line

Constant buffer 1 out of program buffer range 0

should be gone after https://github.com/pytorch/executorch/pull/1369. Would you mind grepping your checkout for "out of program buffer range"? It should no longer appear in your repo.

Is it possible that the app is not using the libexecutorch_jni.so you just built? The app needs the latest runtime libraries as well.
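The same grep can be run against the compiled library the app actually ships, rather than the source tree. This sketch assumes the log message text is stored verbatim in the .so (typical for string literals, but not guaranteed if logging is compiled out), so a hit strongly suggests an old runtime while a miss is only weak evidence. markers_in_bytes, find_stale_markers and find_stale_markers_in_apk are hypothetical helpers.

```python
import zipfile
from pathlib import Path
from typing import Dict, Iterable, List

# Fragment of the old log message seen in the logcat output in this thread;
# an up-to-date runtime should no longer contain it.
STALE_MARKERS = (b"out of program buffer range",)


def markers_in_bytes(data: bytes, markers: Iterable[bytes]) -> List[bytes]:
    """Return the markers that occur verbatim in a binary blob."""
    return [m for m in markers if m in data]


def find_stale_markers(lib_path: str, markers: Iterable[bytes] = STALE_MARKERS) -> List[bytes]:
    """Scan a single .so file, e.g. one under app/src/main/jniLibs/arm64-v8a/."""
    return markers_in_bytes(Path(lib_path).read_bytes(), markers)


def find_stale_markers_in_apk(
    apk_path: str, markers: Iterable[bytes] = STALE_MARKERS
) -> Dict[str, List[bytes]]:
    """Scan every native library packaged inside an APK (APKs are zip archives)."""
    markers = tuple(markers)  # iterated once per library
    hits: Dict[str, List[bytes]] = {}
    with zipfile.ZipFile(apk_path) as apk:
        for name in apk.namelist():
            if name.endswith(".so"):
                found = markers_in_bytes(apk.read(name), markers)
                if found:
                    hits[name] = found
    return hits
```

Scanning the built APK rather than the jniLibs folder has the advantage of checking exactly what was installed on the phone.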

adonnini commented 1 month ago

@kirklandsign Thanks. As far as I can tell, "out of program buffer range" no longer appears in the repo. I think you are right: the problem probably lies with my use of libexecutorch_jni.so. My Android app still uses the libexecutorchdemo.so from my very first install of executorch last year. At the time I had to work out a temporary solution to use it, because libexecutorchdemo.so is hardwired for use with the executorch Android demo app only.

I realized that I needed to go through https://pytorch.org/executorch/stable/demo-apps-android.html again (sorry, I should have realized that). I did, and everything worked as expected. By the way, the documentation, which was good to start with, has gotten even better!

So, my problem now is how to use libexecutorch_jni.so in my app. Is it still hardwired for use by the executorch Android demo app only, as it was previously? In other words, other than placing it in app/jniLibs/arm64-v8a/, what are the instructions for using it in an Android app? With the 0.1.0 release of executorch, I had to reproduce examples/demo-apps/android/ExecuTorchDemo/app/src/main/java/com/example/executorchdemo in my app and make a few other changes. Based on my initial look, the Android demo app in the latest executorch release is structured completely differently.

I have to add the org.pytorch.executorch package to my app. However, that is not enough. At this point, making it work with an app other than the executorch Android demo app requires a hack, I think. Not good.

For example, in java/org/pytorch/executorch/NativePeer.java I am getting this error: "Cannot resolve corresponding JNI function Java_org_pytorch_executorch_NativePeer_initHybrid", even though the executorch library libexecutorch.so is in /jniLibs/arm64-v8a/. One potential solution is to suppress the error. However, that's not really a solution and may simply hide the problem.
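The IDE message names the symbol that a statically bound native method would need, and JNI derives that name from the fully qualified Java class name, so it changes whenever the package changes (the crash trace earlier in this thread shows the old demo classes under com.example.executorchdemo.executor, while the new sources use org.pytorch.executorch). The sketch below computes the short symbol name for a non-overloaded method, following the JNI specification's escaping rules; jni_symbol is a hypothetical helper. If the library instead registers its natives at load time (fbjni-based hybrid classes typically do this through RegisterNatives, as far as I know), no Java_... symbol is exported and the warning may be harmless; what then matters is that the Java package and class names match the ones compiled into the .so.

```python
def _mangle(component: str) -> str:
    """Escape one identifier according to the JNI native-method naming rules."""
    out = []
    for ch in component:
        if ch == "_":
            out.append("_1")
        elif ch == ";":
            out.append("_2")
        elif ch == "[":
            out.append("_3")
        elif ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            # Other characters become _0xxxx (4 lowercase hex digits; BMP only here).
            out.append(f"_0{ord(ch):04x}")
    return "".join(out)


def jni_symbol(class_name: str, method: str) -> str:
    """Short JNI symbol for a non-overloaded native method of a dotted class name."""
    # Package separators ('.' in Java source, '/' internally) become '_'.
    mangled_class = "_".join(_mangle(part) for part in class_name.split("."))
    return f"Java_{mangled_class}_{_mangle(method)}"
```

For example, jni_symbol("org.pytorch.executorch.NativePeer", "initHybrid") yields exactly the name in the error message, while the old demo package yields Java_com_example_executorchdemo_executor_NativePeer_initHybrid.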

Sorry for these questions. I should have realized that I needed to go through the Android related set-up again sooner. It would be useful if the documentation at some point included instructions for using the executorch runtime with an Android app which is not the executorch demo app.

Thanks

adonnini commented 1 month ago

@kirklandsign An update: after making some adjustments, mostly based on (educated) guesswork, I was able to attempt to run the lowered model for inference in my Android application. Loading the model appeared to work. However, not surprisingly, the inference step failed, producing the error log reported below.

The failure occurred when attempting to execute the following call: outputTensor = mModule.forward(EValue.from(arrDataPytorch))[0].toTensor(); I constructed the above line duplicating the forward call found in MainActivity.java in the executorch Android demo application.

It's quite possible (likely?) that, given how arrDataPytorch is structured in my code (it is a flattened tensor), my forward call is not correct.
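The log below reports that input 0 could not be resized at dimension 0 from old_size 27 to new_size 14415, which suggests the program was exported with static input shapes that a single flattened buffer does not match. Note also that the export code earlier in the thread traced four inputs (enc_input, dec_input, dec_source_mask, dec_target_mask), while the Java call passes one. A host-side sketch of the check follows; check_inputs is a hypothetical helper, to be fed the real shapes printed from the example inputs at export time.

```python
from math import prod
from typing import Sequence, Tuple


def check_inputs(
    buffers: Sequence[Sequence[float]], shapes: Sequence[Tuple[int, ...]]
) -> None:
    """Raise ValueError if flat input buffers cannot fill the export-time static shapes."""
    if len(buffers) != len(shapes):
        raise ValueError(f"method expects {len(shapes)} inputs, got {len(buffers)}")
    for i, (buf, shape) in enumerate(zip(buffers, shapes)):
        expected = prod(shape)
        if len(buf) != expected:
            raise ValueError(
                f"input {i}: {len(buf)} values cannot fill static shape {shape} "
                f"({expected} values expected)"
            )
```

On the Java side, the corresponding fix would presumably be to build each input with its export-time shape (for example via Tensor.fromBlob(data, shape)) and pass all four EValues to mModule.forward(...); this assumes the org.pytorch.executorch Tensor API matches the one used in the demo app.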

What do you (or any of your colleagues) think? Please let me know if you need any information, and what I should do next.

Thanks

ERROR LOG

05-12 16:50:23.542: E/ExecuTorch(12402): Attempted to resize a static tensor to a new shape at dimension 0 old_size: 27 new_size: 14415
05-12 16:50:23.542: E/ExecuTorch(12402): Error setting input 0: 0x10
05-12 16:50:23.542: A/ExecuTorch(12402): In function execute_method(), assert failed (result.ok()): Execution of method forward failed with status 0x12
05-12 16:50:23.542: A/libc(12402): Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 12434 (Thread-2), pid 12402 (lNetworkService)
05-12 16:50:23.597: I/crash_dump64(12837): obtaining output fd from tombstoned, type: kDebuggerdTombstoneProto
05-12 16:50:23.598: I/tombstoned(712): received crash request for pid 12434
05-12 16:50:23.603: I/crash_dump64(12837): performing dump of process 12402 (target tid = 12434)
05-12 16:50:23.728: W/adbd(9671): timeout expired while flushing socket, closing
05-12 16:50:23.829: A/DEBUG(12837): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
05-12 16:50:23.829: A/DEBUG(12837): Build fingerprint: 'Fairphone/FP4eea/FP4:13/TKQ1.230127.002/TP2D:user/release-keys'
05-12 16:50:23.829: A/DEBUG(12837): Revision: '0'
05-12 16:50:23.829: A/DEBUG(12837): ABI: 'arm64'
05-12 16:50:23.829: A/DEBUG(12837): Timestamp: 2024-05-12 16:50:23.608361388+0200
05-12 16:50:23.829: A/DEBUG(12837): Process uptime: 377s
05-12 16:50:23.829: A/DEBUG(12837): Cmdline: com.android.contextq:ContextQNeuralNetworkService
05-12 16:50:23.829: A/DEBUG(12837): pid: 12402, tid: 12434, name: Thread-2  >>> com.android.contextq:ContextQNeuralNetworkService <<<
05-12 16:50:23.829: A/DEBUG(12837): uid: 10207
05-12 16:50:23.829: A/DEBUG(12837): signal 6 (SIGABRT), code -1 (SI_QUEUE), fault addr --------
05-12 16:50:23.829: A/DEBUG(12837): Abort message: 'In function execute_method(), assert failed (result.ok()): Execution of method forward failed with status 0x12'
05-12 16:50:23.829: A/DEBUG(12837):     x0  0000000000000000  x1  0000000000003092  x2  0000000000000006  x3  0000007970a42e30
05-12 16:50:23.829: A/DEBUG(12837):     x4  72601f2b2827636e  x5  72601f2b2827636e  x6  72601f2b2827636e  x7  7f7f7f7f7f7f7f7f
05-12 16:50:23.829: A/DEBUG(12837):     x8  00000000000000f0  x9  0000007d0a45ab28  x10 0000000000000001  x11 0000007d0a49a84c
05-12 16:50:23.829: A/DEBUG(12837):     x12 0000007970a41400  x13 000000000000006f  x14 0000007970a42748  x15 0000000034155555
05-12 16:50:23.829: A/DEBUG(12837):     x16 0000007d0a502d68  x17 0000007d0a4de4e0  x18 000000796fac0000  x19 0000000000003072
05-12 16:50:23.829: A/DEBUG(12837):     x20 0000000000003092  x21 00000000ffffffff  x22 0000007cfd41da00  x23 0000007cfd41da00
05-12 16:50:23.829: A/DEBUG(12837):     x24 0000007970a435b0  x25 b400007b3371b560  x26 0000000000002072  x27 0000007cfd9143e8
05-12 16:50:23.829: A/DEBUG(12837):     x28 0000007970a43480  x29 0000007970a42eb0
05-12 16:50:23.829: A/DEBUG(12837):     lr  0000007d0a48b788  sp  0000007970a42e10  pc  0000007d0a48b7b4  pst 0000000000001000
adonnini commented 1 month ago

@kirklandsign I think the problem I reported above may be the same as (or similar to) this one: https://github.com/pytorch/executorch/issues/1350

kirklandsign commented 1 month ago

Hi @adonnini, thank you so much for trying it out! Unfortunately, for now we need to bundle two separate parts into the app: the Java (jar) part, which should be in extension/android/build/libs/executorch.jar, and the JNI library part, which should be in cmake-out-android/extension/android. You need to import executorch.jar into your Gradle project and put the JNI library into the JNI directory. You could also make an AAR that combines the two (https://github.com/pytorch/executorch/blob/e38eaec42e186fc62bddf3e11b14e90c0b64f8ba/build/test_android_ci.sh#L41-L54).
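If you go the AAR route, one quick way to confirm that both parts actually ended up in the bundle is to list its entries: an Android AAR is a zip archive whose Java classes live in classes.jar and whose native libraries live under jni/<abi>/. The sketch below uses only java.util.zip. The entry name jni/arm64-v8a/libexecutorch.so is an assumption taken from the library name used elsewhere in this thread; the actual file name depends on how the library was built.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipFile;

/** Sketch: reports which required entries are missing from an AAR (zip) archive. */
public final class AarCheck {
    private AarCheck() {}

    /** Returns the subset of required entry names that the archive does not contain. */
    public static List<String> missingEntries(String archivePath, String... required) throws IOException {
        List<String> missing = new ArrayList<>();
        try (ZipFile zip = new ZipFile(archivePath)) {
            for (String entry : required) {
                if (zip.getEntry(entry) == null) {
                    missing.add(entry);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java AarCheck <path-to-aar>");
            return;
        }
        // Assumed layout: Java part in classes.jar, native part under jni/arm64-v8a/.
        List<String> missing = missingEntries(args[0], "classes.jar", "jni/arm64-v8a/libexecutorch.so");
        System.out.println(missing.isEmpty() ? "OK" : "Missing: " + missing);
    }
}
```

For example, java AarCheck path/to/executorch.aar prints OK only if both the Java part and the arm64-v8a library are present.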

Now, the error "Attempted to resize a static tensor to a new shape at dimension 0 old_size: 27 new_size: 14415" seems to be a different issue. I think it is related to the dynamic shape issue in https://github.com/pytorch/executorch/issues/1350
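The log suggests that input 0 was memory-planned with a static shape whose dimension 0 is 27, while the tensor passed at runtime had dimension 0 equal to 14415. That is plausibly what happens when a flattened array is wrapped as a 1-D tensor of shape {14415}. Two hedged observations: if the input size is genuinely fixed, the flat data has to be wrapped with the exact export-time shape rather than as 1-D; if it varies, that is the dynamic-shape route discussed in #1350. Also, the model earlier in this thread was exported with four inputs (enc_input, dec_input, dec_source_mask, dec_target_mask), while the forward call passes a single EValue. The pure-Java sketch below only checks that a flat array's length matches a static shape before any tensor is built; the shape {27, 4} is hypothetical, and the mention of Tensor.fromBlob(float[], long[]) assumes the ExecuTorch Java bindings mirror the PyTorch Android API.

```java
import java.util.Arrays;

/** Sketch: checks a flattened input against a static (export-time) shape. */
public final class InputShapeCheck {
    private InputShapeCheck() {}

    /** Number of elements implied by a shape: the product of its dimensions. */
    public static long numel(long[] shape) {
        long n = 1;
        for (long d : shape) {
            if (d < 0) {
                throw new IllegalArgumentException("negative dimension: " + d);
            }
            n = Math.multiplyExact(n, d);   // fail loudly on overflow
        }
        return n;
    }

    /**
     * Throws if the flat array cannot be viewed as expectedShape.
     * Wrapped as 1-D instead, the same data would have dim 0 == data.length,
     * not expectedShape[0], which is the kind of mismatch in the log above.
     */
    public static void requireMatches(float[] data, long[] expectedShape) {
        long expected = numel(expectedShape);
        if (data.length != expected) {
            throw new IllegalArgumentException("input has " + data.length + " elements but shape "
                    + Arrays.toString(expectedShape) + " needs " + expected);
        }
    }

    public static void main(String[] args) {
        long[] shape = {27, 4};             // hypothetical export-time shape of input 0
        float[] good = new float[27 * 4];
        requireMatches(good, shape);        // passes
        // Assumed next step with the ExecuTorch Java bindings:
        // EValue.from(Tensor.fromBlob(good, shape)), i.e. pass the explicit shape, not a 1-D one.
        try {
            requireMatches(new float[14415], shape);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The check deliberately happens on the Java side, before the call crosses into native code, so a mismatch surfaces as a readable exception instead of an abort inside execute_method().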

adonnini commented 1 month ago

@kirklandsign Thank you very much for the explanation and the information. I'll do as you describe. Yes, I think the new error is the one reported in https://github.com/pytorch/executorch/issues/1350. I'll pick up that thread again and see what the current status is. Thanks

adonnini commented 1 month ago

@kirklandsign

I am a bit uncertain about the exact locations you refer to:

import executorch.jar to your gradle project, and the jni to source directory

I copied the jni library in my app into /app/src/main/jniLibs/arm64-v8a/libexecutorch.so. Is this not the right place?

Does executorch.jar go in /app/lib? Is there a dependency that needs to be added to the app's build.gradle (I would think so)? Unless I am mistaken, the Android demo app currently does not use executorch.jar, since it includes the org.pytorch.executorch package directly in the application, as I do. There is also no dependency for the jar in the demo application's build.gradle.

After adding libexecutorch.so and the org.pytorch.executorch package, my application built successfully, and the model loaded without problems before running into https://github.com/pytorch/executorch/issues/1350.

kirklandsign commented 1 month ago

Hi @adonnini

Sorry, I meant the .so goes into the JNI directory. So you need both parts, and jniLibs/arm64-v8a/libexecutorch.so is the correct path for the .so. If the org.pytorch.executorch package is included directly in the application, you don't need executorch.jar.

adonnini commented 1 month ago

@kirklandsign Thanks. I think we can close this issue. Agreed?

kirklandsign commented 1 month ago

Sounds good. Thank you!