pytorch / executorch

On-device AI across mobile, embedded and edge for PyTorch
https://pytorch.org/executorch/

Segmentation Fault when implementing llama/stories110M Android phone deployment #4237

Open BESTTOOLBOX opened 2 months ago

BESTTOOLBOX commented 2 months ago

🐛 Describe the bug

I encountered a segmentation fault when deploying llama/stories110M to an Android phone following https://github.com/pytorch/executorch/blob/main/examples/models/llama2/README.md. I put xnnpack_stories110M.pte, tokenizer.bin, and llama_main (built with the XNNPACK backend) into the same directory and ran llama_main. The error message and raw logcat output are in the attached screenshots; below is the backtrace resolved with addr2line. It looks like something goes wrong during memory allocation.

Backtrace (logcat 07-12 10:11:23.542, pid 11569; frames #00-#16 are in /data/local/tmp/jiaxgeng/stories/llama_main, BuildId 76fcd4977dfd60ce58d60c76d5f70bf61f77b471):

```
#00 pc 0000000002ff1afc (xnn_pack_qs8_qc4w_gemm_bl_goi_w_nr8_kr4+2696)
    /local/mnt/workspace/executorch/backends/xnnpack/third-party/XNNPACK/src/packing.c:548
#01 pc 0000000002ff3604 (xnn_pack_qs8_qc4w_gemm_bl_goi_w+1512)
    /local/mnt/workspace/executorch/backends/xnnpack/third-party/XNNPACK/src/packing.c:753
#02 pc 00000000030e79bc (xnn_create_fully_connected_nc_qd8_f32_qb4w+3240)
    /local/mnt/workspace/executorch/backends/xnnpack/third-party/XNNPACK/src/operators/fully-connected-nc.c:737
#03 pc 00000000030e1acc (create_fully_connected_operator+3156)
    /local/mnt/workspace/executorch/backends/xnnpack/third-party/XNNPACK/src/subgraph/fully-connected.c:237
#04 pc 0000000002d4791c (xnn_create_runtime_v4+1644)
    /local/mnt/workspace/executorch/backends/xnnpack/third-party/XNNPACK/src/runtime.c:575
#05 pc 0000000002d47260 (xnn_create_runtime_v3+104)
    /local/mnt/workspace/executorch/backends/xnnpack/third-party/XNNPACK/src/runtime.c:208
#06 pc 0000000002d471e8 (xnn_create_runtime_v2+48)
    /local/mnt/workspace/executorch/backends/xnnpack/third-party/XNNPACK/src/runtime.c:193
#07 pc 0000000000855d4c (torch::executor::xnnpack::delegate::XNNCompiler::compileModel(void const*, unsigned long, torch::executor::xnnpack::delegate::XNNExecutor*, torch::executor::MemoryAllocator*)+1760)
    /local/mnt/workspace/executorch/backends/xnnpack/runtime/XNNCompiler.cpp:1681
#08 pc 0000000000861c88 (torch::executor::XnnpackBackend::init(torch::executor::BackendInitContext&, torch::executor::FreeableBuffer*, torch::executor::ArrayRef) const+184)
    /local/mnt/workspace/executorch/backends/xnnpack/runtime/XNNPACKBackend.cpp:42
#09 pc 000000000312bbd0 (torch::executor::BackendDelegate::Init(executorch_flatbuffer::BackendDelegate const&, torch::executor::Program const*, torch::executor::BackendInitContext&, torch::executor::BackendDelegate*)+992)
    /local/mnt/workspace/executorch/runtime/executor/method.cpp:97
#10 pc 000000000312a74c (torch::executor::Method::init(executorch_flatbuffer::ExecutionPlan*)+728)
    /local/mnt/workspace/executorch/runtime/executor/method.cpp:596
#11 pc 000000000312a298 (torch::executor::Method::load(executorch_flatbuffer::ExecutionPlan*, torch::executor::Program const*, torch::executor::MemoryManager*, torch::executor::EventTracer*)+84)
    /local/mnt/workspace/executorch/runtime/executor/method.cpp:547
#12 pc 00000000031345c4 (torch::executor::Program::load_method(char const*, torch::executor::MemoryManager*, torch::executor::EventTracer*) const+380)
    /local/mnt/workspace/executorch/runtime/executor/program.cpp:246
#13 pc 0000000003118424 (torch::executor::Module::load_method(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char>> const&)+888)
    /local/mnt/workspace/executorch/extension/module/module.cpp:131
#14 pc 000000000086cd78 (torch::executor::Runner::load()+312)
    /local/mnt/workspace/executorch/examples/models/llama2/runner/runner.cpp:73
#15 pc 000000000086fc94 (torch::executor::Runner::generate(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char>> const&, int, std::__ndk1::function<void (std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char>> const&)>, std::__ndk1::function<void (torch::executor::Runner::Stats const&)>)+236)
    /local/mnt/workspace/executorch/examples/models/llama2/runner/runner.cpp:351
#16 pc 0000000000863fe8 (main+480)
    /local/mnt/workspace/executorch/examples/models/llama2/main.cpp:75
#17 pc 0000000000053e48 /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+108) (BuildId: 50118287324a156bc7be11d3d940c7be)
```
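For reference, the symbolication step above can be sketched like this: pull the raw `pc` offset out of a tombstone line and hand it to `llvm-addr2line`. (The sample line comes from the backtrace; the `llvm-addr2line` invocation assumes the NDK toolchain is on PATH and that `llama_main` is the unstripped Debug binary.)

```shell
# Extract the hex pc offset from a logcat/tombstone frame line.
line='00 pc 0000000002ff1afc /data/local/tmp/jiaxgeng/stories/llama_main (xnn_pack_qs8_qc4w_gemm_bl_goi_w_nr8_kr4+2696)'
pc=$(printf '%s\n' "$line" | sed -n 's/.* pc \([0-9a-f]*\) .*/0x\1/p')
echo "$pc"   # prints 0x0000000002ff1afc

# With the NDK toolchain on PATH and the unstripped binary at hand,
# map the offset back to function and file:line:
#   llvm-addr2line -C -f -e llama_main "$pc"
```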

Here are my build commands for llama_main on Android. To get debug symbols, I set the build type to Debug.

```shell
sudo cmake -DCMAKE_TOOLCHAIN_FILE=/local/mnt/workspace/android-ndk-r26d/build/cmake/android.toolchain.cmake \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=android-33 \
  -DCMAKE_INSTALL_PREFIX=cmake-out-android \
  -DCMAKE_BUILD_TYPE=Debug \
  -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
  -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
  -DEXECUTORCH_ENABLE_LOGGING=1 \
  -DPYTHON_EXECUTABLE=python \
  -DEXECUTORCH_BUILD_XNNPACK=ON \
  -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
  -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
  -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
  -Bcmake-out-android .

cmake --build cmake-out-android -j16 --target install --config Debug

sudo cmake -DCMAKE_TOOLCHAIN_FILE=/local/mnt/workspace/android-ndk-r26d/build/cmake/android.toolchain.cmake \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=android-33 \
  -DCMAKE_INSTALL_PREFIX=cmake-out-android \
  -DCMAKE_BUILD_TYPE=Debug \
  -DPYTHON_EXECUTABLE=python \
  -DEXECUTORCH_BUILD_XNNPACK=ON \
  -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
  -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
  -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
  -Bcmake-out-android/examples/models/llama2 \
  examples/models/llama2

sudo cmake --build cmake-out-android/examples/models/llama2 -j16 --config Debug
```
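The push-and-run step I described above looks roughly like this (the on-device directory is an arbitrary choice, and the flag names follow the examples/models/llama2 runner, so double-check them against your checkout):

```shell
# Sketch: stage the model, tokenizer, and runner in one directory on the
# phone and run the binary there. DEVICE_DIR is arbitrary; the llama_main
# flags are assumptions based on the llama2 example runner.
DEVICE_DIR=/data/local/tmp/stories
RUN_CMD="cd $DEVICE_DIR && ./llama_main --model_path=xnnpack_stories110M.pte --tokenizer_path=tokenizer.bin --prompt='Once upon a time'"
echo "$RUN_CMD"

# adb push cmake-out-android/examples/models/llama2/llama_main "$DEVICE_DIR/"
# adb push xnnpack_stories110M.pte tokenizer.bin "$DEVICE_DIR/"
# adb shell "$RUN_CMD"
```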

Versions

```
[pip3] executorch==0.4.0a0+8740c69
[pip3] numpy==2.0.0
[pip3] torch==2.5.0.dev20240618+cpu
[pip3] torchao==0.1
[pip3] torchaudio==2.4.0.dev20240618+cpu
[pip3] torchsr==1.0.4
[pip3] torchvision==0.20.0.dev20240618+cpu
[conda] numpy       1.26.4   pypi_0  pypi
[conda] torch       2.2.2    pypi_0  pypi
[conda] torchaudio  2.2.2    pypi_0  pypi
[conda] torchvision 0.17.2   pypi_0  pypi
```

lucylq commented 2 months ago

cc @kirklandsign for Android

kimishpatel commented 2 months ago

Wow, you did a good chunk of work to narrow this down. It feels like the same issue we saw earlier with out-of-bounds access for weights? cc: @digantdesai

kirklandsign commented 2 months ago

Hi @BESTTOOLBOX, thank you for reporting! This is a known issue and will be fixed by https://github.com/digantdesai/XNNPACK/pull/12 when we update the XNNPACK commit in ExecuTorch. You can also see https://github.com/pytorch/executorch/pull/4304/files for the fix in action.

BESTTOOLBOX commented 2 months ago

Thank you very much. I had been confused by this issue for a long time. I will go check out the fix.
