pytorch / executorch

On-device AI across mobile, embedded and edge for PyTorch
https://pytorch.org/executorch/

Using the executorch AAR library with llava.pte in LlamaDemo results in "library 'libexecutorch.so' not found" on Android Studio #6301

Closed Astuary closed 2 days ago

Astuary commented 3 days ago

šŸ› Describe the bug

I am following the Android Demo App instructions to run the llava1.5 model (the .pte file was generated by following these instructions).

After generating executorch-llama.aar by running bash examples/demo-apps/android/LlamaDemo/setup.sh on a Linux server, I copy it into LlamaDemo/app/libs on a Windows machine for Android Studio. I am using Android NDK 26.3.11579264. A successful run of the LlamaDemo/setup.sh script prints

  adding: libs/executorch.jar (deflated 11%)
  adding: jni/arm64-v8a/ (stored 0%)
  adding: jni/arm64-v8a/libexecutorch.so (deflated 81%)
  adding: AndroidManifest.xml (deflated 23%)

in the end.

Then I run LlamaDemo on a Pixel 8 emulator in Android Studio (I tried both the default version with 8GB RAM and an edited version with 12GB RAM). Once I try to load llava.pte and tokenizer.bin from the app's settings activity, the app immediately crashes with this error:

Process: com.example.executorchllamademo, PID: 5333
java.lang.UnsatisfiedLinkError: dlopen failed: library "libexecutorch.so" not found
at java.lang.Runtime.loadLibrary0(Runtime.java:1081)
at java.lang.Runtime.loadLibrary0(Runtime.java:1003)
at java.lang.System.loadLibrary(System.java:1765)
at com.facebook.soloader.nativeloader.SystemDelegate.loadLibrary(SystemDelegate.java:24)
at com.facebook.soloader.nativeloader.NativeLoader.loadLibrary(NativeLoader.java:52)
at com.facebook.soloader.nativeloader.NativeLoader.loadLibrary(NativeLoader.java:30)
at org.pytorch.executorch.LlamaModule.<clinit>(LlamaModule.java:33)
at com.example.executorchllamademo.MainActivity.setLocalModel(MainActivity.java:125)
at com.example.executorchllamademo.MainActivity.access$000(MainActivity.java:54)
at com.example.executorchllamademo.MainActivity$1.run(MainActivity.java:198)
at java.lang.Thread.run(Thread.java:1012)

I have also tried to use the prebuilt executorch-llama.aar by running bash examples/demo-apps/android/LlamaDemo/download_prebuilt_lib.sh. I don't get any errors then, but the model still does not load. It stays on the "Loading model..." message, and once RAM usage reaches 7.5GB (on the 8GB emulator variant) or 11.5GB (on the 12GB variant), nothing further happens.

Versions

Collecting environment information...
PyTorch version: 2.6.0.dev20241007+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: 10.0.0-4ubuntu1
CMake version: version 3.30.4
Libc version: glibc-2.31

Python version: 3.12.7 | packaged by Anaconda, Inc. | (main, Oct 4 2024, 13:27:36) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.4.0-190-generic-x86_64-with-glibc2.31
Is CUDA available: False
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: NVIDIA L40S
Nvidia driver version: 535.183.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 57 bits virtual
CPU(s): 64
On-line CPU(s) list: 0-31
Off-line CPU(s) list: 32-63
Thread(s) per core: 1
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 207
Model name: INTEL(R) XEON(R) GOLD 6526Y
Stepping: 2
CPU MHz: 3500.000
BogoMIPS: 5600.00
Virtualization: VT-x
L1d cache: 1.5 MiB
L1i cache: 1 MiB
L2 cache: 64 MiB
L3 cache: 75 MiB
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI BHI_DIS_S
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cat_l2 cdp_l3 invpcid_single cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local avx512_bf16 wbnoinvd dtherm ida arat pln pts avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq rdpid cldemote movdiri movdir64b md_clear pconfig flush_l1d arch_capabilities

Versions of relevant libraries:
[pip3] executorch==0.5.0a0+517fddb
[pip3] numpy==1.26.4
[pip3] torch==2.6.0.dev20241007+cpu
[pip3] torchao==0.5.0
[pip3] torchaudio==2.5.0.dev20241007+cpu
[pip3] torchsr==1.0.4
[pip3] torchvision==0.20.0.dev20241007+cpu
[conda] executorch 0.5.0a0+517fddb pypi_0 pypi
[conda] numpy 1.26.4 pypi_0 pypi
[conda] torch 2.6.0.dev20241007+cpu pypi_0 pypi
[conda] torchao 0.5.0 pypi_0 pypi
[conda] torchaudio 2.5.0.dev20241007+cpu pypi_0 pypi
[conda] torchsr 1.0.4 pypi_0 pypi
[conda] torchvision 0.20.0.dev20241007+cpu pypi_0 pypi

kirklandsign commented 2 days ago

Hi @Astuary, it seems that the x86_64 variant was not added to the AAR.

kirklandsign commented 2 days ago

I'm not sure why; in examples/demo-apps/android/LlamaDemo/setup.sh it seems that build_android_native_library "x86_64" is not invoked.
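For illustration, a hypothetical sketch of how the script could cover both ABIs, building each one into its own output directory so object files never mix. The build_android_native_library function below is a stub standing in for the real function in setup.sh:

```shell
# Stub standing in for setup.sh's real build function; shown only to
# illustrate invoking the build once per ABI with separate build dirs.
build_android_native_library() {
  abi="$1"
  echo "building ${abi} into cmake-out-android-${abi}"
  # the real function would run cmake with -DANDROID_ABI="${abi}"
  # and a distinct build directory per ABI
}

for abi in arm64-v8a x86_64; do
  build_android_native_library "$abi"
done
```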

Astuary commented 2 days ago

So should I change export ANDROID_ABI=arm64-v8a to export ANDROID_ABI=x86_64? I tried that, and there are lots of errors related to incompatibility with elf_x86_64:

[  7%] Linking CXX static library libeigen_blas.a
[ 11%] Built target subgraph
ld.lld: error: CMakeFiles/flatc.dir/src/idl_parser.cpp.o is incompatible with elf_x86_64
ld.lld: error: CMakeFiles/flatc.dir/src/idl_gen_text.cpp.o is incompatible with elf_x86_64
ld.lld: error: CMakeFiles/flatc.dir/src/reflection.cpp.o is incompatible with elf_x86_64
ld.lld: error: CMakeFiles/flatc.dir/src/util.cpp.o is incompatible with elf_x86_64
...

Also, I need the x86_64 because I am using an emulator? I will try the current aar library directly on an actual Pixel too then.

Astuary commented 2 days ago

Hello, I ended up using the downloaded AAR (which includes both x86 and arm) for my emulator, and the AAR I built for my actual phone. However, I am still unable to load the llava model into memory.

It stays on the "Loading model..." message, and when RAM usage reaches 7.5GB (on the actual phone with 8GB) or 11.5GB (on the emulator with 12GB), the app just shuts down/crashes.

kirklandsign commented 2 days ago

Hi @Astuary you need slightly more RAM than the model requires. Android OS will kill the app when the memory is low.
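One way to give the emulator that headroom is to raise hw.ramSize in the AVD's config.ini (normally at ~/.android/avd/&lt;name&gt;.avd/config.ini). A sketch, demonstrated on a local sample file so it is safe to run anywhere; the path and the 16384 value are assumptions, adjust to your setup:

```shell
# Demonstrated on a local sample file; point cfg at your real AVD's
# config.ini (~/.android/avd/<name>.avd/config.ini) to apply it for real.
cfg=sample_config.ini
printf 'hw.ramSize=8192\n' > "$cfg"
# llava needs roughly 8GB by itself, so give the emulator well above that:
sed -i 's/^hw\.ramSize=.*/hw.ramSize=16384/' "$cfg"
cat "$cfg"   # → hw.ramSize=16384
```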

kirklandsign commented 2 days ago

Also, I need the x86_64 because I am using an emulator? I will try the current aar library directly on an actual Pixel too then.

The x86_64 variant is used for the emulator if your hypervisor is x86_64.
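A quick way to check: the host architecture determines which emulator system images run natively. Sketch below; the adb command is commented out because it assumes a device or emulator is attached:

```shell
# Host architecture; an x86_64 host runs x86_64 emulator images natively.
uname -m
# On an attached device/emulator, the ABI can be read with getprop:
#   adb shell getprop ro.product.cpu.abi
```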

kirklandsign commented 2 days ago

I tried that and there are lots of errors related to incompatibility with elf_x86_64:

You need to clean cmake-out because objects built for different ABIs can't live in the same directory.
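In other words, something like the following (a sketch; it assumes you run it from the executorch repo root):

```shell
# Remove stale object files left over from the previous ABI before
# rebuilding; arm64-v8a and x86_64 objects cannot be linked together.
rm -rf cmake-out
# then rebuild for the new ABI, e.g.:
#   ANDROID_ABI=x86_64 bash examples/demo-apps/android/LlamaDemo/setup.sh
```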

kirklandsign commented 2 days ago

So should I change export ANDROID_ABI=arm64-v8a to export ANDROID_ABI=x86_64

I guess if you don't set ANDROID_ABI, it builds both by default.

kirklandsign commented 2 days ago

Hi @Astuary Please let me know if you have other issues.

examples/demo-apps/android/LlamaDemo/app/libs/executorch-llama.aar should contain both ABIs.

For the RAM requirement, please increase the RAM on the emulator, or file another issue if it is still not working.