pytorch / executorch

On-device AI across mobile, embedded and edge for PyTorch
https://pytorch.org/executorch/

Llama3 android sample error "Load failed:20" #4035

Closed glbrighenti closed 4 months ago

glbrighenti commented 4 months ago

I was able to quantize and build Llama 3 for Android. The model works in the shell using the generated _llama3_kv_sdpa_xnn_qe_432.pte and the original tokenizer.model.

However, when I try the sample LlamaDemo app, I get "Load failed:20" after pointing it to the same files that worked in the shell.

I used the script that downloads the prebuilt AAR (download_prebuilt_lib.sh).

Any ideas?

cccclai commented 4 months ago

Are there more logs in addition to Load failed:20? There is a CMake config option to enable logging.

glbrighenti commented 4 months ago

I already built executorch with

-DEXECUTORCH_ENABLE_LOGGING=1 \

but I don't see anything relevant in Logcat, just typical lifecycle messages:

from logcat:

2023-08-30 09:15:36.852  5338-5338  WindowOnBackDispatcher  com.example.executorchllamademo      W  sendCancelIfRunning: isInProgress=falsecallback=android.view.ViewRootImpl$$ExternalSyntheticLambda17@5505a56
2023-08-30 09:15:36.852  2761-3521  CoreBackPreview         system_server                        D  Window{94205d6 u0 com.example.executorchllamademo/com.example.executorchllamademo.MainActivity}: Setting back callback null
2023-08-30 09:15:36.860  2761-9524  InputManager-JNI        system_server                        W  Input channel object '94205d6 com.example.executorchllamademo/com.example.executorchllamademo.MainActivity (client)' was disposed without first being removed with the input manager!
2023-08-30 09:15:36.879  5338-5338  InputEventReceiver      com.example.executorchllamademo      W  Attempted to finish an input event but the input event receiver has already been disposed.
2023-08-30 09:15:36.882  2761-9524  CoreBackPreview         system_server                        D  Window{419970f u0 com.example.executorchllamademo/com.example.executorchllamademo.MainActivity}: Setting back callback OnBackInvokedCallbackInfo{mCallback=android.window.IOnBackInvokedCallback$Stub$Proxy@f3d2321, mPriority=0, mIsAnimationCallback=false}
2023-08-30 09:15:36.892  5338-5366  qdgralloc               com.example.executorchllamademo      W  getInterlacedFlag: getMetaData returned 3, defaulting to interlaced_flag = 0
2023-08-30 09:15:36.895  2761-9524  ImeTracker              system_server                        I  com.example.executorchllamademo:eb998f3: onRequestHide at ORIGIN_SERVER_HIDE_INPUT reason HIDE_UNSPECIFIED_WINDOW
2023-08-30 09:15:36.895  2761-9524  ImeTracker              system_server                        I  com.example.executorchllamademo:eb998f3: onCancelled at PHASE_SERVER_SHOULD_HIDE
2023-08-30 09:15:36.897  4263-4263  GoogleInpu...hodService com...gle.android.inputmethod.latin  I  GoogleInputMethodService.onStartInput():1917 onStartInput(EditorInfo{inputType=0x0(NULL) imeOptions=0x0 privateImeOptions=null actionName=UNSPECIFIED actionLabel=null actionId=0 initialSelStart=-1 initialSelEnd=-1 initialCapsMode=0x0 hintText=null label=null packageName=com.example.executorchllamademo fieldId=0 fieldName=null extras=null hintLocales=[]}, false)
2023-08-30 09:15:36.897  5338-5366  qdgralloc               com.example.executorchllamademo      W  getInterlacedFlag: getMetaData returned 3, defaulting to interlaced_flag = 0
2023-08-30 09:15:36.915  5338-5366  qdgralloc               com.example.executorchllamademo      W  getInterlacedFlag: getMetaData returned 3, defaulting to interlaced_flag = 0
2023-08-30 09:15:37.482  5338-5338  WindowOnBackDispatcher  com.example.executorchllamademo      W  sendCancelIfRunning: isInProgress=falsecallback=android.view.ViewRootImpl$$ExternalSyntheticLambda17@7628b8e
2023-08-30 09:15:37.483  2761-9524  CoreBackPreview         system_server                        D  Window{419970f u0 com.example.executorchllamademo/com.example.executorchllamademo.MainActivity}: Setting back callback null
2023-08-30 09:15:37.488  2761-27299 InputManager-JNI        system_server                        W  Input channel object '419970f com.example.executorchllamademo/com.example.executorchllamademo.MainActivity (client)' was disposed without first being removed with the input manager!
2023-08-30 09:15:37.491  5338-5411  libc                    com.example.executorchllamademo      W  Access denied finding property "ro.hardware.chipname"
2023-08-30 09:15:37.493  5338-5338  InputEventReceiver      com.example.executorchllamademo      W  Attempted to finish an input event but the input event receiver has already been disposed.
2023-08-30 09:15:37.507  2761-9524  ImeTracker              system_server                        I  com.example.executorchllamademo:5ce05139: onRequestHide at ORIGIN_SERVER_HIDE_INPUT reason HIDE_SAME_WINDOW_FOCUSED_WITHOUT_EDITOR
2023-08-30 09:15:37.507  2761-9524  ImeTracker              system_server                        I  com.example.executorchllamademo:5ce05139: onCancelled at PHASE_SERVER_SHOULD_HIDE
2023-08-30 09:15:37.508  4263-4263  GoogleInpu...hodService com...gle.android.inputmethod.latin  I  GoogleInputMethodService.onStartInput():1917 onStartInput(EditorInfo{inputType=0x0(NULL) imeOptions=0x0 privateImeOptions=null actionName=UNSPECIFIED actionLabel=null actionId=0 initialSelStart=-1 initialSelEnd=-1 initialCapsMode=0x0 hintText=null label=null packageName=com.example.executorchllamademo fieldId=0 fieldName=null extras=null hintLocales=[]}, false)
2023-08-30 09:15:40.395  2761-27299 CoreBackPreview         system_server                        D  Window{41ccca6 u0 com.example.executorchllamademo/com.example.executorchllamademo.MainActivity}: Setting back callback OnBackInvokedCallbackInfo{mCallback=android.window.IOnBackInvokedCallback$Stub$Proxy@c569400, mPriority=0, mIsAnimationCallback=false}
2023-08-30 09:15:40.401  5338-5366  qdgralloc               com.example.executorchllamademo      W  getInterlacedFlag: getMetaData returned 3, defaulting to interlaced_flag = 0

cccclai commented 4 months ago

Could you try the adb shell option as a sanity check before using the Android app?

Also, I wonder whether running the model on the host works.

glbrighenti commented 4 months ago

I didn't try on the host machine, but it runs well in adb shell (log from a Galaxy S24 posted below).

Any idea what code 20 stands for? It comes from the native model-loading part, but I can't find any reference to what this error stands for.

130|e3q:/data/local/tmp/llama $ ./llama_main -model_path llama.pte --tokenizer_path tokenizer.model --prompt \"Once upon a time\" --seq_len 120
I 00:00:00.004786 executorch:cpuinfo_utils.cpp:62] Reading file /sys/devices/soc0/image_version
I 00:00:00.005151 executorch:cpuinfo_utils.cpp:78] Failed to open midr file /sys/devices/soc0/image_version
I 00:00:00.005199 executorch:cpuinfo_utils.cpp:91] Reading file /sys/devices/system/cpu/cpu0/regs/identification/midr_el1
I 00:00:00.005312 executorch:cpuinfo_utils.cpp:91] Reading file /sys/devices/system/cpu/cpu1/regs/identification/midr_el1
I 00:00:00.005370 executorch:cpuinfo_utils.cpp:91] Reading file /sys/devices/system/cpu/cpu2/regs/identification/midr_el1
I 00:00:00.005412 executorch:cpuinfo_utils.cpp:91] Reading file /sys/devices/system/cpu/cpu3/regs/identification/midr_el1
I 00:00:00.005511 executorch:cpuinfo_utils.cpp:91] Reading file /sys/devices/system/cpu/cpu4/regs/identification/midr_el1
I 00:00:00.005548 executorch:cpuinfo_utils.cpp:91] Reading file /sys/devices/system/cpu/cpu5/regs/identification/midr_el1
I 00:00:00.005584 executorch:cpuinfo_utils.cpp:91] Reading file /sys/devices/system/cpu/cpu6/regs/identification/midr_el1
I 00:00:00.005639 executorch:cpuinfo_utils.cpp:91] Reading file /sys/devices/system/cpu/cpu7/regs/identification/midr_el1
I 00:00:00.005682 executorch:main.cpp:65] Resetting threadpool with num threads = 6
I 00:00:00.015067 executorch:runner.cpp:54] Creating LLaMa runner: model_path=llama.pte, tokenizer_path=tokenizer.model
I 00:00:05.098464 executorch:runner.cpp:69] Reading metadata from model
I 00:00:05.098553 executorch:runner.cpp:121] get_n_bos: 1
I 00:00:05.098565 executorch:runner.cpp:121] get_n_eos: 1
I 00:00:05.098568 executorch:runner.cpp:121] get_max_seq_len: 128
I 00:00:05.098577 executorch:runner.cpp:121] use_kv_cache: 1
I 00:00:05.098580 executorch:runner.cpp:121] use_sdpa_with_kv_cache: 1
I 00:00:05.098582 executorch:runner.cpp:121] append_eos_to_prompt: 0
I 00:00:05.158840 executorch:runner.cpp:121] get_vocab_size: 128256
I 00:00:05.158867 executorch:runner.cpp:121] get_bos_id: 128000
I 00:00:05.158876 executorch:runner.cpp:121] get_eos_id: 128001
"Once Upon a Time in Hollywood" is a 2019 American comedy-drama film written and directed by Quentin Tarantino. The film takes place in 1969 Los Angeles and follows the story of Rick Dalton, a faded television actor, and his stunt double and close friend, Cliff Booth, as they navigate the changing landscape of Hollywood and the counterculture movement.
The film features an all-star cast, including Leonardo DiCaprio as Rick Dalton, Brad Pitt as Cliff Booth, Margot Robbie as Sharon Tate, Emile Hirsch as Jay Sebring, and Al Pacino as
PyTorchObserver {"prompt_tokens":3,"generated_tokens":116,"model_load_start_ms":1719242523653,"model_load_end_ms":1719242528797,"inference_start_ms":1719242528797,"inference_end_ms":1719242540236,"prompt_eval_end_ms":1719242529160,"first_token_ms":1719242529265,"aggregate_sampling_time_ms":118,"SCALING_FACTOR_UNITS_PER_SECOND":1000}
I 00:00:16.598320 executorch:runner.cpp:402]    Prompt Tokens: 3    Generated Tokens: 116
I 00:00:16.598324 executorch:runner.cpp:408]    Model Load Time:        5.144000 (seconds)
I 00:00:16.598338 executorch:runner.cpp:418]    Total inference time:       11.439000 (seconds)      Rate:  10.140747 (tokens/second)
I 00:00:16.598341 executorch:runner.cpp:426]        Prompt evaluation:  0.363000 (seconds)       Rate:  8.264463 (tokens/second)
I 00:00:16.598343 executorch:runner.cpp:437]        Generated 116 tokens:   11.076000 (seconds)      Rate:  10.473095 (tokens/second)
I 00:00:16.598345 executorch:runner.cpp:445]    Time to first generated token:  0.468000 (seconds)
I 00:00:16.598347 executorch:runner.cpp:452]    Sampling time over 119 tokens:  0.118000 (seconds)
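
The timing lines in the log above can be reproduced from the PyTorchObserver JSON line alone. The following is a minimal sketch of that arithmetic, not part of ExecuTorch; the function name `summarize_observer` and the output key names are hypothetical, and the input is the observer line copied from this log.

```python
import json

# Observer line copied verbatim from the adb shell log in this thread.
OBSERVER_LINE = (
    'PyTorchObserver {"prompt_tokens":3,"generated_tokens":116,'
    '"model_load_start_ms":1719242523653,"model_load_end_ms":1719242528797,'
    '"inference_start_ms":1719242528797,"inference_end_ms":1719242540236,'
    '"prompt_eval_end_ms":1719242529160,"first_token_ms":1719242529265,'
    '"aggregate_sampling_time_ms":118,"SCALING_FACTOR_UNITS_PER_SECOND":1000}'
)


def summarize_observer(line):
    """Derive durations (seconds) and rates (tokens/second) from one observer line."""
    stats = json.loads(line.split("PyTorchObserver", 1)[1])
    scale = stats["SCALING_FACTOR_UNITS_PER_SECOND"]  # timestamp units per second

    def seconds(start_key, end_key):
        return (stats[end_key] - stats[start_key]) / scale

    inference_s = seconds("inference_start_ms", "inference_end_ms")
    prompt_eval_s = seconds("inference_start_ms", "prompt_eval_end_ms")
    generation_s = seconds("prompt_eval_end_ms", "inference_end_ms")
    return {
        "model_load_s": seconds("model_load_start_ms", "model_load_end_ms"),
        "inference_s": inference_s,
        "prompt_eval_s": prompt_eval_s,
        "generation_s": generation_s,
        "time_to_first_token_s": seconds("inference_start_ms", "first_token_ms"),
        "total_tokens_per_s": stats["generated_tokens"] / inference_s,
        "prompt_tokens_per_s": stats["prompt_tokens"] / prompt_eval_s,
        "generation_tokens_per_s": stats["generated_tokens"] / generation_s,
    }


summary = summarize_observer(OBSERVER_LINE)
for name, value in summary.items():
    print(f"{name}: {value:.6f}")
```

For this log, the computed values match the runner's own report: about 5.144 s to load the model, 11.439 s of inference, and roughly 10.14 tokens/second overall.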

cccclai commented 4 months ago

Thanks for checking it out. The error code definitions can be found here. It sounds like there is an issue on the Android side.
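
As a rough aid for reading a code such as 20, the sketch below maps a printed value to an error name. It assumes the `Error` enum values in ExecuTorch's `runtime/core/error.h` are as reproduced in the table, which is an assumption to verify against the checkout you built from. The thread never states whether the app prints the code in decimal or hexadecimal, so the hypothetical helper `decode_load_error` reports both readings.

```python
# ASSUMPTION: values mirror the Error enum in runtime/core/error.h.
# Check them against your own ExecuTorch checkout before relying on them.
EXECUTORCH_ERRORS = {
    0x00: "Ok",
    0x01: "Internal",
    0x02: "InvalidState",
    0x03: "EndOfMethod",
    0x10: "NotSupported",
    0x11: "NotImplemented",
    0x12: "InvalidArgument",
    0x13: "InvalidType",
    0x14: "OperatorMissing",
    0x20: "NotFound",
    0x21: "MemoryAllocationFailed",
    0x22: "AccessFailed",
    0x23: "InvalidProgram",
    0x30: "DelegateInvalidCompatibility",
    0x31: "DelegateMemoryAllocationFailed",
    0x32: "DelegateInvalidHandle",
}


def decode_load_error(code_text):
    """Return the error name for the printed code read as decimal and as hex."""
    as_decimal = int(code_text, 10)
    as_hex = int(code_text, 16)
    return {
        "decimal": EXECUTORCH_ERRORS.get(as_decimal, "unknown"),
        "hex": EXECUTORCH_ERRORS.get(as_hex, "unknown"),
    }


# The code shown by the demo app in this thread.
print(decode_load_error("20"))
```

Under the decimal reading, 20 is 0x14, which this table names OperatorMissing; under the hexadecimal reading it would be NotFound. Which reading applies, and whether it explains the failure here, is not confirmed anywhere in this thread.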

glbrighenti commented 4 months ago

I fixed the problem by not using the prebuilt AAR library (downloaded using download_prebuilt_lib.sh) and instead compiling the native code with ./gradlew :app:setup.

I am not sure what is incompatible with the prebuilt library; maybe it is because my host machine is Linux, and the setup instructions mention that Mac is preferred.