mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

Process is going to kill itself! #2890

Closed Vinaysukhesh98 closed 1 month ago

Vinaysukhesh98 commented 2 months ago

```
android/mlc4j/../../3rdparty/tvm/src/runtime/relax_vm/paged_kv_cache.cc:2650: Check failed: (args.size() == 22 || args.size() == 23) is false: Invalid number of KV cache constructor args.
2024-09-10 23:46:05.024 32001-32058 AndroidRuntime ai.mlc.mlcchat E FATAL EXCEPTION: Thread-8
Process: ai.mlc.mlcchat, PID: 32001
org.apache.tvm.Base$TVMError: TVMError: Check failed: (args.size() == 22 || args.size() == 23) is false: Invalid number of KV cache constructor args.
Stack trace:
  File "latest_mlc/mlc-llm/android/mlc4j/../../3rdparty/tvm/src/runtime/relax_vm/paged_kv_cache.cc", line 2650
    at org.apache.tvm.Base.checkCall(Base.java:173)
    at org.apache.tvm.Function.invoke(Function.java:130)
    at ai.mlc.mlcllm.JSONFFIEngine.runBackgroundLoop(JSONFFIEngine.java:64)
    at ai.mlc.mlcllm.MLCEngine$backgroundWorker$1.invoke(MLCEngine.kt:42)
    at ai.mlc.mlcllm.MLCEngine$backgroundWorker$1.invoke(MLCEngine.kt:40)
    at ai.mlc.mlcllm.BackgroundWorker$start$1.invoke(MLCEngine.kt:19)
    at ai.mlc.mlcllm.BackgroundWorker$start$1.invoke(MLCEngine.kt:18)
    at kotlin.concurrent.ThreadsKt$thread$thread$1.run(Thread.kt:30)
```
SwordFishKe commented 2 months ago

@Vinaysukhesh98 How did you fix it? I get the same issue.

Kaneki-x commented 2 months ago

@SwordFishKe @Vinaysukhesh98 You can add a check for args.size() == 21 and set args[21] to nullptr inside the method to temporarily bypass this issue. Preliminary analysis suggests that a newly added parameter to this method changed the arg-count check, but the external call sites were not updated to pass the new parameter, leaving the old and new argument counts mismatched. It seems the TVM developers do not pay attention to, or test, the actual behavior of the MLC-LLM client, as crashes often occur after changes are made.
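For reference, a rough, self-contained sketch of the idea (illustrative only; `Arg` and the function shape are stand-ins, not the verbatim TVM source in paged_kv_cache.cc):

```cpp
#include <cassert>
#include <iostream>
#include <vector>

// Illustrative model of the KV cache constructor's arg-count dispatch.
// `Arg` stands in for TVM's TVMArgValue; the real code lives in
// 3rdparty/tvm/src/runtime/relax_vm/paged_kv_cache.cc around line 2650.
using Arg = const void*;

void CreateKVCache(const std::vector<Arg>& args) {
  // The original check accepted only 22 or 23 args; also accepting 21
  // keeps call sites built against the old signature working.
  assert((args.size() == 21 || args.size() == 22 || args.size() == 23) &&
         "Invalid number of KV cache constructor args.");
  // Old 21-arg callers never passed the newly added parameter, so
  // substitute a null handle and let construction proceed.
  Arg extra = (args.size() >= 22) ? args[21] : nullptr;
  std::cout << "extra arg is " << (extra ? "set" : "null") << '\n';
}

int main() {
  CreateKVCache(std::vector<Arg>(21, nullptr));  // old-style caller now passes
}
```

This only papers over the mismatch; the proper fix is to rebuild so the call sites pass the new argument.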

Kaneki-x commented 2 months ago

@vinx13 I suspect the crash was caused by your commit 8059c770dc563411717a44d9409888be3f85b7ee on September 4th. Can you help take a look and fix it?

vinx13 commented 2 months ago

Did you check out the updated TVM submodule? You also need to recompile the model.
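For example (standard git commands, run from the mlc-llm checkout root; the submodule path matches the one in the stack trace above):

```sh
# Sync the TVM submodule to the commit pinned by the current mlc-llm revision.
git submodule update --init --recursive 3rdparty/tvm
# Then rebuild the runtime and recompile/repackage the model.
```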

Kaneki-x commented 2 months ago

Yes, the TVM submodule is updated to e0ef1c92add4048823a5e2c8724495418865986b, and I cleaned up all build caches. I then ran mlc_llm package in the MLCChat folder, but after compiling and installing I still get the same crash stack. Please check whether any step is missing.

vinx13 commented 2 months ago

@MasterJH5574 It seems the submodule already contains the fix for the missing function for the TIR KV cache. Anything missing?

MasterJH5574 commented 2 months ago

Could you folks try running with the environment variable MLC_JIT_POLICY=REDO to force recompilation of the models? For example, MLC_JIT_POLICY=REDO python -m mlc_llm package.

Unfortunately, the submodule update won't automatically trigger recompilation; we will try our best to make things stable.
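Spelled out (the working directory is an assumption; run it wherever you normally invoke mlc_llm package, e.g. android/MLCChat in this thread):

```sh
# Force mlc_llm to rebuild the model libraries instead of reusing the JIT cache.
cd android/MLCChat
MLC_JIT_POLICY=REDO python -m mlc_llm package
```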

Vinaysukhesh98 commented 2 months ago

Hi @MasterJH5574, I tried rebuilding and made the changes below in the Android code to get this far.

The error seems to be due to a mismatch between the Gradle version and the Android version. I applied a temporary fix by updating Gradle to 8.7 and adding android:enableOnBackInvokedCallback="true" to the Android manifest (see the sketch at the end of this comment), but I still see different behavior. The model is phi-3; logcat attached below:

```
type=1400 audit(0.0:35370): avc: denied { getattr } for path="/sys/module/metis/parameters/minor_window_app" dev="sysfs" ino=70200 scontext=u:r:untrusted_app_32:s0:c92,c257,c512,c768 tcontext=u:object_r:sysfs_migt:s0 tclass=file permissive=0 app=ai.mlc.mlcchat
2024-09-13 23:48:23.465 32080-32080 FrameTracker ai.mlc.mlcchat E force finish cuj, time out: JIME_INSETS_ANIMATION::0@0@ai.mlc.mlcchat
2024-09-13 23:48:33.138 32080-32134 ai.mlc.mlcchat ai.mlc.mlcchat I This is non sticky GC, maxfree is 33554432 minfree is 8388608
2024-09-13 23:48:33.147 32080-32137 System ai.mlc.mlcchat W A resource failed to call release.
2024-09-13 23:48:47.991 32080-32080 ImeTracker ai.mlc.mlcchat I ai.mlc.mlcchat:65911ff: onRequestHide at ORIGIN_CLIENT_HIDE_SOFT_INPUT reason HIDE_SOFT_INPUT_BY_INSETS_API
2024-09-13 23:48:47.992 32080-32080 ImeTracker ai.mlc.mlcchat I ai.mlc.mlcchat:ede276ba: onRequestHide at ORIGIN_CLIENT_HIDE_SOFT_INPUT reason HIDE_SOFT_INPUT_BY_INSETS_API
2024-09-13 23:48:47.995 32080-32080 HandWritingStubImpl ai.mlc.mlcchat I getCurrentKeyboardType: 1
2024-09-13 23:48:48.106 32080-32080 Compatibil...geReporter ai.mlc.mlcchat D Compat change id reported: 210923482; UID 10348; state: ENABLED
2024-09-13 23:48:48.108 32080-32080 ThemeUtils ai.mlc.mlcchat E View class dev.jeziellago.compose.markdowntext.CustomTextView is an AppCompat widget that can only be used with a Theme.AppCompat theme (or descendant).
2024-09-13 23:48:48.163 32080-32080 RemoteInpu...ectionImpl ai.mlc.mlcchat W getTextBeforeCursor on inactive InputConnection
2024-09-13 23:48:48.164 32080-32080 WindowOnBackDispatcher ai.mlc.mlcchat W sendCancelIfRunning: isInProgress=false callback=ImeCallback=ImeOnBackInvokedCallback@170417451 Callback=android.window.IOnBackInvokedCallback$Stub$Proxy@5eb0386
2024-09-13 23:48:48.168 32080-32080 ImeTracker ai.mlc.mlcchat I ai.mlc.mlcchat:ede276ba: onCancelled at PHASE_CLIENT_APPLY_ANIMATION
2024-09-13 23:48:48.596 32080-32140 ai.mlc.mlcchat ai.mlc.mlcchat W PerfMonitor async binderTransact : time=296ms interface=android.gui.ITransactionComposerListener code=1
2024-09-13 23:48:48.598 32080-32080 Looper ai.mlc.mlcchat W PerfMonitor doFrame : time=302ms vsyncFrame=0 latency=1ms procState=-1 historyMsgCount=4
2024-09-13 23:48:48.619 32080-32080 ImeTracker ai.mlc.mlcchat I ai.mlc.mlcchat:bf5dc4ed: onRequestHide at ORIGIN_CLIENT_HIDE_SOFT_INPUT reason HIDE_SOFT_INPUT_BY_INSETS_API
2024-09-13 23:48:48.620 32080-32080 ImeTracker ai.mlc.mlcchat I ai.mlc.mlcchat:65911ff: onHidden
2024-09-13 23:48:48.662 32080-32250 RenderInspector ai.mlc.mlcchat W QueueBuffer time out on ai.mlc.mlcchat/ai.mlc.mlcchat.MainActivity, count=1, avg=295 ms, max=295 ms.
2024-09-13 23:48:48.715 32080-32250 RenderInspector ai.mlc.mlcchat W DequeueBuffer time out on ai.mlc.mlcchat/ai.mlc.mlcchat.MainActivity, count=1, avg=28 ms, max=28 ms.
2024-09-13 23:48:58.872 32080-32080 FrameTracker ai.mlc.mlcchat E force finish cuj, time out: JIME_INSETS_ANIMATION::1@0@ai.mlc.mlcchat
2024-09-13 23:49:24.395 32080-32080 ThemeUtils ai.mlc.mlcchat E View class dev.jeziellago.compose.markdowntext.CustomTextView is an AppCompat widget that can only be used with a Theme.AppCompat theme (or descendant).
2024-09-13 23:49:24.782 32080-32080 ThemeUtils ai.mlc.mlcchat E View class dev.jeziellago.compose.markdowntext.CustomTextView is an AppCompat widget that can only be used with a Theme.AppCompat theme (or descendant).
2024-09-13 23:49:24.837 32080-32080 ThemeUtils ai.mlc.mlcchat E View class dev.jeziellago.compose.markdowntext.CustomTextView is an AppCompat widget that can only be used with a Theme.AppCompat theme (or descendant).
```

The prefill token rate it gives is very poor, which seems to need a fix in the code.
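For reference, the manifest change mentioned above looks roughly like this (a sketch: only the enableOnBackInvokedCallback attribute is the actual change; the other attributes are placeholders, not the stock MLCChat manifest):

```xml
<!-- AndroidManifest.xml: opt the app in to the back-invoked callback API. -->
<application
    android:label="MLCChat"
    android:enableOnBackInvokedCallback="true">
    <!-- activities unchanged -->
</application>
```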

Kaneki-x commented 2 months ago

> Could you folks try running with the environment variable MLC_JIT_POLICY=REDO to force recompilation of the models? For example, MLC_JIT_POLICY=REDO python -m mlc_llm package.
>
> Unfortunately, the submodule update won't automatically trigger recompilation; we will try our best to make things stable.

@MasterJH5574 I cleared the build cache, executed the new command you provided in the MLCChat directory, and recompiled and ran the app, but the same error still occurs. Where could the problem lie?

yoghur commented 2 months ago

I met the same problem when adding a new quantization method and compiling with it (screenshot QQ_1726295934771): org.apache.tvm.Base$TVMError: TVMError: Check failed: (args.size() == 22 || args.size() == 23) is false: Invalid number of KV cache constructor args.

But the app worked well when using the model quantized with q4f16_1 (screenshot QQ_1726296089311).

Does this error have anything to do with the new quantization method I added?

Vinaysukhesh98 commented 1 month ago

> I met the same problem when adding a new quantization method and compiling with it: org.apache.tvm.Base$TVMError: TVMError: Check failed: (args.size() == 22 || args.size() == 23) is false: Invalid number of KV cache constructor args.
>
> But the app worked well when using the model quantized with q4f16_1.
>
> Does this error have anything to do with the new quantization method I added?

Even when the app works well, did you observe that the prefill tok/s is very poor?

MrRace commented 1 month ago

Same problem when using Qwen2! But Qwen1.5 works.

yoghur commented 1 month ago

> I met the same problem when adding a new quantization method and compiling with it: org.apache.tvm.Base$TVMError: TVMError: Check failed: (args.size() == 22 || args.size() == 23) is false: Invalid number of KV cache constructor args. But the app worked well when using the model quantized with q4f16_1. Does this error have anything to do with the new quantization method I added?
>
> Even when the app works well, did you observe that the prefill tok/s is very poor?

Only 1.3 tokens/s.

Vinaysukhesh98 commented 1 month ago

Hi @MasterJH5574,

Please help me understand why prefill is so slow during inference.

vinx13 commented 1 month ago

The performance issue might be caused by https://github.com/apache/tvm/pull/17326, though it is not expected to change the original prefill behavior.

Vinaysukhesh98 commented 1 month ago

> The performance issue might be caused by apache/tvm#17326, though it is not expected to change the original prefill behavior.

Would trying older versions of the packages fix this issue?

ousecTic commented 1 month ago

> Could you folks try running with the environment variable MLC_JIT_POLICY=REDO to force recompilation of the models? For example, MLC_JIT_POLICY=REDO python -m mlc_llm package.
>
> Unfortunately, the submodule update won't automatically trigger recompilation; we will try our best to make things stable.

This worked for me, thanks :)