mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

Process is going to kill itself! #2890

Closed Vinaysukhesh98 closed 1 month ago

Vinaysukhesh98 commented 2 months ago

```
android/mlc4j/../../3rdparty/tvm/src/runtime/relax_vm/paged_kv_cache.cc:2650: Check failed: (args.size() == 22 || args.size() == 23) is false: Invalid number of KV cache constructor args.
2024-09-10 23:46:05.024 32001-32058 AndroidRuntime ai.mlc.mlcchat E FATAL EXCEPTION: Thread-8
Process: ai.mlc.mlcchat, PID: 32001
org.apache.tvm.Base$TVMError: TVMError: Check failed: (args.size() == 22 || args.size() == 23) is false: Invalid number of KV cache constructor args.
Stack trace:
  File "latest_mlc/mlc-llm/android/mlc4j/../../3rdparty/tvm/src/runtime/relax_vm/paged_kv_cache.cc", line 2650
    at org.apache.tvm.Base.checkCall(Base.java:173)
    at org.apache.tvm.Function.invoke(Function.java:130)
    at ai.mlc.mlcllm.JSONFFIEngine.runBackgroundLoop(JSONFFIEngine.java:64)
    at ai.mlc.mlcllm.MLCEngine$backgroundWorker$1.invoke(MLCEngine.kt:42)
    at ai.mlc.mlcllm.MLCEngine$backgroundWorker$1.invoke(MLCEngine.kt:40)
    at ai.mlc.mlcllm.BackgroundWorker$start$1.invoke(MLCEngine.kt:19)
    at ai.mlc.mlcllm.BackgroundWorker$start$1.invoke(MLCEngine.kt:18)
    at kotlin.concurrent.ThreadsKt$thread$thread$1.run(Thread.kt:30)
```
SwordFishKe commented 2 months ago

@Vinaysukhesh98 How did you fix it? I get the same issue.

Kaneki-x commented 2 months ago

@SwordFishKe @Vinaysukhesh98 You can add a check for args.size() == 21 and set args[21] to nullptr inside the method to temporarily bypass this issue. Preliminary analysis suggests that a newly added parameter to this method changed the arg-count check, but the external call sites were not updated to pass the new parameter, leaving the old and new argument counts mismatched. It seems the TVM developers do not pay attention to, or test, the actual behavior of the MLC-LLM client, as crashes often occur after changes are made.
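For reference, a rough, self-contained sketch of the idea (illustrative only; `Arg` and the function shape are stand-ins, not the verbatim TVM source in paged_kv_cache.cc):

```cpp
#include <cassert>
#include <iostream>
#include <vector>

// Illustrative model of the KV cache constructor's arg-count dispatch.
// `Arg` stands in for TVM's TVMArgValue; the real code lives in
// 3rdparty/tvm/src/runtime/relax_vm/paged_kv_cache.cc around line 2650.
using Arg = const void*;

void CreateKVCache(const std::vector<Arg>& args) {
  // The original check accepted only 22 or 23 args; also accepting 21
  // keeps call sites built against the old signature working.
  assert((args.size() == 21 || args.size() == 22 || args.size() == 23) &&
         "Invalid number of KV cache constructor args.");
  // Old 21-arg callers never passed the newly added parameter, so
  // substitute a null handle and let construction proceed.
  Arg extra = (args.size() >= 22) ? args[21] : nullptr;
  std::cout << "extra arg is " << (extra ? "set" : "null") << '\n';
}

int main() {
  CreateKVCache(std::vector<Arg>(21, nullptr));  // old-style caller now passes
}
```

This only papers over the mismatch; the proper fix is to rebuild so the call sites pass the new argument.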

Kaneki-x commented 2 months ago

@vinx13 I suspect the crash was caused by your commit 8059c770dc563411717a44d9409888be3f85b7ee on September 4th. Can you help take a look and fix it?

vinx13 commented 2 months ago

Did you check out the updated TVM submodule? You also need to recompile the model.
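For example (standard git commands, run from the mlc-llm checkout root; the submodule path matches the one in the stack trace above):

```sh
# Sync the TVM submodule to the commit pinned by the current mlc-llm revision.
git submodule update --init --recursive 3rdparty/tvm
# Then rebuild the runtime and recompile/repackage the model.
```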

Kaneki-x commented 2 months ago

Yes, the TVM submodule is updated to e0ef1c92add4048823a5e2c8724495418865986b, and I cleaned up all build caches. I then ran mlc_llm package in the MLCChat folder, but after compiling and installing I still get the same crash stack. Please check whether any step is missing.

vinx13 commented 2 months ago

@MasterJH5574 It seems the submodule already contains the fix for the missing function for the TIR KV cache. Anything missing?

MasterJH5574 commented 2 months ago

Could you folks try running with the environment variable MLC_JIT_POLICY=REDO to force recompilation of the models? For example, MLC_JIT_POLICY=REDO python -m mlc_llm package.

Unfortunately, the submodule update won't automatically trigger recompilation; we will try our best to make things stable.
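Spelled out (the working directory is an assumption; run it wherever you normally invoke mlc_llm package, e.g. android/MLCChat in this thread):

```sh
# Force mlc_llm to rebuild the model libraries instead of reusing the JIT cache.
cd android/MLCChat
MLC_JIT_POLICY=REDO python -m mlc_llm package
```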

Vinaysukhesh98 commented 2 months ago

Hi @MasterJH5574, I tried rebuilding and made the changes below in the Android code to get this far.

The error seems to be due to a mismatch between the Gradle version and the Android version. I applied a temporary fix by updating Gradle to 8.7 and adding android:enableOnBackInvokedCallback="true" to the Android manifest (see the sketch at the end of this comment), but I still see different behavior. The model is phi-3; logcat attached below:

```
type=1400 audit(0.0:35370): avc: denied { getattr } for path="/sys/module/metis/parameters/minor_window_app" dev="sysfs" ino=70200 scontext=u:r:untrusted_app_32:s0:c92,c257,c512,c768 tcontext=u:object_r:sysfs_migt:s0 tclass=file permissive=0 app=ai.mlc.mlcchat
2024-09-13 23:48:23.465 32080-32080 FrameTracker ai.mlc.mlcchat E force finish cuj, time out: JIME_INSETS_ANIMATION::0@0@ai.mlc.mlcchat
2024-09-13 23:48:33.138 32080-32134 ai.mlc.mlcchat ai.mlc.mlcchat I This is non sticky GC, maxfree is 33554432 minfree is 8388608
2024-09-13 23:48:33.147 32080-32137 System ai.mlc.mlcchat W A resource failed to call release.
2024-09-13 23:48:47.991 32080-32080 ImeTracker ai.mlc.mlcchat I ai.mlc.mlcchat:65911ff: onRequestHide at ORIGIN_CLIENT_HIDE_SOFT_INPUT reason HIDE_SOFT_INPUT_BY_INSETS_API
2024-09-13 23:48:47.992 32080-32080 ImeTracker ai.mlc.mlcchat I ai.mlc.mlcchat:ede276ba: onRequestHide at ORIGIN_CLIENT_HIDE_SOFT_INPUT reason HIDE_SOFT_INPUT_BY_INSETS_API
2024-09-13 23:48:47.995 32080-32080 HandWritingStubImpl ai.mlc.mlcchat I getCurrentKeyboardType: 1
2024-09-13 23:48:48.106 32080-32080 Compatibil...geReporter ai.mlc.mlcchat D Compat change id reported: 210923482; UID 10348; state: ENABLED
2024-09-13 23:48:48.108 32080-32080 ThemeUtils ai.mlc.mlcchat E View class dev.jeziellago.compose.markdowntext.CustomTextView is an AppCompat widget that can only be used with a Theme.AppCompat theme (or descendant).
2024-09-13 23:48:48.163 32080-32080 RemoteInpu...ectionImpl ai.mlc.mlcchat W getTextBeforeCursor on inactive InputConnection
2024-09-13 23:48:48.164 32080-32080 WindowOnBackDispatcher ai.mlc.mlcchat W sendCancelIfRunning: isInProgress=false callback=ImeCallback=ImeOnBackInvokedCallback@170417451 Callback=android.window.IOnBackInvokedCallback$Stub$Proxy@5eb0386
2024-09-13 23:48:48.168 32080-32080 ImeTracker ai.mlc.mlcchat I ai.mlc.mlcchat:ede276ba: onCancelled at PHASE_CLIENT_APPLY_ANIMATION
2024-09-13 23:48:48.596 32080-32140 ai.mlc.mlcchat ai.mlc.mlcchat W PerfMonitor async binderTransact : time=296ms interface=android.gui.ITransactionComposerListener code=1
2024-09-13 23:48:48.598 32080-32080 Looper ai.mlc.mlcchat W PerfMonitor doFrame : time=302ms vsyncFrame=0 latency=1ms procState=-1 historyMsgCount=4
2024-09-13 23:48:48.619 32080-32080 ImeTracker ai.mlc.mlcchat I ai.mlc.mlcchat:bf5dc4ed: onRequestHide at ORIGIN_CLIENT_HIDE_SOFT_INPUT reason HIDE_SOFT_INPUT_BY_INSETS_API
2024-09-13 23:48:48.620 32080-32080 ImeTracker ai.mlc.mlcchat I ai.mlc.mlcchat:65911ff: onHidden
2024-09-13 23:48:48.662 32080-32250 RenderInspector ai.mlc.mlcchat W QueueBuffer time out on ai.mlc.mlcchat/ai.mlc.mlcchat.MainActivity, count=1, avg=295 ms, max=295 ms.
2024-09-13 23:48:48.715 32080-32250 RenderInspector ai.mlc.mlcchat W DequeueBuffer time out on ai.mlc.mlcchat/ai.mlc.mlcchat.MainActivity, count=1, avg=28 ms, max=28 ms.
2024-09-13 23:48:58.872 32080-32080 FrameTracker ai.mlc.mlcchat E force finish cuj, time out: JIME_INSETS_ANIMATION::1@0@ai.mlc.mlcchat
2024-09-13 23:49:24.395 32080-32080 ThemeUtils ai.mlc.mlcchat E View class dev.jeziellago.compose.markdowntext.CustomTextView is an AppCompat widget that can only be used with a Theme.AppCompat theme (or descendant).
2024-09-13 23:49:24.782 32080-32080 ThemeUtils ai.mlc.mlcchat E View class dev.jeziellago.compose.markdowntext.CustomTextView is an AppCompat widget that can only be used with a Theme.AppCompat theme (or descendant).
2024-09-13 23:49:24.837 32080-32080 ThemeUtils ai.mlc.mlcchat E View class dev.jeziellago.compose.markdowntext.CustomTextView is an AppCompat widget that can only be used with a Theme.AppCompat theme (or descendant).
```

The prefill token rate it gives is very poor, which seems to need a fix in the code.
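For reference, the manifest change mentioned above looks roughly like this (a sketch: only the enableOnBackInvokedCallback attribute is the actual change; the other attributes are placeholders, not the stock MLCChat manifest):

```xml
<!-- AndroidManifest.xml: opt the app in to the back-invoked callback API. -->
<application
    android:label="MLCChat"
    android:enableOnBackInvokedCallback="true">
    <!-- activities unchanged -->
</application>
```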

Kaneki-x commented 2 months ago

> Could you folks try running with the environment variable MLC_JIT_POLICY=REDO to force recompilation of the models? For example, MLC_JIT_POLICY=REDO python -m mlc_llm package.
>
> Unfortunately, the submodule update won't automatically trigger recompilation; we will try our best to make things stable.

@MasterJH5574 I cleared the build cache, executed the new command you provided in the MLCChat directory, and recompiled and ran the app, but the same error still occurs. Where could the problem lie?

yoghur commented 2 months ago

I met the same problem when adding a new quantization method and compiling with it (screenshot QQ_1726295934771): org.apache.tvm.Base$TVMError: TVMError: Check failed: (args.size() == 22 || args.size() == 23) is false: Invalid number of KV cache constructor args.

But the app worked well when using the model quantized with q4f16_1 (screenshot QQ_1726296089311).

Does this error have anything to do with the new quantization method I added?

Vinaysukhesh98 commented 1 month ago

> I met the same problem when adding a new quantization method and compiling with it: org.apache.tvm.Base$TVMError: TVMError: Check failed: (args.size() == 22 || args.size() == 23) is false: Invalid number of KV cache constructor args.
>
> But the app worked well when using the model quantized with q4f16_1.
>
> Does this error have anything to do with the new quantization method I added?

Even when the app works well, did you observe that the prefill tok/s is very poor?

MrRace commented 1 month ago

Same problem when using Qwen2! But Qwen1.5 works.

yoghur commented 1 month ago

> I met the same problem when adding a new quantization method and compiling with it: org.apache.tvm.Base$TVMError: TVMError: Check failed: (args.size() == 22 || args.size() == 23) is false: Invalid number of KV cache constructor args. But the app worked well when using the model quantized with q4f16_1. Does this error have anything to do with the new quantization method I added?
>
> Even when the app works well, did you observe that the prefill tok/s is very poor?

Only 1.3 tokens/s.

Vinaysukhesh98 commented 1 month ago

Hi @MasterJH5574,

Please help me understand why prefill is so slow during inference.

vinx13 commented 1 month ago

The performance issue might be caused by https://github.com/apache/tvm/pull/17326, though it is not expected to change the original prefill behavior.

Vinaysukhesh98 commented 1 month ago

> The performance issue might be caused by apache/tvm#17326, though it is not expected to change the original prefill behavior.

Would trying older versions of the packages fix this issue?

ousecTic commented 1 month ago

> Could you folks try running with the environment variable MLC_JIT_POLICY=REDO to force recompilation of the models? For example, MLC_JIT_POLICY=REDO python -m mlc_llm package.
>
> Unfortunately, the submodule update won't automatically trigger recompilation; we will try our best to make things stable.

This worked for me, thanks :)