Closed Vinaysukhesh98 closed 1 month ago
@Vinaysukhesh98 How did you fix it? I get the same issue.
@SwordFishKe @Vinaysukhesh98 As a temporary workaround, you can add a check for args.size() == 21 inside the method and, in that case, set the args[21] parameter to nullptr. Preliminary analysis suggests that a parameter was added to this method and the argument-count check was changed to match, but the external call sites were not updated to pass the new parameter, so the old and new argument counts no longer agree. It seems that the TVM developers do not test changes against the MLC-LLM client, since crashes often occur after changes are made.
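To make the suggested workaround concrete, here is a minimal sketch of the padding logic, written in Python purely as an illustration. It is not TVM code: the function name and constants are hypothetical, None stands in for nullptr, and the accepted counts (22 or 23) are taken from the error message reported in this thread.

```python
from typing import Any, List, Optional

# Hypothetical model of the argument-count check; this is NOT TVM code.
ACCEPTED_ARG_COUNTS = (22, 23)  # counts named in the error message in this thread
LEGACY_ARG_COUNT = 21           # count assumed to come from callers built before the change


def normalize_kv_cache_args(args: List[Any]) -> List[Optional[Any]]:
    """Return an argument list whose length passes the constructor check."""
    if len(args) == LEGACY_ARG_COUNT:
        # Old caller: fill the missing trailing slot (index 21) with None,
        # which plays the role of nullptr in the C++ workaround.
        return list(args) + [None]
    if len(args) in ACCEPTED_ARG_COUNTS:
        return list(args)
    raise ValueError(f"Invalid number of KV cache constructor args: {len(args)}")
```

This only hides the mismatch. The real fix is to rebuild so that the caller and the runtime agree on the signature.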
@vinx13 I suspect the crash was caused by your commit 8059c770dc563411717a44d9409888be3f85b7ee from September 4th. Could you take a look and fix it?
Did you check out the updated TVM submodule? You also need to recompile the model.
Yes, the TVM submodule is updated to e0ef1c92add4048823a5e2c8724495418865986b, and I cleaned up all build caches. I then ran mlc_llm package in the MLCChat folder, but after compiling and installing I still get the same crash stack. Please check whether any steps are missing.
@MasterJH5574 It seems the submodule already contains the fix for the missing function for the TIR KV cache. Is anything else missing?
Could you folks try running with the environment variable MLC_JIT_POLICY=REDO to force recompilation of the models? For example: MLC_JIT_POLICY=REDO python -m mlc_llm package
Unfortunately, the submodule update won't automatically trigger recompilation. We will try our best to make things stable.
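For anyone scripting the rebuild, the suggestion above can be sketched as a small Python wrapper. Only the MLC_JIT_POLICY=REDO variable and the python -m mlc_llm package command come from this thread; the helper names and the cwd argument are assumptions for illustration.

```python
import os
import subprocess
import sys
from typing import Dict, List, Mapping


def redo_env(base: Mapping[str, str]) -> Dict[str, str]:
    """Copy an environment and set MLC_JIT_POLICY=REDO to force recompilation."""
    env = dict(base)
    env["MLC_JIT_POLICY"] = "REDO"
    return env


def package_command() -> List[str]:
    """Build the `python -m mlc_llm package` command for the current interpreter."""
    return [sys.executable, "-m", "mlc_llm", "package"]


def run_package(cwd: str) -> int:
    """Run the packaging step in `cwd` (e.g. the MLCChat folder); return the exit code."""
    result = subprocess.run(package_command(), cwd=cwd, env=redo_env(os.environ))
    return result.returncode
```

Calling run_package from the folder that holds your packaging config should behave like the one-line shell command, with the variable scoped to that single run.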
Hi @MasterJH5574, I tried rebuilding and made the changes below in the Android code to get this far.
The error seems to be due to a mismatch between the Gradle version and the Android version. As a temporary fix, I updated Gradle to 8.7 and added android:enableOnBackInvokedCallback="true" to the Android XML file, but I still see different behaviour. The log is attached below. The model is phi-3.
logcat:
type=1400 audit(0.0:35370): avc: denied { getattr } for path="/sys/module/metis/parameters/minor_window_app" dev="sysfs" ino=70200 scontext=u:r:untrusted_app_32:s0:c92,c257,c512,c768 tcontext=u:object_r:sysfs_migt:s0 tclass=file permissive=0 app=ai.mlc.mlcchat
2024-09-13 23:48:23.465 32080-32080 FrameTracker ai.mlc.mlcchat E force finish cuj, time out: JIME_INSETS_ANIMATION::0@0@ai.mlc.mlcchat
2024-09-13 23:48:33.138 32080-32134 ai.mlc.mlcchat ai.mlc.mlcchat I This is non sticky GC, maxfree is 33554432 minfree is 8388608
2024-09-13 23:48:33.147 32080-32137 System ai.mlc.mlcchat W A resource failed to call release.
2024-09-13 23:48:47.991 32080-32080 ImeTracker ai.mlc.mlcchat I ai.mlc.mlcchat:65911ff: onRequestHide at ORIGIN_CLIENT_HIDE_SOFT_INPUT reason HIDE_SOFT_INPUT_BY_INSETS_API
2024-09-13 23:48:47.992 32080-32080 ImeTracker ai.mlc.mlcchat I ai.mlc.mlcchat:ede276ba: onRequestHide at ORIGIN_CLIENT_HIDE_SOFT_INPUT reason HIDE_SOFT_INPUT_BY_INSETS_API
2024-09-13 23:48:47.995 32080-32080 HandWritingStubImpl ai.mlc.mlcchat I getCurrentKeyboardType: 1
2024-09-13 23:48:48.106 32080-32080 Compatibil...geReporter ai.mlc.mlcchat D Compat change id reported: 210923482; UID 10348; state: ENABLED
2024-09-13 23:48:48.108 32080-32080 ThemeUtils ai.mlc.mlcchat E View class dev.jeziellago.compose.markdowntext.CustomTextView is an AppCompat widget that can only be used with a Theme.AppCompat theme (or descendant).
2024-09-13 23:48:48.163 32080-32080 RemoteInpu...ectionImpl ai.mlc.mlcchat W getTextBeforeCursor on inactive InputConnection
2024-09-13 23:48:48.164 32080-32080 WindowOnBackDispatcher ai.mlc.mlcchat W sendCancelIfRunning: isInProgress=falsecallback=ImeCallback=ImeOnBackInvokedCallback@170417451 Callback=android.window.IOnBackInvokedCallback$Stub$Proxy@5eb0386
2024-09-13 23:48:48.168 32080-32080 ImeTracker ai.mlc.mlcchat I ai.mlc.mlcchat:ede276ba: onCancelled at PHASE_CLIENT_APPLY_ANIMATION
2024-09-13 23:48:48.596 32080-32140 ai.mlc.mlcchat ai.mlc.mlcchat W PerfMonitor async binderTransact : time=296ms interface=android.gui.ITransactionComposerListener code=1
2024-09-13 23:48:48.598 32080-32080 Looper ai.mlc.mlcchat W PerfMonitor doFrame : time=302ms vsyncFrame=0 latency=1ms procState=-1 historyMsgCount=4
2024-09-13 23:48:48.619 32080-32080 ImeTracker ai.mlc.mlcchat I ai.mlc.mlcchat:bf5dc4ed: onRequestHide at ORIGIN_CLIENT_HIDE_SOFT_INPUT reason HIDE_SOFT_INPUT_BY_INSETS_API
2024-09-13 23:48:48.620 32080-32080 ImeTracker ai.mlc.mlcchat I ai.mlc.mlcchat:65911ff: onHidden
2024-09-13 23:48:48.662 32080-32250 RenderInspector ai.mlc.mlcchat W QueueBuffer time out on ai.mlc.mlcchat/ai.mlc.mlcchat.MainActivity, count=1, avg=295 ms, max=295 ms.
2024-09-13 23:48:48.715 32080-32250 RenderInspector ai.mlc.mlcchat W DequeueBuffer time out on ai.mlc.mlcchat/ai.mlc.mlcchat.MainActivity, count=1, avg=28 ms, max=28 ms.
2024-09-13 23:48:58.872 32080-32080 FrameTracker ai.mlc.mlcchat E force finish cuj, time out: JIME_INSETS_ANIMATION::1@0@ai.mlc.mlcchat
2024-09-13 23:49:24.395 32080-32080 ThemeUtils ai.mlc.mlcchat E View class dev.jeziellago.compose.markdowntext.CustomTextView is an AppCompat widget that can only be used with a Theme.AppCompat theme (or descendant).
2024-09-13 23:49:24.782 32080-32080 ThemeUtils ai.mlc.mlcchat E View class dev.jeziellago.compose.markdowntext.CustomTextView is an AppCompat widget that can only be used with a Theme.AppCompat theme (or descendant).
2024-09-13 23:49:24.837 32080-32080 ThemeUtils ai.mlc.mlcchat E View class dev.jeziellago.compose.markdowntext.CustomTextView is an AppCompat widget that can only be used with a Theme.AppCompat theme (or descendant).
The prefill tokens/s is very poor, which seems to need a fix in the code.
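Most of the log above is UI noise, so it may help to filter by priority before posting. The following is a rough sketch that assumes one entry per line in the "date time pid-tid tag package level message" layout seen above; the regular expression and helper names are my own and are not part of any Android tool.

```python
import re
from typing import Iterable, List, NamedTuple

# Assumed layout: "YYYY-MM-DD HH:MM:SS.mmm PID-TID TAG PACKAGE LEVEL MESSAGE".
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<pid>\d+)-(?P<tid>\d+)\s+"
    r"(?P<tag>\S+)\s+(?P<package>\S+)\s+"
    r"(?P<level>[VDIWEF])\s+(?P<message>.*)$"
)
LEVELS = "VDIWEF"  # verbose, debug, info, warning, error, fatal (ascending)


class Entry(NamedTuple):
    time: str
    tag: str
    level: str
    message: str


def filter_by_level(lines: Iterable[str], min_level: str = "E") -> List[Entry]:
    """Keep entries at or above `min_level`; lines that do not match are skipped."""
    threshold = LEVELS.index(min_level)
    kept: List[Entry] = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and LEVELS.index(match["level"]) >= threshold:
            kept.append(Entry(match["time"], match["tag"], match["level"], match["message"]))
    return kept
```

Applied to this log with the default threshold, only the FrameTracker and ThemeUtils entries would remain, none of which mention the KV cache check.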
@MasterJH5574 I cleared the build cache, then executed the new command you provided in the MLCChat directory, and recompiled and ran it, but the same error still occurred. I would like to ask where the problem lies.
I hit the same problem after adding a new quantization method and compiling with it: org.apache.tvm.Base$TVMError: TVMError: Check failed: (args.size() == 22 || args.size() == 23) is false: Invalid number of KV cache constructor args.
However, the app works well with the model quantized using q4f16_1.
Does this error have anything to do with the new quantization method I added?
Even with the app working, did you notice that the prefill tok/sec is very poor?
Same problem when using Qwen2! But Qwen1.5 works.
Prefill is only 1.3 tokens/s.
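To put that figure in perspective, here is a small sketch of the arithmetic. It assumes the usual reading of prefill throughput as prompt tokens divided by the time spent processing the prompt; the function names and the 130-token prompt are illustrative and do not come from MLC-LLM.

```python
def prefill_tokens_per_second(prompt_tokens: int, prefill_seconds: float) -> float:
    """Prefill throughput, assuming it means prompt tokens / prompt processing time."""
    if prefill_seconds <= 0:
        raise ValueError("prefill_seconds must be positive")
    return prompt_tokens / prefill_seconds


def prefill_wait_seconds(prompt_tokens: int, tokens_per_second: float) -> float:
    """Time before the first generated token at a given prefill throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return prompt_tokens / tokens_per_second
```

At 1.3 tokens/s, a hypothetical 130-token prompt would take roughly 100 seconds before the first output token appears.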
Hi @MasterJH5574, could you help me understand why prefill is so slow during inference?
The performance issue might be caused by https://github.com/apache/tvm/pull/17326, although that change is not expected to alter the original prefill behavior.
Would using an older version of the packages fix this issue?
Running with MLC_JIT_POLICY=REDO to force recompilation, as suggested earlier in this thread, worked for me. Thanks :)
ndroid/mlc4j/../../3rdparty/tvm/src/runtime/relax_vm/paged_kv_cache.cc:2650: Check failed: (args.size() == 22 || args.size() == 23) is false: Invalid number of KV cache constructor args.
2024-09-10 23:46:05.024 32001-32058 AndroidRuntime ai.mlc.mlcchat E FATAL EXCEPTION: Thread-8
Process: ai.mlc.mlcchat, PID: 32001
org.apache.tvm.Base$TVMError: TVMError: Check failed: (args.size() == 22 || args.size() == 23) is false: Invalid number of KV cache constructor args.
Stack trace:
File "latest_mlc/mlc-llm/android/mlc4j/../../3rdparty/tvm/src/runtime/relax_vm/paged_kv_cache.cc", line 2650