mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

iOS app crashes when using Llama-2 model [Bug] #858

Closed tstanek390 closed 1 year ago

tstanek390 commented 1 year ago

🐛 Bug

iOS MLC Chat app crashes when trying to use downloaded custom Llama2 model.

To Reproduce

Steps to reproduce the behavior:

  1. Compiling the model with TVM Unity and the recommended procedure for iOS devices.
  2. Uploading the model to Huggingface.
  3. Downloading the model to iOS app on iPhone.
  4. Running the app with downloaded model.
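The compile step above can be sketched roughly as follows. This is a hedged illustration only: the model path and quantization name are placeholders, and the exact flags assume the `build.py` workflow documented for mlc-llm around this time, not commands taken from the report.

```shell
# Clone mlc-llm with its submodules (TVM Unity is pulled in recursively).
git clone --recursive https://github.com/mlc-ai/mlc-llm.git
cd mlc-llm

# Compile a Llama-2-based model for iOS (Metal target).
# /path/to/MedLLama2 and q4f16_1 are illustrative placeholders.
python3 build.py \
  --model /path/to/MedLLama2 \
  --target iphone \
  --quantization q4f16_1
```

The resulting artifacts would then be uploaded to Hugging Face and fetched by the MLC Chat app, as in steps 2-4.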

Expected behavior

Using the custom model in iOS app.

Environment

link to Huggingface TVM compiled model : https://huggingface.co/tstanek390/MedLLama2iOS

Thx for any kind of help, T.

Hzfengsy commented 1 year ago

It may be due to memory limitations, i.e. there is not enough memory on your device. Please check if RedPajama can run successfully.

tstanek390 commented 1 year ago

RedPajama runs successfully, as does Llama-2-7B-chat-hf. I have already tried a dozen models that I quantized and processed myself with MLC LLM and its recommended parameters and settings, with the same or even more aggressive quantizations, but the app always crashes. All the models I'm trying to use are based on Llama-2-7b. Any ideas what could cause the issue? :(

tqchen commented 1 year ago

Likely you need to limit the seq len and use q3 quantization for Llama-2 models
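The suggested mitigation could look something like the sketch below, assuming `build.py` accepts `--quantization` and `--max-seq-len` flags as in the mlc-llm documentation of this period; the paths and the value 512 are illustrative, not prescribed by the thread.

```shell
# Rebuild with 3-bit quantization and a capped context length,
# which shrinks both the weights and the KV cache in memory.
# q3f16_1 is a typical 3-bit quantization name; 512 is an example cap.
python3 build.py \
  --model /path/to/Llama-2-7b-chat-hf \
  --target iphone \
  --quantization q3f16_1 \
  --max-seq-len 512
```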

baiyutang commented 10 months ago

> Likely you need to limit the seq len and use q3 quantization for Llama-2 models

Regarding "limit the seq len": is there a recommended --max-seq-len value for Llama-2? 512?