mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

[Bug] Loading Model and cold start prompting freezes application #1379

Closed tobrun closed 9 months ago

tobrun commented 9 months ago

🐛 Bug

I'm noticing this with both the default Llama-2-7b and TinyLlama-1.1b. When loading the model for the first time, or when prompting the model for the first time, the process halts and the UI freezes. It takes a couple of seconds before the UI thread becomes responsive again.

If you have multiple workers in your application besides the model loading, the Android OS ends up killing the application with an ANR (Application Not Responding) message. This type of ANR happens when too much work is performed on the main thread, so I'm assuming we are missing asynchronous handling either when loading the model weights into memory or when prompting the model.

To Reproduce

Steps to reproduce the behavior:

  1. Try loading a Llama-2-based model on a device.

Expected behavior

The model weights should be loaded into memory asynchronously, and interaction with the model should also run on a background thread.

Environment

tqchen commented 9 months ago

This is interesting. I think one takeaway is that we should also place the initialization of the chat module (aka chat.reload) in a separate thread (just like the thread we use for decode).

Note that on iOS this is achieved with a threadWorker; it would be great to check what the case is on Android.

tqchen commented 9 months ago

See https://github.com/mlc-ai/mlc-llm/blob/main/ios/MLCChat/States/ChatState.swift#L325 for how we wrap things in a threadWorker.push.
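
For illustration, the same idea on the Android side might look roughly like the Java sketch below: every call into the chat module, including the reload, is queued on one dedicated background thread so the UI thread never blocks on it. This is only a sketch under assumptions, not the project's actual Android code. `ChatBackend` and `ChatWorker` are hypothetical names made up for this example, and the real chat module's methods and signatures may differ. It also assumes `CompletableFuture` is available (Android API level 24 or later).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for the chat module; NOT the real MLC LLM API. */
interface ChatBackend {
    void reload(String modelPath);   // slow: loads model weights into memory
    String generate(String prompt);  // slow: processes the prompt and decodes
}

/** Runs every backend call, in submission order, on one background thread. */
final class ChatWorker {
    // A single thread keeps calls serialized, so a prompt submitted right
    // after a reload is only processed once the reload has finished.
    private final ExecutorService worker =
            Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, "chat-worker");
                t.setDaemon(true); // do not keep the process alive on exit
                return t;
            });
    private final ChatBackend backend;

    ChatWorker(ChatBackend backend) {
        this.backend = backend;
    }

    /** Queues the model (re)load; returns immediately to the caller. */
    CompletableFuture<Void> reloadAsync(String modelPath) {
        return CompletableFuture.runAsync(() -> backend.reload(modelPath), worker);
    }

    /** Queues a prompt; the future completes with the generated reply. */
    CompletableFuture<String> generateAsync(String prompt) {
        return CompletableFuture.supplyAsync(() -> backend.generate(prompt), worker);
    }

    /** Stops accepting new work; already queued tasks still run. */
    void shutdown() {
        worker.shutdown();
    }
}
```

The caller would attach a callback (for example with `thenAccept`) instead of waiting on the returned future. On Android that callback would still need to hop back to the main thread before touching any views.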

tobrun commented 9 months ago

I'm planning to debug this more while looking into https://github.com/mlc-ai/mlc-llm/issues/1295. The issue is that at the moment I'm unable to debug, because there is no native build system integration with Gradle, which makes it impossible to debug the C++ code directly from Android Studio.

tobrun commented 9 months ago

The initial loading doesn't seem to be an issue anymore, but prompting with large sizes still is. Following up in #1401. Closing this one.