mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

[Bug] Bug with DeepSeek V2 #2996

Open · 0xLienid opened 1 month ago

0xLienid commented 1 month ago

🐛 Bug

When loading DeepSeek V2 Lite, weight conversion and compilation succeed, but the first attempt at inference always fails with the error shown in the screenshot below. My suspicion is that there is an issue somewhere in the Relax implementation of the model.

(screenshot of the error traceback)

To Reproduce

Steps to reproduce the behavior:

  1. Download DeepSeek-V2-Lite from HuggingFace
  2. Use MLC CLI to convert weights, generate config, and compile
  3. Try doing inference with mlc_llm chat, MLCEngine, or another entry point (see the command sketch after this list)
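
For concreteness, the command sequence looks roughly like the following. This is a hedged sketch: the quantization scheme, output paths, conversation template name, and library name are illustrative assumptions, not values from the original report.

```bash
# Assumed local layout; adjust paths, quantization, and device to your setup.
MODEL=./dist/models/DeepSeek-V2-Lite
OUT=./dist/DeepSeek-V2-Lite-q4f16_1-MLC

# 1. Convert the HuggingFace weights to MLC format.
mlc_llm convert_weight $MODEL --quantization q4f16_1 -o $OUT

# 2. Generate the chat config (the conv template name is a guess).
mlc_llm gen_config $MODEL --quantization q4f16_1 \
    --conv-template deepseek_v2 -o $OUT

# 3. Compile the model library for CUDA.
mlc_llm compile $OUT/mlc-chat-config.json --device cuda \
    -o $OUT/DeepSeek-V2-Lite-q4f16_1-cuda.so

# 4. Chat; per the report, the failure appears on the first inference.
mlc_llm chat $OUT --model-lib $OUT/DeepSeek-V2-Lite-q4f16_1-cuda.so
```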

Expected behavior

The model loads and returns generated tokens.
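
For reference, a minimal MLCEngine script of the kind that triggers the failure. The local model path is an assumption carried over from the sketch above, not taken from the original report.

```python
from mlc_llm import MLCEngine

# Assumed path to the converted and compiled model from the steps above.
model = "./dist/DeepSeek-V2-Lite-q4f16_1-MLC"

engine = MLCEngine(model)

# The crash reportedly happens on the first inference call.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "Hello"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content or "", end="", flush=True)
print()

engine.terminate()
```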

Environment

dylanlanigansmith commented 1 month ago

I am also experiencing this issue on a similar CUDA 12 setup; my comment here has more info. Any insight would be greatly appreciated!