🐛 Bug
When loading DeepSeek V2 Lite, weight conversion and compilation succeed, but the first attempt at inference always fails with the same error. My suspicion is that there is an issue somewhere in the Relax implementation of the model.
To Reproduce
Steps to reproduce the behavior:
1. Download DeepSeek-V2-Lite from Hugging Face
2. Use the MLC CLI to convert the weights, generate the config, and compile the model library
3. Attempt inference via `mlc_llm chat`, the `MLCEngine` Python API, or another entry point (see the sketches after this list)
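For concreteness, the commands behind steps 1–3 looked like this — a sketch assuming the standard MLC-LLM convert/compile workflow, where the local paths, the `q4f16_1` quantization, and the `deepseek_v2` conversation-template name are stand-ins rather than confirmed specifics:

```bash
# Fetch the model from Hugging Face (requires git-lfs).
git clone https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite ./dist/models/DeepSeek-V2-Lite

# Convert the weights to MLC format (quantization is an assumption).
mlc_llm convert_weight ./dist/models/DeepSeek-V2-Lite \
    --quantization q4f16_1 \
    -o ./dist/DeepSeek-V2-Lite-q4f16_1-MLC

# Generate mlc-chat-config.json (template name is an assumption).
mlc_llm gen_config ./dist/models/DeepSeek-V2-Lite \
    --quantization q4f16_1 \
    --conv-template deepseek_v2 \
    -o ./dist/DeepSeek-V2-Lite-q4f16_1-MLC

# Compile the model library for CUDA.
mlc_llm compile ./dist/DeepSeek-V2-Lite-q4f16_1-MLC/mlc-chat-config.json \
    --device cuda \
    -o ./dist/libs/DeepSeek-V2-Lite-q4f16_1-cuda.so

# First inference attempt -- this is where it fails.
mlc_llm chat ./dist/DeepSeek-V2-Lite-q4f16_1-MLC \
    --model-lib ./dist/libs/DeepSeek-V2-Lite-q4f16_1-cuda.so
```

The `MLCEngine` path fails the same way; a minimal Python sketch using the paths assumed above:

```python
# Minimal sketch of the failing inference attempt via the Python API.
from mlc_llm import MLCEngine

model = "./dist/DeepSeek-V2-Lite-q4f16_1-MLC"
engine = MLCEngine(
    model=model,
    model_lib="./dist/libs/DeepSeek-V2-Lite-q4f16_1-cuda.so",
)

# Engine creation succeeds; the error surfaces on the first generation request.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "Hello"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content or "", end="", flush=True)

engine.terminate()
```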
Expected behavior
I load the model and it gives me tokens back
Environment
- Platform (e.g. WebGPU/Vulkan/iOS/Android/CUDA): CUDA
- Operating system (e.g. Ubuntu/Windows/macOS/...): Debian and Ubuntu
- How you installed MLC-LLM (conda, source): Tried both pip and source
- How you installed TVM-Unity (pip, source): Tried both pip and source
- Python version (e.g. 3.10): 3.10
- CUDA/cuDNN version (if applicable): Tried both 12.4 and 12.6
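For the pip route, the installs followed the documented nightly wheels — a sketch, assuming the CUDA 12.4 packages (the `cu124` suffix would change to match the 12.6 toolkit tested):

```bash
# Nightly wheels from the MLC wheel index; package names assume CUDA 12.4.
python -m pip install --pre -U -f https://mlc.ai/wheels \
    mlc-ai-nightly-cu124 mlc-llm-nightly-cu124
```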