mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

[Model Request] Nemotron architecture #2901

Open dusty-nv opened 1 month ago

dusty-nv commented 1 month ago

⚙️ Request New Models

Additional context

This is a request to add MLC support for the NVIDIA Nemotron architecture. The 4B Minitron SLM is a good target for edge deployment, and the NeMo team will continue training it. I am happy to help with the porting and verification efforts but lack expertise with the current MLC/TVM model builder. Support has already been added to HF Transformers and llama.cpp, which can serve as references. Hoping for those sweet performance gains from MLC q4f16_ft quantization next! 😀

wuxianliang commented 2 days ago

NVIDIA's Llama-3.1-Nemotron-70B-Instruct model is strong and has been released. It requires four 40 GB GPUs or two 80 GB GPUs. I think this is exactly the kind of case where MLC-LLM can do some good. I would suggest that the team blog about the progress of making Nemotron available on MLC-LLM and the difficulties encountered. The community would learn a lot about MLC-LLM that way, which I think is better than just requesting a result.
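
For context on the GPU counts above, here is a rough back-of-the-envelope sketch of weight memory at different precisions. It assumes a nominal 70e9 parameters and counts weights only; the KV cache, activations, and quantization metadata (scales, zero points) are ignored, so real requirements will be higher. The helper name `weight_memory_gb` is made up for illustration and is not part of MLC-LLM.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in GB (1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: nominal 70B parameter count; the exact count differs slightly.
PARAMS_70B = 70e9

fp16_gb = weight_memory_gb(PARAMS_70B, 16)  # 16-bit weights
int4_gb = weight_memory_gb(PARAMS_70B, 4)   # 4-bit weights, no scale overhead

print(f"fp16 weights:  ~{fp16_gb:.0f} GB")
print(f"4-bit weights: ~{int4_gb:.0f} GB")
```

Under these assumptions, fp16 weights alone come to about 140 GB, which is consistent with needing 160 GB of total GPU memory (4 x 40 GB or 2 x 80 GB). The same weights at 4 bits would be roughly 35 GB before overhead, which is why a quantized MLC build looks attractive here.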