Open dusty-nv opened 2 months ago
NVIDIA's Llama-3.1-Nemotron-70B-Instruct model is strong and has been released. It requires four 40 GB GPUs or two 80 GB GPUs. I think this is exactly the kind of case where MLC-LLM can do well. I would suggest that the team blog about the progress of making Nemotron available on MLC-LLM and the difficulties encountered. The community would learn a lot about MLC-LLM that way, which I think is better than just requesting a result.
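For context on the GPU requirement above, here is a rough back-of-the-envelope sketch of weight memory only. It assumes a nominal 70e9 parameters (taken from the model name) and ignores KV cache, activations, and quantization metadata, so real usage will be higher. The helper name `weight_memory_gb` is made up for illustration and is not part of MLC-LLM.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: nominal parameter count inferred from the "70B" in the model name.
params_70b = 70e9

# 16-bit weights (fp16/bf16): 2 bytes per parameter.
fp16_gb = weight_memory_gb(params_70b, 16)

# 4-bit weights, ignoring per-group scales and other quantization overhead.
q4_gb = weight_memory_gb(params_70b, 4)

print(f"16-bit weights: ~{fp16_gb:.0f} GB")
print(f"4-bit weights:  ~{q4_gb:.0f} GB")
```

Under these assumptions, 16-bit weights come to roughly 140 GB, which is consistent with needing 160 GB of total GPU memory in either configuration, while 4-bit weights would be roughly a quarter of that.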
⚙️ Request New Models
Nemotron-Mini-4B-Instruct
Nemotron-4
Additional context
This request is to add MLC support for the NVIDIA Nemotron architecture. The 4B Minitron SLM is a good target for edge deployment, and the NeMo team will continue training it. I am happy to help with the porting/verification efforts but lack expertise in the current MLC/TVM model builder. Support has been added to HF Transformers and llama.cpp, which can serve as references. Hoping for those sweet performance gains from MLC q4f16_ft quantization next! 😀