[Model Request] Nemotron architecture

mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation

Apache License 2.0

19.3k stars, 1.59k forks

⚙️ Request New Models

Link to an existing implementation (e.g. Hugging Face/GitHub): Nemotron-Mini-4B-Instruct, Nemotron-4

Is this model architecture supported by MLC-LLM? (the list of supported models): No

Additional context

This request is to add support in MLC for the NVIDIA Nemotron architecture. The 4B Minitron SLM is a good target for edge deployment, and the NeMo team will continue training it. I am happy to help with the porting and verification efforts, but I lack expertise with the current MLC/TVM model builder. Support has already been added to HF Transformers and llama.cpp, so both can serve as references. Hoping for those sweet performance gains from MLC q4f16_ft quantization next! 😀

[Model Request] Nemotron architecture #2901