vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[New Model]: Mistral-Nemo #6563

Closed. Hambaobao closed this issue 2 months ago

Hambaobao commented 2 months ago

The model to consider.

https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407

The closest model vllm already supports.

What's your difficulty of supporting the model you want?

No response

jonzhep commented 2 months ago

I think we just need to read head_dim from the model config instead of deriving it: https://github.com/huggingface/transformers/commit/4c040aba02b0283619a06bdc40ecf868508b9e52
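
As a rough illustration of that suggestion, here is a minimal sketch of preferring an explicit `head_dim` on the config and falling back to the usual `hidden_size // num_attention_heads` derivation when it is absent. The helper name `resolve_head_dim` is hypothetical, and the `SimpleNamespace` objects are stand-ins for a real Hugging Face config, with numbers chosen only for illustration. This is not vLLM's actual code.

```python
from types import SimpleNamespace


def resolve_head_dim(config):
    """Hypothetical helper: prefer an explicit head_dim, else derive it."""
    head_dim = getattr(config, "head_dim", None)
    if head_dim is None:
        # Fallback used when the config does not carry head_dim.
        head_dim = config.hidden_size // config.num_attention_heads
    return head_dim


# Stand-in configs (illustrative values, not loaded from the Hub).
derived = SimpleNamespace(hidden_size=4096, num_attention_heads=32)
explicit = SimpleNamespace(hidden_size=5120, num_attention_heads=32, head_dim=128)

print(resolve_head_dim(derived))   # 4096 // 32 = 128
print(resolve_head_dim(explicit))  # 128 from the config, not 5120 // 32 = 160
```

The second case is the one that matters here: when `head_dim` is set independently of `hidden_size`, the derived value would be wrong, so the explicit field has to win.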

simon-mo commented 2 months ago

Closed by #6548. It will be released next week. Please try building from source for now. Thank you!