runpod-workers / worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.

Support for mistralai/Mixtral-8x7B-Instruct-v0.1 #41

Closed: ilkersigirci closed this issue 6 months ago

ilkersigirci commented 6 months ago

The README says mistralai/Mixtral-8x7B-Instruct-v0.1 is supported, but the RunPod UI doesn't allow more than 1 GPU per worker for 80 GB GPU cards, so the model can't be served. Is there any other way to serve the original model without resorting to a quantized version?
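For context, the unquantized Mixtral-8x7B-Instruct-v0.1 weights are roughly 90+ GB in 16-bit precision, which is why a single 80 GB GPU is not enough and the model must be sharded across multiple GPUs with tensor parallelism. Below is a minimal sketch of what that looks like with vLLM's Python API; `tensor_parallel_size=2` is an assumption about the minimum viable shard count for two 80 GB cards, not a setting taken from this repo's worker configuration:

```python
# Sketch: serving Mixtral-8x7B across two 80 GB GPUs with vLLM tensor
# parallelism. tensor_parallel_size=2 is assumed from the ~90+ GB fp16
# weight footprint; adjust it to however many GPUs the worker is given.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    tensor_parallel_size=2,  # shard weights across 2 GPUs; 1x80 GB is too small in fp16
    dtype="bfloat16",
)

params = SamplingParams(max_tokens=128, temperature=0.7)
outputs = llm.generate(["[INST] What is tensor parallelism? [/INST]"], params)
print(outputs[0].outputs[0].text)
```

The issue here is that the serverless UI capped the worker at one 80 GB GPU, so a multi-GPU configuration like the one above couldn't be selected at all.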

alpayariyak commented 6 months ago

Hi, to request a higher quota for serverless GPUs, reach out to help@runpod.io.