nazneenn opened this issue 7 months ago
Hi @nazneenn, we are developing a PoC of FastAPI serving using multi-GPU; we will keep you updated.
Watching this one - I'll be aiming to run Mixtral 8x7B AWQ on a pair of Arc A770s (I'll be buying the second GPU as soon as I know it's supported).
Hi @nazneenn @digitalscream, FastAPI serving using multi-GPU is now supported in ipex-llm; please refer to this example: https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/Deepspeed-AutoTP-FastAPI
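For anyone skimming the thread, a minimal single-GPU sketch of the serving pattern (FastAPI + uvicorn wrapping an ipex-llm-optimized model) might look like the following. The endpoint name, request schema, and model id are placeholders of mine, not the ones used by the linked example; the Deepspeed-AutoTP-FastAPI example above is the authoritative multi-GPU version.

```python
# Minimal sketch, assuming ipex-llm and its Intel XPU dependencies are
# installed. Endpoint name and request schema are illustrative only.
import torch
import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

MODEL_ID = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# load_in_4bit=True applies ipex-llm's low-bit optimization at load time.
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, load_in_4bit=True).to("xpu")

app = FastAPI()

class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 64

@app.post("/generate")
def generate(req: GenerateRequest):
    # Tokenize on CPU, move tensors to the Intel GPU, then generate.
    inputs = tokenizer(req.prompt, return_tensors="pt").to("xpu")
    with torch.inference_mode():
        output_ids = model.generate(**inputs, max_new_tokens=req.max_new_tokens)
    return {"text": tokenizer.decode(output_ids[0], skip_special_tokens=True)}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```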
Hi, could you please help provide a guide on integrating the DeepSpeed approach for using multiple Intel Flex 140 GPUs to run model inference with a FastAPI and uvicorn setup? Model id: 'meta-llama/Llama-2-7b-chat-hf'. Thanks
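While waiting for a dedicated guide, a rough sketch of the DeepSpeed AutoTP pattern that the linked example follows might look like this. The two-rank assumption, the `low_bit` value, and the environment-variable handling are assumptions on my part; treat the shipped example scripts as authoritative.

```python
# Hedged sketch of the DeepSpeed AutoTP + ipex-llm pattern: shard the model
# across ranks with AutoTP, then apply low-bit optimization to each shard
# and move it to that rank's Intel GPU. Assumes deepspeed with Intel XPU
# support is installed; values below are illustrative.
import os
import torch
import deepspeed
from transformers import AutoModelForCausalLM
from ipex_llm import optimize_model

local_rank = int(os.environ.get("LOCAL_RANK", "0"))
world_size = int(os.environ.get("WORLD_SIZE", "2"))  # assumption: two devices

MODEL_ID = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, low_cpu_mem_usage=True
)

# AutoTP tensor parallelism: no kernel injection, just sharding across ranks.
model = deepspeed.init_inference(
    model,
    mp_size=world_size,
    dtype=torch.float16,
    replace_with_kernel_inject=False,
).module

# Apply ipex-llm low-bit optimization to this rank's shard, then move it to
# the corresponding Intel GPU ("sym_int4" is an assumed choice of precision).
model = optimize_model(model.to("cpu"), low_bit="sym_int4")
model = model.to(f"xpu:{local_rank}")
```

A script like this would typically be launched with one process per GPU (e.g. via mpirun or the deepspeed launcher) so that LOCAL_RANK and WORLD_SIZE are set per rank; the launch scripts in the example repository cover the Intel-specific environment setup.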