Describe the bug
GPU memory usage shows that Xinference and the LLM model are still running, but the model no longer appears in the web UI and cannot be used.
To Reproduce
At the beginning, I ran qwen1.5-14b-chat on Linux with Xinference (vLLM backend). I then ran a stress test to measure some performance indicators. The first run seemed fine, but when I ran the test again the model had disappeared from the web UI. The custom models registered by the user were also lost.
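Roughly, the steps were as follows. The exact model name and flags are my best reconstruction, not copied from my shell history, so treat them as approximate:

```shell
# 1. Start the local Xinference server (assumed default port)
xinference-local --host 0.0.0.0 --port 9997

# 2. Launch the model (name/flags approximate)
xinference launch --model-name qwen1.5-chat \
    --size-in-billions 14 --model-format pytorch

# 3. Run a stress test against the OpenAI-compatible endpoint,
#    then check the running models again. After the second run,
#    the model no longer shows up here or in the web UI, even
#    though GPU memory is still occupied.
xinference list
```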
Some guesses
Could the GPU memory usage have exceeded the limit and caused this error?