xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

Support Optional Configurations for Embedding models #1661

Open zhangever opened 3 months ago

zhangever commented 3 months ago

Is your feature request related to a problem? Please describe

I deployed the BCE embedding model with 3 replicas spread across three 3090 GPUs, but could not get 3x the throughput of a single replica. In practice, throughput is the same as with a single card.

Describe the solution you'd like

I would like optional configurations to be added, for example a request limit.
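For illustration, the requested behavior could be approximated today on the client side with a concurrency cap, so that no more than a fixed number of embedding requests are in flight at once. This is only a sketch: `MAX_INFLIGHT`, the 768-dim output, and the `embed` placeholder are assumptions, not part of Xinference's API.

```python
import asyncio

# Assumed cap on concurrent embedding requests (tune per replica count).
MAX_INFLIGHT = 8

sem = asyncio.Semaphore(MAX_INFLIGHT)

async def embed(text: str) -> list[float]:
    # Placeholder for a real call to the embedding endpoint;
    # sleeps briefly to simulate network + inference latency.
    await asyncio.sleep(0.01)
    return [0.0] * 768  # assumed embedding dimension

async def limited_embed(text: str) -> list[float]:
    # At most MAX_INFLIGHT embed() calls run concurrently;
    # the rest queue on the semaphore.
    async with sem:
        return await embed(text)

async def main() -> None:
    docs = [f"doc {i}" for i in range(100)]
    results = await asyncio.gather(*(limited_embed(d) for d in docs))
    print(len(results))  # all 100 requests complete

if __name__ == "__main__":
    asyncio.run(main())
```

A server-side request limit would serve the same purpose, but could also feed back into replica scheduling, which a pure client-side cap cannot do.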

Describe alternatives you've considered

N/A

Additional context

N/A

codingl2k1 commented 3 months ago

When you run 3 replicas on 3 cards, is any process maxing out a CPU core at 100%?
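One way to check this while the benchmark is running (a sketch, assuming a Linux host with GNU `ps`; note `%cpu` here is a lifetime average, so `top -b -n 1` gives a better instantaneous snapshot):

```shell
# Show the ten heaviest CPU consumers; a value near 100 means
# one process has a core saturated and may be the bottleneck.
ps -eo pid,comm,%cpu --sort=-%cpu | head -n 10
```

If a single dispatcher or tokenizer process sits at ~100% while GPU utilization stays low, the workload is CPU-bound on that process rather than spread across the replicas.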

zhangever commented 3 months ago

> When you run 3 replicas on 3 cards, is any process maxing out a CPU core at 100%?

Only the benchmark requests were running at the time, nothing else. GPU utilization was low: each card was busy only intermittently, peaking at under 10%.

codingl2k1 commented 3 months ago

I mean, is CPU utilization hitting 100% anywhere?

github-actions[bot] commented 2 months ago

This issue is stale because it has been open for 7 days with no activity.