Hi @anubhavrana, so if I understand correctly, you would like to provide a custom `model_id` specifically for the `chat_completion` method to override the dummy value sent to the server? To do that, I don't think we should use the `model` parameter from both `InferenceClient` and `chat_completion` to provide the URL + the model id, as it would be misleading. I would only add an optional `model_id: Optional[str] = None` parameter to `chat_completion` and use it.

However, before moving forward on this I'd like to understand what your issue is with using `headers` to achieve your goal with the current implementation. Your router/gateway should be able to read the request headers to route the request. For example, you can do `InferenceClient(model="TGI URL", headers={"X-Model-Id": <your_model_id>}).chat_completion(...)`. Is there a problem with this solution? I find it unusual to rely on the body payload to route requests.
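For reference, a minimal sketch of that headers-based workaround (the gateway URL, model id, and the `X-Model-Id` header name are placeholders; use whatever header your router actually inspects):

```python
from huggingface_hub import InferenceClient

# Point the client at the gateway/TGI URL and attach a routing header
# that the gateway can read (header name is illustrative).
client = InferenceClient(
    model="https://my-gateway.example.com",      # placeholder gateway URL
    headers={"X-Model-Id": "my-org/my-model"},   # read by the router/gateway
)

response = client.chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```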
> Hi @anubhavrana, so if I understand correctly, you would like to provide a custom `model_id` specifically for the `chat_completion` method to override the dummy value sent to the server? To do that, I don't think we should use the `model` parameter from both `InferenceClient` and `chat_completion` to provide the URL + the model id, as it would be misleading. I would only add an optional `model_id: Optional[str] = None` parameter to `chat_completion` and use it.
Yes, that's exactly it. That makes sense, and I agree it shouldn't be added to `InferenceClient`.
> However, before moving forward on this I'd like to understand what your issue is with using `headers` to achieve your goal with the current implementation. Your router/gateway should be able to read the request headers to route the request. For example, you can do `InferenceClient(model="TGI URL", headers={"X-Model-Id": <your_model_id>}).chat_completion(...)`. Is there a problem with this solution? I find it unusual to rely on the body payload to route requests.
Yes, our gateway reads the request headers and uses them to route its requests. What we have done with the gateway is to separate out the `tgi` and `openai` routes. For the `tgi` routes, doing what you proposed is how we make requests to the gateway, and it works well. When the gateway receives requests for the `openai` routes, a custom plugin retrieves the `model` parameter from the request body and adds it to the request as a header, in the same way as on the `tgi` routes.

Our users use the `openai` routes with the `openai` client: when the `chat.completions.create` method is called, the `model` argument is provided and then used by the gateway plugin to add the headers. I would like to be able to do the same thing with the `InferenceClient` on the `openai` routes, to provide that flexibility as well.
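To make the current flow concrete, a rough sketch of how our users call the gateway's `openai` route today (the gateway URL and model id are placeholders):

```python
from openai import OpenAI

# The gateway exposes an OpenAI-compatible route; auth is handled by the
# gateway itself, so the api_key here is just a placeholder.
client = OpenAI(
    base_url="https://my-gateway.example.com/v1",  # placeholder gateway URL
    api_key="unused",
)

# The `model` argument is sent in the request body; the gateway plugin reads it
# and turns it into a routing header, the same way the `tgi` routes work.
response = client.chat.completions.create(
    model="my-org/my-model",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```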
I hope that helped clarify the problem. Open to other solutions too.
Ok, makes sense. I opened https://github.com/huggingface/huggingface_hub/pull/2302 to support this :)
Thank you so much! @Wauplin
Closed by https://github.com/huggingface/huggingface_hub/pull/2302. Will be available in the next release :)
**Is your feature request related to a problem? Please describe.**

When using the `InferenceClient` we can use the `model` argument to specify a model id on the Hugging Face Hub or a URL to a custom deployed Inference Endpoint. This generally works well, but we have an API gateway sitting in front of our `text-generation-inference` servers, and the gateway has additional plugins and routing based on the `model_id`.

Our plugins are currently only exposed for the `openai` `v1/chat/completions` route, and this issue is not present when using the native `text-generation-inference` routes; however, the change may be good to have for both.

The way our plugin works is that we provide a base API URL and, based on the `model_id` in the `chat_completions` method, we route requests to the relevant model endpoint.

When trying to use the `InferenceClient` with our gateway we run into `404` responses because the gateway is not able to find the routes (since `model` was set to `"tgi"`).
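A rough sketch of the failing call (the gateway URL is a placeholder):

```python
from huggingface_hub import InferenceClient

# `model` is a URL, so the client has no real model id to send; the
# chat_completion payload's "model" field is filled with the dummy value "tgi".
client = InferenceClient(model="https://my-gateway.example.com")  # placeholder URL

# The gateway's openai-route plugin reads {"model": "tgi"} from the body,
# finds no matching backend, and returns a 404.
response = client.chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
)
```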
**Describe the solution you'd like**
Separation of `model` and `base_url` will help solve the issue. Furthermore, if we just use `base_url` when instantiating `InferenceClient` and use `model` in the different methods (like `chat_completions`), we can also follow a similar pattern to the `openai` client. This should also not affect or break the current usage pattern, and we can have default values similarly to the current implementation.
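A purely illustrative sketch of the usage pattern described above (the `base_url` parameter and the per-call `model` argument are assumptions about a possible API, not an existing one):

```python
from huggingface_hub import InferenceClient

# Hypothetical: the gateway URL is given once at instantiation...
client = InferenceClient(base_url="https://my-gateway.example.com")  # assumed parameter

# ...and the model id is passed per call, so it reaches the request body and the
# gateway plugin can route on it, mirroring the openai client's pattern.
response = client.chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    model="my-org/my-model",  # assumed per-call override
)
```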
**Describe alternatives you've considered**

We are able to circumvent this issue when using the native `text-generation-inference` routes by specifying `headers` in `InferenceClient`, which works since our plugin is not enabled on those routes. But for routes that have the plugin, adding the appropriate `headers` still fails, since the plugin injects the provided `model_id` (in the current case a random string `"tgi"`) and overrides the provided header.

**Additional context**
I'd be happy to create a PR for this. I've started working on a possible fix and would love to collaborate!