Closed — jkone27 closed this issue 2 months ago
I suggest implementing your own IChatCompletionService for your specific model. You can use the existing HuggingFaceTextGenerationService but with a special prompt.
The HuggingFace Inference API doesn't have a common syntax for chatting, so the chat payload can vary depending on the model you use.
Can you provide a quick sample of how to implement a very basic one? I was thinking the same.
Look at this sample https://github.com/microsoft/semantic-kernel/blob/main/dotnet%2Fsamples%2FKernelSyntaxExamples%2FExample16_CustomLLM.cs
A chat completion service is similar but takes a ChatHistory parameter instead of a string prompt.
You have to create a prompt from the chat history, something like this:
User: hello
Assistant: hi what can I do?
User: help me with...
Assistant: <left empty so the model can complete this>
But keep in mind this pattern may not work with every model.
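Roughly, a minimal sketch could look like the following. It wraps an existing text generation service (e.g. HuggingFaceTextGenerationService) and flattens the ChatHistory into the "User:/Assistant:" prompt shown above; this is against the current Semantic Kernel abstractions and exact member signatures may differ between preview versions.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.TextGeneration;

// Sketch: a chat completion service built on top of a text generation service.
public sealed class PromptBasedChatCompletionService : IChatCompletionService
{
    private readonly ITextGenerationService _textService;

    public PromptBasedChatCompletionService(ITextGenerationService textService)
        => _textService = textService;

    public IReadOnlyDictionary<string, object?> Attributes => _textService.Attributes;

    public async Task<IReadOnlyList<ChatMessageContent>> GetChatMessageContentsAsync(
        ChatHistory chatHistory,
        PromptExecutionSettings? executionSettings = null,
        Kernel? kernel = null,
        CancellationToken cancellationToken = default)
    {
        // Flatten the history and leave the final "Assistant:" line open
        // so the model completes it.
        var prompt = new StringBuilder();
        foreach (var message in chatHistory)
        {
            prompt.AppendLine($"{message.Role}: {message.Content}");
        }
        prompt.Append("Assistant:");

        var results = await _textService.GetTextContentsAsync(
            prompt.ToString(), executionSettings, kernel, cancellationToken);

        return results
            .Select(r => new ChatMessageContent(AuthorRole.Assistant, r.Text))
            .ToList();
    }

    // Streaming is omitted in this sketch; it would delegate to
    // GetStreamingTextContentsAsync in the same way.
    public IAsyncEnumerable<StreamingChatMessageContent> GetStreamingChatMessageContentsAsync(
        ChatHistory chatHistory,
        PromptExecutionSettings? executionSettings = null,
        Kernel? kernel = null,
        CancellationToken cancellationToken = default)
        => throw new NotImplementedException();
}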
@jkone27 if you do want to make a contribution to the Semantic Kernel then @RogerBarreto can help by setting up a feature branch and getting your changes reviewed.
My oversight, I see now that HuggingFace supports chatting via the Inference API: https://huggingface.co/docs/api-inference/detailed_parameters?code=python#conversational-task
I was looking for this functionality as well. Is it possible to implement it in the connector seeing as it has been implemented in Huggingface's API recently?
https://huggingface.co/docs/text-generation-inference/messages_api#hugging-face-inference-endpoints https://huggingface.co/blog/tgi-messages-api
@markwallace-microsoft @Krzysztof318 ⬆️
@jkone27 hi, unfortunately I don't have time to implement that now. But why don't you contribute and create a PR for that? Implementing this should be quite simple.
I was looking for this functionality as well. Is it possible to implement it in the connector seeing as it has been implemented in Huggingface's API recently?
It is, although not supported by the public API; this seems to be valid for TGI deployments.
Hugging Face - Chat Completion POC.
@RogerBarreto It is supported by the public API, albeit poorly documented. Here is an example of how a HuggingFace model can be used with an existing OpenAI client library, and thus can be used in chat mode.
This in turn can be translated to a cURL request.
A requirement for this to work with a specific model is that it has the chat_template property in its tokenizer_config.json file (example). So not all models will work OOTB with the public Inference API, but most popular ones have this implemented.
Making a cURL request to this model on the public API works in OpenAI chat style:
curl https://api-inference.huggingface.co/models/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO/v1/chat/completions \
-X POST \
-d '{"model":"NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO", "messages": [{"role":"user","content":"How is the weather in Antwerp, Belgium?"}], "parameters": {"temperature": 0.7, "max_new_tokens": 100}}' \
-H "Content-Type: application/json" \
-H "Authorization: Bearer XXXXXXXXXXXXXXX"
This yields the following result:
{"id":"","object":"text_completion","created":1711714682,"model":"text-generation-inference/Nous-Hermes-2-Mixtral-8x7B-DPO-medusa","system_fingerprint":"1.4.3-sha-e6bb3ff","choices":[{"index":0,"message":{"role":"assistant","content":"As weather data changes constantly, the most accurate and up-to-date weather information for Antwerp, Belgium, can be found through weather websites or apps. These sources provide real-time and forecasted weather updates, including temperature, wind speed, humidity, and chance of precipitation. To get the current data, visit websites like Weather.com or AccuWeather.com and search for 'Antwerp, Belgium'. Or, you can use a weather app like AccuWe"},"logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":20,"completion_tokens":100,"total_tokens":120}}%
Streaming is also supported when stream:true is present in the request body.
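For example, the same request can be made from .NET with a plain HttpClient and "stream": true. This is only a rough sketch; it assumes the endpoint emits OpenAI-style "data: {...}" server-sent events, and the bearer token placeholder is the same as in the cURL example above.

using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

class StreamingChatDemo
{
    static async Task Main()
    {
        using var http = new HttpClient();
        http.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("Bearer", "XXXXXXXXXXXXXXX");

        var body = """
        {
          "model": "NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO",
          "messages": [{"role":"user","content":"How is the weather in Antwerp, Belgium?"}],
          "stream": true
        }
        """;

        using var request = new HttpRequestMessage(
            HttpMethod.Post,
            "https://api-inference.huggingface.co/models/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO/v1/chat/completions")
        {
            Content = new StringContent(body, Encoding.UTF8, "application/json")
        };

        using var response = await http.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
        response.EnsureSuccessStatusCode();

        using var reader = new StreamReader(await response.Content.ReadAsStreamAsync());
        string? line;
        while ((line = await reader.ReadLineAsync()) is not null)
        {
            // Assumed chunk format: data: {"choices":[{"delta":{"content":"..."}}], ...}
            if (line.StartsWith("data: ") && line != "data: [DONE]")
            {
                Console.WriteLine(line["data: ".Length..]);
            }
        }
    }
}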
Probably all models supported in HuggingChat should support this as well? Just a thought.
thank you @RogerBarreto
Is there an example of this somewhere in the tests or docs with any HuggingFace chat API?
@jkone27
There is an example here. In theory it's just a matter of replacing the localhost URL with a HuggingFace Inference API URL. For example, either https://api-inference.huggingface.co/models/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO/v1/chat/completions or https://api-inference.huggingface.co/models/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO should work.
However, these changes have not been released in a new version yet (last release was 2 weeks ago).
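Once a version with these changes ships, the setup should look roughly like the sketch below. Note this is only a sketch: the AddHuggingFaceChatCompletion builder extension and its parameter names are assumed from the POC and may differ in the released preview package.

using System;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Assumed extension from the HuggingFace connector (name/signature may differ).
var builder = Kernel.CreateBuilder();
builder.AddHuggingFaceChatCompletion(
    model: "NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO",
    endpoint: new Uri("https://api-inference.huggingface.co/models/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO"),
    apiKey: "XXXXXXXXXXXXXXX");

var kernel = builder.Build();
var chat = kernel.GetRequiredService<IChatCompletionService>();

var history = new ChatHistory();
history.AddUserMessage("How is the weather in Antwerp, Belgium?");

var reply = await chat.GetChatMessageContentsAsync(history);
Console.WriteLine(reply[0].Content);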
I am testing the HuggingFace preview package for .NET, but I cannot make use of IChatCompletionService.
It fails to resolve that service; I'm not using OpenAI, only the HuggingFace APIs.