Closed — jkone27 closed this issue 2 months ago
I suggest implementing your own IChatCompletionService for your specific model. You can use the existing HuggingFaceTextGenerationService but with a special prompt.
The HuggingFace Inference API doesn't have a common syntax for chatting, so the chat payload can vary depending on the model you use.
Can you provide a quick sample of how to implement a very basic one? I was thinking the same.
Look at this sample https://github.com/microsoft/semantic-kernel/blob/main/dotnet%2Fsamples%2FKernelSyntaxExamples%2FExample16_CustomLLM.cs
A chat completion service is similar but takes a ChatHistory parameter instead of a string prompt.
You have to create a prompt from the chat history, something like this:
User: hello
Assistant: hi what can I do?
User: help me with...
Assistant: <left empty so the model can complete this>
But keep in mind this pattern may not work with every model.
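Roughly, a minimal sketch could look like the following. It wraps an existing text generation service (e.g. HuggingFaceTextGenerationService) and flattens the ChatHistory into the "User:/Assistant:" prompt shown above; this is against the current Semantic Kernel abstractions and exact member signatures may differ between preview versions.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.TextGeneration;

// Sketch: a chat completion service built on top of a text generation service.
public sealed class PromptBasedChatCompletionService : IChatCompletionService
{
    private readonly ITextGenerationService _textService;

    public PromptBasedChatCompletionService(ITextGenerationService textService)
        => _textService = textService;

    public IReadOnlyDictionary<string, object?> Attributes => _textService.Attributes;

    public async Task<IReadOnlyList<ChatMessageContent>> GetChatMessageContentsAsync(
        ChatHistory chatHistory,
        PromptExecutionSettings? executionSettings = null,
        Kernel? kernel = null,
        CancellationToken cancellationToken = default)
    {
        // Flatten the history and leave the final "Assistant:" line open
        // so the model completes it.
        var prompt = new StringBuilder();
        foreach (var message in chatHistory)
        {
            prompt.AppendLine($"{message.Role}: {message.Content}");
        }
        prompt.Append("Assistant:");

        var results = await _textService.GetTextContentsAsync(
            prompt.ToString(), executionSettings, kernel, cancellationToken);

        return results
            .Select(r => new ChatMessageContent(AuthorRole.Assistant, r.Text))
            .ToList();
    }

    // Streaming is omitted in this sketch; it would delegate to
    // GetStreamingTextContentsAsync in the same way.
    public IAsyncEnumerable<StreamingChatMessageContent> GetStreamingChatMessageContentsAsync(
        ChatHistory chatHistory,
        PromptExecutionSettings? executionSettings = null,
        Kernel? kernel = null,
        CancellationToken cancellationToken = default)
        => throw new NotImplementedException();
}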
@jkone27 if you do want to make a contribution to the Semantic Kernel then @RogerBarreto can help by setting up a feature branch and getting your changes reviewed.
My oversight, I see now that HuggingFace supports chatting via the Inference API: https://huggingface.co/docs/api-inference/detailed_parameters?code=python#conversational-task
I was looking for this functionality as well. Is it possible to implement it in the connector seeing as it has been implemented in Huggingface's API recently?
https://huggingface.co/docs/text-generation-inference/messages_api#hugging-face-inference-endpoints https://huggingface.co/blog/tgi-messages-api
@markwallace-microsoft @Krzysztof318 ⬆️
@jkone27 hi, unfortunately I don't have time to implement that now. But why don't you contribute and create a PR for that? Implementing this should be quite simple.
I was looking for this functionality as well. Is it possible to implement it in the connector seeing as it has been implemented in Huggingface's API recently?
It is, although not supported by the public API; this seems to be valid for TGI deployments.
Hugging Face - Chat Completion POC.
@RogerBarreto It is supported by the public API, albeit poorly documented. Here is an example of how a HuggingFace model can be used with an existing OpenAI client library, and thus can be used in chat mode.
This in turn can be translated to a cURL request.
A requirement for this to work with a specific model is that it has the chat_template property in its tokenizer_config.json file (example). So not all models will work OOTB with the public Inference API, but most popular ones have this implemented.
Making a cURL request to this model on the public API works in OpenAI chat style:
curl https://api-inference.huggingface.co/models/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO/v1/chat/completions \
-X POST \
-d '{"model":"NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO", "messages": [{"role":"user","content":"How is the weather in Antwerp, Belgium?"}], "parameters": {"temperature": 0.7, "max_new_tokens": 100}}' \
-H "Content-Type: application/json" \
-H "Authorization: Bearer XXXXXXXXXXXXXXX"
This yields the following result:
{"id":"","object":"text_completion","created":1711714682,"model":"text-generation-inference/Nous-Hermes-2-Mixtral-8x7B-DPO-medusa","system_fingerprint":"1.4.3-sha-e6bb3ff","choices":[{"index":0,"message":{"role":"assistant","content":"As weather data changes constantly, the most accurate and up-to-date weather information for Antwerp, Belgium, can be found through weather websites or apps. These sources provide real-time and forecasted weather updates, including temperature, wind speed, humidity, and chance of precipitation. To get the current data, visit websites like Weather.com or AccuWeather.com and search for 'Antwerp, Belgium'. Or, you can use a weather app like AccuWe"},"logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":20,"completion_tokens":100,"total_tokens":120}}%
Streaming is also supported when stream:true is present in the request body.
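For example, the same request can be made from .NET with a plain HttpClient and "stream": true. This is only a rough sketch; it assumes the endpoint emits OpenAI-style "data: {...}" server-sent events, and the bearer token placeholder is the same as in the cURL example above.

using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

class StreamingChatDemo
{
    static async Task Main()
    {
        using var http = new HttpClient();
        http.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("Bearer", "XXXXXXXXXXXXXXX");

        var body = """
        {
          "model": "NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO",
          "messages": [{"role":"user","content":"How is the weather in Antwerp, Belgium?"}],
          "stream": true
        }
        """;

        using var request = new HttpRequestMessage(
            HttpMethod.Post,
            "https://api-inference.huggingface.co/models/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO/v1/chat/completions")
        {
            Content = new StringContent(body, Encoding.UTF8, "application/json")
        };

        using var response = await http.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
        response.EnsureSuccessStatusCode();

        using var reader = new StreamReader(await response.Content.ReadAsStreamAsync());
        string? line;
        while ((line = await reader.ReadLineAsync()) is not null)
        {
            // Assumed chunk format: data: {"choices":[{"delta":{"content":"..."}}], ...}
            if (line.StartsWith("data: ") && line != "data: [DONE]")
            {
                Console.WriteLine(line["data: ".Length..]);
            }
        }
    }
}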
Probably all models supported in HuggingChat should support this as well? Just a thought.
thank you @RogerBarreto
Is there an example of this somewhere in the tests or docs with any HuggingFace chat API?
@jkone27
There is an example here. In theory it's just a matter of replacing the localhost URL with a HuggingFace Inference API URL. For example, either https://api-inference.huggingface.co/models/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO/v1/chat/completions or https://api-inference.huggingface.co/models/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO should work.
However, these changes have not been released in a new version yet (last release was 2 weeks ago).
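Once a version with these changes ships, the setup should look roughly like the sketch below. Note this is only a sketch: the AddHuggingFaceChatCompletion builder extension and its parameter names are assumed from the POC and may differ in the released preview package.

using System;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Assumed extension from the HuggingFace connector (name/signature may differ).
var builder = Kernel.CreateBuilder();
builder.AddHuggingFaceChatCompletion(
    model: "NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO",
    endpoint: new Uri("https://api-inference.huggingface.co/models/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO"),
    apiKey: "XXXXXXXXXXXXXXX");

var kernel = builder.Build();
var chat = kernel.GetRequiredService<IChatCompletionService>();

var history = new ChatHistory();
history.AddUserMessage("How is the weather in Antwerp, Belgium?");

var reply = await chat.GetChatMessageContentsAsync(history);
Console.WriteLine(reply[0].Content);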
I am testing the HuggingFace preview package for .NET, but I cannot make use of IChatCompletionService.
It fails to resolve that service; I'm not using OpenAI, only the HuggingFace APIs.