microsoft / semantic-kernel

Integrate cutting-edge LLM technology quickly and easily into your apps
https://aka.ms/semantic-kernel
MIT License
21.95k stars 3.27k forks

.Net: Bug: Application crashes when using the ONNX connector and OnnxRuntimeGenAI v0.5.0 package #9628

Open f2bo opened 6 days ago

f2bo commented 6 days ago

Describe the bug

It seems that the latest release of OnnxRuntimeGenAI (v0.5.0) introduces additional checks to prevent resource leaks (see https://github.com/microsoft/onnxruntime-genai/pull/799). As a result, an application using the ONNX connector crashes on shutdown if it references this version of the OnnxRuntimeGenAI runtime.

To Reproduce

Add a reference to the Microsoft.ML.OnnxRuntimeGenAI.DirectML Version="0.5.0" package and run the following code:

// using OgaHandle ogaHandle = new OgaHandle();   <<=== UNCOMMENT FOR ONNX RUNTIME LIBRARY SHUTDOWN

IKernelBuilder builder = Kernel.CreateBuilder();
builder.Services
    .AddOnnxRuntimeGenAIChatCompletion(
        modelId: "phi3.0",
        modelPath: "...PATH TO MODEL...");

IKernelBuilder kernelBuilder = builder.Services.AddKernel();

Kernel kernel = builder.Build();

IChatCompletionService chat = kernel.Services.GetRequiredService<IChatCompletionService>();
ChatHistory history = new();
history.AddUserMessage("Hi");

await foreach (var content in chat.GetStreamingChatMessageContentsAsync(history))
{
    Console.Write(content);
}

Console.WriteLine("\nDone!");

// (chat as IDisposable)?.Dispose();            <<=== UNCOMMENT TO PREVENT RESOURCE LEAKS

Running the code above produces the following output and then crashes:

Hello! How can I assist you today??
Done!
OGA Error: Shutdown must be called before process exit, please check the documentation for the proper API to call to ensure clean shutdown.

I found a workaround based on the changes found in this PR. It seems that declaring (and disposing) an OgaHandle will ensure a proper shutdown. I couldn't find much information about it except for the comment "Responsible for cleaning up the library during shutdown" here.
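A minimal sketch of what the workaround looks like in practice (assuming the same setup as the repro above; the key point is that the handle is declared first, so its Dispose() runs last):

```csharp
// Sketch of the workaround: declare the handle before anything else so that
// its Dispose() runs after all other ONNX objects have been cleaned up.
using var ogaHandle = new OgaHandle();

// ... create the connector, run inference, dispose the connector ...

// When ogaHandle goes out of scope here, its Dispose() performs the library
// shutdown that OnnxRuntimeGenAI v0.5.0 now requires before process exit.
```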

However, simply declaring an OgaHandle is still insufficient to ensure a clean shutdown. The ONNX connector must also be disposed before the application exits. Otherwise, it will still crash with the following output:

OGA Error: 1 instances of struct Generators::Model were leaked.
OGA Error: 1 instances of struct Generators::Tokenizer were leaked.
    Please see the documentation for the API being used to ensure proper cleanup.

Platform

f2bo commented 3 days ago

Hi @RogerBarreto.

I don't know if I'm missing something but looking at the PR that closed this issue, I don't see anything that addresses the problem I reported above. Moreover, I pulled the latest changes from the repository and tested the code again and it still crashes when it shuts down.

RogerBarreto commented 3 days ago

@f2bo, you are right.

To avoid the problem shown in the example with v0.5.0, the caller now needs to create an OgaHandle and dispose the completion service.

Here is the final code showing how it should look to avoid the error:

using var ogaHandle = new OgaHandle();

IKernelBuilder builder = Kernel.CreateBuilder();
builder.Services
    .AddOnnxRuntimeGenAIChatCompletion(
        modelId: "phi3.0",
        modelPath: "... path ...");

IKernelBuilder kernelBuilder = builder.Services.AddKernel();

Kernel kernel = builder.Build();

var chat = (OnnxRuntimeGenAIChatCompletionService)kernel.Services.GetRequiredService<IChatCompletionService>();
ChatHistory history = [];
history.AddUserMessage("Hi");

await foreach (var content in chat.GetStreamingChatMessageContentsAsync(history))
{
    Console.Write(content);
}

Console.WriteLine("\nDone!");

chat.Dispose();

RogerBarreto commented 3 days ago

@f2bo, the OgaHandle was added as part of the service instance, but due to this new behavior in the library, as you noticed, the service still needs to be disposed before the application finishes.

For scenarios like this, rather than creating a kernel to get the service, I would create the service directly with a using statement, i.e.:

using var chat = new OnnxRuntimeGenAIChatCompletionService(
    modelId: "phi3.0",
    modelPath: "... path ...");

ChatHistory history = [];
history.AddUserMessage("Hi");

await foreach (var content in chat.GetStreamingChatMessageContentsAsync(history))
{
    Console.Write(content);
}

Console.WriteLine("\nDone!");
f2bo commented 3 days ago

@RogerBarreto,

"the OgaHandle was added as part of the service instance"

I'm not sure I follow. Currently, whether I instantiate a new service directly or retrieve it from the DI container, I still need to create an OgaHandle in my code. Isn't that the case?

My concern is that this requires special handling for the ONNX connector and is somewhat brittle. I was hoping that it could be somehow encapsulated behind the code to AddOnnxRuntimeGenAIChatCompletion and that callers wouldn't need to be aware of this requirement.

While I was troubleshooting, I experimented by adding the following to the OnnxRuntimeGenAIChatCompletionService class to ensure that an OgaHandle is created and then disposed when the application shuts down.

public sealed class OnnxRuntimeGenAIChatCompletionService : IChatCompletionService, IDisposable
{
    ...
    private static readonly OgaHandle s_ogaHandle = new();

    static OnnxRuntimeGenAIChatCompletionService()
        => AppDomain.CurrentDomain.ProcessExit += (sender, e) => s_ogaHandle.Dispose();
    ...
}

The other remaining issue is ensuring that the ONNX connector is disposed before the process ends. Where to do this depends on how it was created. When using CreateApplicationBuilder to build the DI container, you can write using IHost host = builder.Build(); so that disposing the host also disposes the container and the services it constructed.

However, when using Kernel kernel = builder.Build(), you need to execute (kernel.Services as IDisposable)?.Dispose() before shutting down, and I was surprised to see that Kernel itself is not disposable. Shouldn't it be?
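The two disposal paths described above can be sketched as follows (a minimal sketch, not the full repro; Host.CreateApplicationBuilder comes from Microsoft.Extensions.Hosting, and the model path is a placeholder):

```csharp
// Path 1: generic host. Disposing the IHost disposes the DI container,
// which in turn disposes any IDisposable services it constructed,
// including the ONNX connector.
HostApplicationBuilder hostBuilder = Host.CreateApplicationBuilder(args);
hostBuilder.Services.AddOnnxRuntimeGenAIChatCompletion(
    modelId: "phi3.0",
    modelPath: "...PATH TO MODEL...");
using IHost host = hostBuilder.Build();   // disposed at end of scope

// Path 2: Kernel. The Kernel itself is not IDisposable, so the underlying
// ServiceProvider has to be disposed explicitly before process exit.
Kernel kernel = Kernel.CreateBuilder().Build();
// ... use the kernel ...
(kernel.Services as IDisposable)?.Dispose();
```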

RogerBarret0 commented 3 days ago

After some investigation, I changed the internal implementation of the service so that no IDisposable instances are left lurking outside the context of running the inference. This resolved the problem with Oga (the messages after closing the application), and it also allowed the service to run without being IDisposable.
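A hypothetical sketch of what such a change might look like (the actual PR may differ; Model and Tokenizer are the OnnxRuntimeGenAI types the connector wraps, and the method name here is illustrative only):

```csharp
// Hypothetical sketch of the approach described above: instead of caching
// Model and Tokenizer on the service (which would force the service itself
// to be IDisposable), scope them to a single inference call so nothing
// outlives the request and the runtime has no leaked instances to report
// at process exit.
private IEnumerable<string> RunInference(string modelPath, string prompt)
{
    using var model = new Model(modelPath);        // disposed when the call ends
    using var tokenizer = new Tokenizer(model);    // disposed when the call ends

    // ... build generator parameters, run the generation loop ...
    yield return "...generated text...";           // placeholder for streamed tokens
}
```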

f2bo commented 3 days ago

I've left a comment in the PR with some concerns.