microsoft/onnxruntime-genai

Generative AI extensions for onnxruntime

Application crashes on exit with v0.5.0 #1056

Open f2bo opened 1 week ago

f2bo commented 1 week ago

The latest release (v0.5.0) introduces additional checks designed to detect resource leaks; unless an application makes some explicit changes, these checks now cause it to crash on exit. This can be easily reproduced with the sample code included in the README file of the Microsoft.ML.OnnxRuntimeGenAI package.

Executing this code as-is results in a crash when the application exits, after it displays the following message:

Error: Shutdown must be called before process exit, please check the documentation for the proper API to call to ensure clean shutdown.

Proper disposal of the OgaHandle instance fixes the issue.

Replace

OgaHandle ogaHandle = new OgaHandle();

with this:

using OgaHandle ogaHandle = new OgaHandle();

I couldn't find any documentation for OgaHandle, nor many examples that use it, which is surprising given how essential it now seems to be. This impacts applications using the ONNX Connector in Semantic Kernel, which is currently unaware of this new requirement (see https://github.com/microsoft/semantic-kernel/issues/9628).

While making it easier to diagnose resource leaks is always welcome, crashing the application seems a bit heavy-handed. Maybe just keep the error messages but remove the forced shutdown?

skyline75489 commented 1 week ago

I remember having a discussion with @RyanUnderhill about whether we should make it std::abort. Maybe we can move the discussion here instead of Teams?

skyline75489 commented 1 week ago

In #799 the proper releasing of resources became a hard requirement. This especially impacts C# and Java scenarios, where users need to release resources manually (a using statement in C#, try-with-resources in Java). We should probably at least add some documentation about it.

sjpritchard commented 5 days ago

The same occurs in C++. I can't find any reference to this in the C++ examples.

(screenshot showing the crash on exit)

skyline75489 commented 3 days ago

@sjpritchard Hi! Could you please post the C++ code you're using? The Adapters API is quite new and I don't think it should trigger the leaked-resource check.

sjpritchard commented 3 days ago

@skyline75489 Here it is:

// RagLanguageModel.h
#pragma once

#include "ort_genai.h"

#include <QDebug>
#include <QObject>
#include <QString>
#include <QStringList>

class RagLanguageModel : public QObject {
    Q_OBJECT
public:
    explicit RagLanguageModel(QObject* parent = nullptr);
    void Generate(QString query, QStringList context);

public slots:

signals:
    void Generated(QString text);

private:
    std::unique_ptr<OgaModel>           model_{nullptr};
    std::unique_ptr<OgaTokenizer>       tokenizer_{nullptr};
};

// RagLanguageModel.cpp
#include "RagLanguageModel.h"

RagLanguageModel::RagLanguageModel(QObject* parent) : QObject{parent} {
    model_ = OgaModel::Create("phi-3.5-mini-instruct"); 
    tokenizer_        = OgaTokenizer::Create(*model_);
}

void RagLanguageModel::Generate(QString query, QStringList context) {
    QString joined_context = context.join(" ");

    auto prompt = QString("<|system|>\n"
                          "You are a helpful question answering bot. "
                          "<|end|>\n"
                          "<|user|>\n"
                          "Context: %1\n"
                          "Question: %2."
                          "<|end|>\n"
                          "<|assistant|>")
                      .arg(joined_context)
                      .arg(query);

    auto tokenizer_stream = OgaTokenizerStream::Create(*tokenizer_);
    auto sequences        = OgaSequences::Create();
    tokenizer_->Encode(prompt.toUtf8().constData(), *sequences);

    auto params = OgaGeneratorParams::Create(*model_);
    params->SetInputSequences(*sequences);
    params->SetSearchOption("max_length", 1024);

    try {
        auto        generator = OgaGenerator::Create(*model_, *params);
        while (!generator->IsDone()) {
            generator->ComputeLogits();
            generator->GenerateNextToken();
            size_t         sequence_index  = 0;
            size_t         sequence_length = generator->GetSequenceCount(sequence_index);
            const int32_t* sequence_data   = generator->GetSequenceData(sequence_index);
            int32_t        latest_token    = sequence_data[sequence_length - 1];
            const char*    decoded_chunk   = tokenizer_stream->Decode(latest_token);
            auto           text            = std::string(decoded_chunk);
            emit Generated(QString::fromStdString(text));
        }
    } catch (const std::exception& e) {
        qDebug() << "Error:" << e.what();
    }
}
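
By analogy with the C# fix above, the likely missing piece is a shutdown call made after every Oga* object has been destroyed. A minimal sketch, assuming OgaShutdown() from ort_genai_c.h (which ort_genai.h includes) is the C++ counterpart of disposing OgaHandle:

#include "ort_genai.h"  // also pulls in ort_genai_c.h, which declares OgaShutdown()

// RAII guard: declared first in main() so its destructor runs last,
// after every other Oga* object has been released.
struct OgaShutdownGuard {
    ~OgaShutdownGuard() { OgaShutdown(); }
};

int main(int argc, char* argv[]) {
    OgaShutdownGuard shutdown_guard;

    {
        // RagLanguageModel (and the unique_ptrs it owns) must go out of
        // scope before the guard's destructor calls OgaShutdown().
        RagLanguageModel language_model;
        // ... run the application, call Generate(), etc. ...
    }

    return 0;  // shutdown_guard is destroyed here, calling OgaShutdown()
}
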
luomaojiang2016 commented 3 days ago

I encountered the same issue: onnxruntime-genai.dll crashes when the program closes. It works fine in version 0.4 but has problems in version 0.5. I also ran an experiment and found that the application crashes on closing whenever it calls the OgaModelCreate function. This is likely a bug in onnxruntime-genai.dll, and I hope it can be fixed as soon as possible. Thank you.
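
For anyone reducing this to the raw C API, a sketch of the pattern that should satisfy the new check, assuming the entry points OgaCreateModel, OgaDestroyModel, OgaDestroyResult, and OgaShutdown declared in ort_genai_c.h:

#include "ort_genai_c.h"

int main() {
    OgaModel* model = nullptr;
    OgaResult* result = OgaCreateModel("path/to/model", &model);  // placeholder path
    if (result != nullptr) {
        OgaDestroyResult(result);  // creation failed; free the error object
        return 1;
    }

    // ... use the model ...

    OgaDestroyModel(model);  // release every Oga* object first...
    OgaShutdown();           // ...then shut down; skipping this is what makes v0.5.0 abort at exit
    return 0;
}
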