Open · f2bo opened this issue 1 week ago
I remember having a discussion with @RyanUnderhill about whether we should make it `std::abort`. Maybe we can move the discussion here instead of Teams?
In #799 the proper releasing of resources became a hard requirement. This especially impacts C# and Java scenarios, where users will need to manually release the resources (`using` in C#, and try-with-resources in Java). We should probably at least add some documentation about it.
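For illustration, a minimal sketch of the C# pattern (assuming the `OgaHandle`, `Model`, and `Tokenizer` types from the Microsoft.ML.OnnxRuntimeGenAI package; the model path is a placeholder):

```csharp
using Microsoft.ML.OnnxRuntimeGenAI;

// `using` declarations dispose each wrapper, releasing its native resources,
// in reverse order when the enclosing scope exits.
using OgaHandle ogaHandle = new OgaHandle();       // disposal runs library shutdown
using Model model = new Model("path/to/model");    // placeholder model path
using Tokenizer tokenizer = new Tokenizer(model);  // disposed before the model
```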
The same occurs in C++. I can't find any reference to this in the C++ examples.
@sjpritchard Hi! Could you please post the C++ code you're using? The Adapters API is quite new, and I don't think it should trigger the leaked-resource check.
@skyline75489 Here it is:
```cpp
// RagLanguageModel.h
#pragma once

#include "ort_genai.h"

#include <QDebug>
#include <QObject>
#include <QString>
#include <QStringList>
#include <memory>

class RagLanguageModel : public QObject {
  Q_OBJECT

 public:
  explicit RagLanguageModel(QObject* parent = nullptr);
  void Generate(QString query, QStringList context);

 signals:
  void Generated(QString text);

 private:
  std::unique_ptr<OgaModel> model_{nullptr};
  std::unique_ptr<OgaTokenizer> tokenizer_{nullptr};
};
```
#include "RagLanguageModel.h"
RagLanguageModel::RagLanguageModel(QObject* parent) : QObject{parent} {
model_ = OgaModel::Create("phi-3.5-mini-instruct");
tokenizer_ = OgaTokenizer::Create(*model_);
}
void RagLanguageModel::Generate(QString query, QStringList context) {
QString joined_context = context.join(" ");
auto prompt = QString("<|system|>\n"
"You are a helpful question answering bot. "
"<|end|>\n"
"<|user|>\n"
"Context: %1\n"
"Question: %2."
"<|end|>\n"
"<|assistant|>")
.arg(joined_context)
.arg(query);
auto tokenizer_stream = OgaTokenizerStream::Create(*tokenizer_);
auto sequences = OgaSequences::Create();
tokenizer_->Encode(prompt.toUtf8().constData(), *sequences);
auto params = OgaGeneratorParams::Create(*model_);
params->SetInputSequences(*sequences);
params->SetSearchOption("max_length", 1024);
try {
auto generator = OgaGenerator::Create(*model_, *params);
while (!generator->IsDone()) {
generator->ComputeLogits();
generator->GenerateNextToken();
size_t sequence_index = 0;
size_t sequence_length = generator->GetSequenceCount(sequence_index);
const int32_t* sequence_data = generator->GetSequenceData(sequence_index);
int32_t latest_token = sequence_data[sequence_length - 1];
const char* decoded_chunk = tokenizer_stream->Decode(latest_token);
auto text = std::string(decoded_chunk);
emit Generated(QString::fromStdString(text));
}
} catch (const std::exception& e) {
qDebug() << "Error:" << e.what();
}
}
I encountered the same issue: onnxruntime-genai.dll crashes when the program closes. It works fine in version 0.4 but has problems in version 0.5. I also ran an experiment and found that the application crashes on close whenever it calls the `OgaModelCreate` function. This is likely a bug in onnxruntime-genai.dll, and I hope it can be fixed as soon as possible. Thank you.
The latest release (v0.5.0) introduces additional checks designed to catch resource leaks, and these will crash an application on exit unless some explicit changes are made. This can easily be reproduced with the sample code included in the README of the Microsoft.ML.OnnxRuntimeGenAI package: executing that code as is results in a crash when the application exits, after a message reporting the leaked objects is displayed.
Proper disposal of the OgaHandle instance fixes the issue: replace the plain instantiation of `OgaHandle` with a C# `using` declaration so that the handle is disposed before the process exits.
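A likely reconstruction, based on the README sample (the exact snippets here are an assumption):

```csharp
// Before (reconstruction): the handle is never disposed, so the leak check
// fires and the process is torn down at exit.
OgaHandle ogaHandle = new OgaHandle();

// After (reconstruction): the using declaration disposes the handle at the
// end of the scope, satisfying the leak check.
using OgaHandle ogaHandle = new OgaHandle();
```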
I couldn't find documentation regarding `OgaHandle`, nor many examples that use it, considering how essential it seems to be. This impacts applications using the ONNX connector in Semantic Kernel, which is currently not aware of this new requirement (see https://github.com/microsoft/semantic-kernel/issues/9628). While making it easier to diagnose resource leaks is always welcome, crashing the application seems a bit heavy-handed. Maybe just keep the error messages but remove the forced shutdown?
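For reference, a sketch of the README-style sample with every wrapper under a `using` declaration, which should avoid the crash under the v0.5.0 checks (the model path and prompt are placeholders, and the calls reflect my reading of the v0.5 C# bindings):

```csharp
using System;
using Microsoft.ML.OnnxRuntimeGenAI;

using OgaHandle ogaHandle = new OgaHandle();

using Model model = new Model("path/to/model");  // placeholder path
using Tokenizer tokenizer = new Tokenizer(model);
using TokenizerStream tokenizerStream = tokenizer.CreateStream();

using Sequences sequences = tokenizer.Encode("<|user|>Hello<|end|>\n<|assistant|>");
using GeneratorParams generatorParams = new GeneratorParams(model);
generatorParams.SetSearchOption("max_length", 200);
generatorParams.SetInputSequences(sequences);

// Stream tokens as they are generated; every wrapper above is disposed in
// reverse order at the end of the scope, so nothing leaks at shutdown.
using Generator generator = new Generator(model, generatorParams);
while (!generator.IsDone())
{
    generator.ComputeLogits();
    generator.GenerateNextToken();
    ReadOnlySpan<int> sequence = generator.GetSequence(0);
    Console.Write(tokenizerStream.Decode(sequence[^1]));
}
```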