nomic-ai / gpt4all

GPT4All: Chat with Local LLMs on Any Device
https://gpt4all.io
MIT License

C# binding - Error when executing a 2nd prediction #996

Closed mphacker closed 3 weeks ago

mphacker commented 1 year ago

System Info

Using the C# binding based on the GPT4All code cloned and compiled on 6/15/2023.

Reproduction

The first prediction completes without issues. The second prediction fails with the error:

llama_eval_internal: first token must be BOS
llama_eval: failed to eval
LLaMA ERROR: Failed to process prompt

using Gpt4All;

var modelFactory = new Gpt4AllModelFactory();

var modelPath = "C:\\Users\\Owner\\source\\repos\\GPT4All\\Models\\ggml-v3-13b-hermes-q5_1.bin";
var systemPrompt = "You are an assistant named MyBot designed to help a person named Bob. Only respond in a professional but witty manner. Responses must be PG rated. Your responses should be short messages that are posted in chat.";

var prompt1 = "write a witty message welcoming everyone.";
var prompt2 = "write a message to start a conversation around the game Fortnite.";

using var model = modelFactory.LoadModel(modelPath);

PredictRequestOptions opts = new() { Temperature = 0.9f };

var result = await model.GetStreamingPredictionAsync(
    systemPrompt + " " + prompt1,
    opts);

await foreach (var token in result.GetPredictionStreamingAsync())
{
    Console.Write(token);
}

Console.WriteLine(" ");
result = await model.GetStreamingPredictionAsync(
    systemPrompt + " " + prompt2,
    opts);

await foreach (var token in result.GetPredictionStreamingAsync())
{
    Console.Write(token);
}

Expected behavior

A response is successfully generated for both prompts.

mvenditto commented 1 year ago

I'm going to take a look at this. I think the problem is that with the Hermes model the underlying prompt context should be reused; currently, a new context is created for every text prediction request.
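To make the failure mode concrete, here is a rough sketch of the state that would need to survive between requests. `PromptState` and its member are illustrative names only (the binding's real bookkeeping lives in its native layer); the comments describe the behavior reported in this issue, not confirmed internals.

```csharp
// Hypothetical sketch; 'PromptState' is not a real type in the binding.
public sealed class PromptState
{
    // Mirrors llama.cpp's n_past: how many conversation tokens
    // have already been evaluated for this context.
    public int PastTokens { get; set; }
}

// Today (assumption): each prediction request gets a fresh PromptState,
// so PastTokens is 0 again on the second call, the prompt is treated as
// the start of a brand-new conversation, and llama_eval_internal rejects
// it because the first token is not BOS.
//
// Needed: one PromptState per conversation, updated after each prediction,
// so the second prompt is evaluated as a continuation rather than a restart.
```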

cosmic-snow commented 1 year ago

But I think the problem is that with the Hermes model the underlying prompt context should be reused.

Likely. The Python bindings had to be fixed in the same way: #969

codengine commented 1 year ago

Just FYI, I had the same issue when using GPT4All-13B-snoozy.ggmlv3.q4_0.bin. I also cloned the git repository, built it using mingw, compiled the C# bindings and did some tests.

llama_eval_internal: first token must be BOS
llama_eval: failed to eval
LLaMA ERROR: Failed to process prompt

mvenditto commented 1 year ago

result = await model.GetStreamingPredictionAsync(
    systemPrompt + " " + prompt2,
    opts);

Thanks for sharing. I'm looking to fix this (and to revise the bindings and add a proper chat API next).

Until a fix for this is available, a workaround (tested on Hermes) is to keep the past-conversation token count from being reset to 0 after the first run, e.g.:

result = await model.GetStreamingPredictionAsync(
    systemPrompt + " " + prompt2,
    opts with
    {
        PastConversationTokensNum = 1 // the default is zero
    });

Note that this is a workaround and may not give the best results.

Update: here's the branch where I'm working out the fix, if you'd like to try it before it goes upstream: csharp/issue/996-second-prompt-error

sdcondon commented 7 months ago

Hi there!

Heh, typical that I'd only find this after figuring it out for myself (though my issue was just that the convo was essentially memory-less rather than an actual exception).

However, re that branch: are you absolutely sure you want to expose that low-level type as part of this high-level API? The design of the lib so far seems to be that all of the stuff in Bindings is encapsulated and hidden away by the higher-level stuff, and you'd be breaking that with this change. IMO PredictRequestOptions does serve a purpose - it's just that it should probably be scoped to a Gpt4All (or something higher-level, anyway - I don't know if you've considered adding something along the lines of ITextPrediction IGpt4AllModel.CreatePredictionContext(opts)) rather than an individual prediction request?
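For what it's worth, the kind of scoping described above might look roughly like this. Everything here beyond the existing `IGpt4AllModel`, `PredictRequestOptions`, and `ITextPredictionStreamingResult` names is hypothetical, sketched only to illustrate tying prediction state to a conversation rather than to a single request:

```csharp
// Hypothetical API sketch; these interfaces do not exist in the binding.
public interface ITextPredictionContext
{
    // Successive calls reuse the same underlying prompt context,
    // so the conversation's past tokens are carried forward and the
    // second prompt is evaluated as a continuation.
    Task<ITextPredictionStreamingResult> GetStreamingPredictionAsync(
        string prompt,
        CancellationToken cancellationToken = default);
}

public interface IGpt4AllModelWithContexts // hypothetical extension of IGpt4AllModel
{
    // Options are bound once, at context creation, instead of per request.
    ITextPredictionContext CreatePredictionContext(PredictRequestOptions opts);
}
```

A caller would then create one context per conversation and issue both prompts through it, so the reset-to-zero problem from this issue can't occur by construction.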

I've been playing with this for the last day or so, and have a fork that does this and makes a fair number of other fixes and tweaks besides - see here: https://github.com/sdcondon/gpt4all/tree/csharp-fixes (NB the single commit with a very detailed commit message - though also note that the base for this is the head of the csharp-gguf branch, not main). It works when built into a package, with no consumer action required to find the implementations. I've also changed the sample app to allow for an extended conversation. Happy to get involved if you're interested in any of that.

Merry Christmas, and keep up the good work!

sdcondon commented 7 months ago

My apologies - somehow missed the open PR, which is a better place for the comment above.

cebtenzzre commented 3 weeks ago

The C# binding has been removed as it has no maintainer. See #2429.