Closed: mphacker closed this issue 3 weeks ago.
I'm going to take a look at this. I think the problem is that with the Hermes model the underlying prompt context should be reused, but currently a new context is created for every text prediction request.
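To make the distinction concrete, here is a rough sketch of the difference. The `LlamaContext` type and its members are hypothetical stand-ins, not the binding's actual internals:

```csharp
using System;

// Sketch only: LlamaContext is a hypothetical stand-in for the native
// context handling inside the binding.
public sealed class PredictionSketch
{
    private LlamaContext? _context;

    // Current behaviour (simplified): a fresh context per request, so the
    // second prompt is treated as a brand-new conversation and llama.cpp
    // rejects it ("first token must be BOS").
    public string PredictPerRequest(string prompt)
    {
        using var context = new LlamaContext();
        return context.Eval(prompt);
    }

    // Intended behaviour: create the context once and reuse it, so the
    // past conversation tokens carry over between predictions.
    public string PredictReusingContext(string prompt)
    {
        _context ??= new LlamaContext();
        return _context.Eval(prompt);
    }
}

// Hypothetical context type for the sketch above.
public sealed class LlamaContext : IDisposable
{
    public string Eval(string prompt) => ""; // would wrap the native llama_eval call
    public void Dispose() { }
}
```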
> But I think the problem is that with the Hermes model the underlying prompt context should be reused.
Likely. The Python bindings had to be fixed in the same way: #969
Just FYI, I had the same issue when using GPT4All-13B-snoozy.ggmlv3.q4_0.bin. I also cloned the git repository, built it using MinGW, compiled the C# bindings, and ran some tests.
```
llama_eval_internal: first token must be BOS
llama_eval: failed to eval
LLaMA ERROR: Failed to process prompt
```
```csharp
result = await model.GetStreamingPredictionAsync(systemPrompt + " " + prompt2, opts);
```
Thanks for sharing. I'm looking to fix this (and to revise and add a proper chat API next).
Until a fix for this is available, a workaround (tested on Hermes) is to not let the past conversation token count be reset to 0 after the first run, e.g.:
```csharp
result = await model.GetStreamingPredictionAsync(
    systemPrompt + " " + prompt2,
    opts with
    {
        PastConversationTokensNum = 1 // the default is zero
    });
```
Note that this is a workaround and may not give the best results.
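For context, a full two-prompt exchange with the workaround looks roughly like this. The factory/loading calls and the `Defaults` accessor are my assumptions about the binding's public surface; the prompts and model path are placeholders:

```csharp
using Gpt4All; // assumed namespace

const string systemPrompt = "You are a helpful assistant.";
const string prompt1 = "Hello! Who are you?";
const string prompt2 = "What did I just ask you?";

var factory = new Gpt4AllModelFactory();          // assumed factory name
using var model = factory.LoadModel("model.bin"); // placeholder path

var opts = PredictRequestOptions.Defaults;        // assumed defaults accessor

// First prompt: works with the default options.
var first = await model.GetStreamingPredictionAsync(systemPrompt + " " + prompt1, opts);
await foreach (var token in first.GetPredictionStreamingAsync())
    Console.Write(token);

// Second prompt: keep the past-conversation token count above zero so the
// request is not treated as the start of a brand-new conversation.
var second = await model.GetStreamingPredictionAsync(
    systemPrompt + " " + prompt2,
    opts with { PastConversationTokensNum = 1 });
await foreach (var token in second.GetPredictionStreamingAsync())
    Console.Write(token);
```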
Update: Here's the branch where I'm working out the fix, if you'd like to try it before it goes upstream: `csharp/issue/996-second-prompt-error`
Hi there!
Heh, typical that I'd only find this after figuring it out for myself (though my issue was just that the convo was essentially memory-less rather than an actual exception).
However, re that branch: are you absolutely sure you want to expose that low-level type as part of this high-level API? The design of this lib so far seems to be that all of the stuff in Bindings is encapsulated and hidden away by the higher-level stuff, and you'd be breaking that with this change. IMO `PredictRequestOptions` does serve a purpose; it's just that it should probably be scoped to a `Gpt4All` (or something higher-level, anyway) rather than to an individual prediction request. I don't know if you've considered adding something along the lines of `ITextPrediction IGpt4AllModel.CreatePredictionContext(opts)`?
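To sketch what I mean (all names and members here are illustrative, not a worked-out design):

```csharp
using System.Threading.Tasks;

// Illustrative only: the point is that the options travel with a
// longer-lived, conversation-scoped object rather than with each request.
public interface IGpt4AllModel
{
    // ...existing per-request members stay as they are...

    // New: create a prediction context that owns the options and carries
    // conversation state across calls.
    ITextPrediction CreatePredictionContext(PredictRequestOptions opts);
}

public interface ITextPrediction
{
    // Each call continues the same underlying conversation.
    Task<string> PredictAsync(string prompt);
}
```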
I've been playing with this for the last day or so, and have a fork that does this and makes a fair number of other fixes and tweaks besides; see https://github.com/sdcondon/gpt4all/tree/csharp-fixes (NB the single commit with a very detailed commit message, though also note that the base for this is the head of the csharp-gguf branch, not main). It works when built into a package, with no consumer action required to find the implementations. I've also changed the sample app to allow for an extended convo. Happy to get involved if you're interested in any of that.
Merry Christmas, and keep up the good work!
My apologies - somehow missed the open PR, which is a better place for the comment above.
The C# binding has been removed as it has no maintainer. See #2429.
### System Info
Using the C# binding, based on the GPT4All code cloned and compiled on 6/15/2023.
### Reproduction
The first prediction completes without issues. The second prediction fails with this error:
```
llama_eval_internal: first token must be BOS
llama_eval: failed to eval
LLaMA ERROR: Failed to process prompt
```
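A minimal program that triggers this looks roughly like the following (the loading calls are assumed from the binding's public surface; the model path is a placeholder):

```csharp
using Gpt4All; // assumed namespace

var factory = new Gpt4AllModelFactory();                                 // assumed factory name
using var model = factory.LoadModel("nous-hermes-13b.ggmlv3.q4_0.bin");  // placeholder path

var opts = PredictRequestOptions.Defaults; // assumed defaults accessor

// First prediction: completes normally.
var first = await model.GetStreamingPredictionAsync("Hello!", opts);
await foreach (var token in first.GetPredictionStreamingAsync())
    Console.Write(token);

// Second prediction: fails with
// "llama_eval_internal: first token must be BOS".
var second = await model.GetStreamingPredictionAsync("Tell me a joke.", opts);
await foreach (var token in second.GetPredictionStreamingAsync())
    Console.Write(token);
```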
### Expected behavior
A response is successfully generated for both prompts.