Closed ElMouse closed 6 months ago
OK, it has to be something in pyllmodel.py in the callback code, but that looks like rocket science to me for now.
OK, i found a solution
changing

output = model.generate("prompt", max_tokens=300)
print(output)

to

with model.chat_session():
    model.generate("prompt", max_tokens=300)
    output = model.current_chat_session[2]['content']
print(output)
resolves the issue, and I get output every time. I'm not closing the issue, since I assume output = model.generate("prompt", max_tokens=300) on its own should also work normally?
@ElMouse You're a real one for troubleshooting this. 19 hours before I needed it as well. Thanks so much!!!
This is not unexpected - note that the parameter is "max_tokens", not "tokens". If an LLM decides that the output should end, it sends the EOS token before max_tokens is hit. A chat session is usually what you want - and you should be able to just take the output from the generate call when it is wrapped, without touching model.current_chat_session.
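A minimal sketch of the wrapped pattern the comment above describes: inside a chat session, the reply can be taken straight from the return value of generate(), with no need to index into model.current_chat_session. The helper name ask is my own, not part of the gpt4all API; the chat_session()/generate() calls are from the Python bindings.

```python
def ask(model, prompt, max_tokens=300):
    """Run one prompt inside a chat session and return the reply.

    Works with a gpt4all.GPT4All instance: the assistant's reply is
    simply the return value of generate() when it is called inside
    chat_session(), so current_chat_session is not needed.
    """
    with model.chat_session():
        return model.generate(prompt, max_tokens=max_tokens)

# Real usage would look like (the model file name is a placeholder):
#   from gpt4all import GPT4All
#   model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")
#   print(ask(model, "Name three colors."))
```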
System Info
Windows 11, Python 3.10, GPT4All Python Generation API
Information
Reproduction
Using the GPT4All Python Generation API, I am facing strange behavior for which I can't find an explanation, and it is really frustrating. It seems like I am missing something obvious and I feel like an idiot.
Simply using output = model.generate("prompt", max_tokens=300) sometimes produces an empty result, as in literally output == ''.
I looked into the gpt4all code, and it seems that the token callback is not called, or doesn't fire, every time. The tokens sometimes are just not produced. And by "not produced" I don't mean that the generation runs indefinitely: output = model.generate("prompt", max_tokens=300) runs for about the same amount of time as when it IS producing an answer, and still yields no tokens and an empty result.
What's even more weird is that if I call output = model.generate("prompt", max_tokens=300) in a loop, it can produce a result on the first try, or sometimes on the 3rd or 4th try, or sometimes not even by the 100th, with the same prompt.
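As a stopgap, the retry loop described above can be wrapped in a small helper that keeps calling until a non-empty result comes back. generate_with_retry is a hypothetical name, not part of the gpt4all API; it takes any callable with the same shape as model.generate.

```python
def generate_with_retry(generate, prompt, attempts=5, **kwargs):
    """Call generate(prompt, **kwargs) until it returns a non-empty
    string, up to `attempts` tries. Returns '' if every try comes back
    empty. `generate` would typically be model.generate."""
    for _ in range(attempts):
        output = generate(prompt, **kwargs)
        if output.strip():
            return output
    return ""

# Hypothetical usage with a gpt4all model:
#   output = generate_with_retry(model.generate, "prompt", max_tokens=300)
```

This only papers over the symptom, of course; the chat_session workaround above addresses it more directly.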
It seems I am missing something. I tried different parameters, different prompts, different models.
For the life of me I can't figure out the logic of when it works and when it doesn't.
At the same time, the desktop version produces answers every time, without problems, with the same models and the same prompts.
And output = model.generate("prompt", max_tokens=300) doesn't produce any logs, even in verbose mode. I am really puzzled...
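Since generate() prints nothing on its own, one way to see whether any tokens are produced at all is to stream them: the gpt4all Python bindings accept streaming=True on generate(), which returns an iterator of token strings. drain_tokens below is a hypothetical debugging helper, not part of gpt4all.

```python
def drain_tokens(token_iter):
    """Consume a streaming token iterator, echoing each token so it is
    visible whether the model emits anything at all, and return the
    joined text."""
    tokens = []
    for tok in token_iter:
        print(repr(tok), end=" ", flush=True)
        tokens.append(tok)
    print()
    return "".join(tokens)

# Hypothetical usage with a gpt4all model:
#   answer = drain_tokens(
#       model.generate("prompt", max_tokens=300, streaming=True))
```

If the stream yields nothing at all on the failing runs, that would point at the token callback rather than at post-processing of the result.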
Any pointers are appreciated. I am desperate here. Thanks in advance.
Expected behavior
At least some answer is produced every time output = model.generate("prompt", max_tokens=300) is executed.