Closed: d-kleine closed this issue 2 days ago
Oh wow, thanks so much for figuring this out. I tried lots of things but somehow didn't think of this. It's kind of weird that Ollama doesn't error if the options are passed differently (but then silently ignores them). In any case, I can confirm that the responses are now deterministic. But it still seems they are not deterministic across operating systems (but that's ok).
> Oh wow, thanks so much for figuring this out. I tried lots of things but somehow didn't think of this.
Tbh, I am really happy that the model is deterministic now, so the evaluation scores also differ less between runs than before 🙂
> It's kind of weird that Ollama doesn't error if the options are passed differently (but then silently ignores them).
Yeah, I was thinking the same...
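For anyone who lands here later, a short illustration of the difference (payloads shortened, the model name is just an example):

```python
# Passed like this, "seed" and "temperature" sit at the top level of the
# request body and are silently ignored by Ollama (no error is raised):
payload_ignored = {
    "model": "llama3",
    "messages": [{"role": "user", "content": "What do llamas eat?"}],
    "seed": 123,
    "temperature": 0,
}

# Passed like this, nested inside "options", they are actually applied:
payload_applied = {
    "model": "llama3",
    "messages": [{"role": "user", "content": "What do llamas eat?"}],
    "options": {"seed": 123, "temperature": 0, "num_ctx": 2048},
}
```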
> In any case, I can confirm that the responses are now deterministic. But it still seems they are not deterministic across operating systems (but that's ok).
Yeah, I can confirm that. I have tested it with Windows 10 and with my Ubuntu image on Docker: the generated output on the same OS is deterministic and reproducible, but across different OSes it is inconsistent. This also seems to hold when restarting the kernel. My assumption is that this is not an issue with the model itself, but rather one in Ollama (probably even in llama.cpp in the backend).
I have opened a GH issue on this: https://github.com/ollama/ollama/issues/5321
Thanks for updating the code!
Bug description
@rasbt I think I have found the reason why the Ollama API does not generate deterministic output, and a small change in the code should solve it.

I have taken a look at the Ollama API docs, and it seems you need to pass those params into a separate `options` key in the JSON input. It's also important to set `"num_ctx"` (the number of tokens for the context window), because this makes sure the output is 100% reproducible; otherwise it is still slightly random. I have added `"num_ctx": 2048` for a fixed context window size according to the model params docs. With that change, the output of the code should be fully reproducible. My output is deterministic, so it should be reproducible for you too.
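A minimal sketch of what I mean (the endpoint URL is Ollama's default chat endpoint; the model name "llama3" and the seed value 123 are just the ones from my local setup, and the notebook's actual helper may look slightly different):

```python
import json
import urllib.request

def query_model(prompt, model="llama3", url="http://localhost:11434/api/chat"):
    # Generation settings must go into a nested "options" dict;
    # if they are passed at the top level, Ollama silently ignores them.
    data = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "options": {
            "seed": 123,        # fixed seed for reproducible sampling
            "temperature": 0,   # greedy decoding
            "num_ctx": 2048     # fixed context window size
        },
        "stream": False         # return a single JSON response object
    }
    request = urllib.request.Request(
        url,
        data=json.dumps(data).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST"
    )
    with urllib.request.urlopen(request) as response:
        result = json.loads(response.read().decode("utf-8"))
    return result["message"]["content"]


print(query_model("What do llamas eat?"))
```

With temperature 0 and a fixed seed the sampling is pinned down, and fixing `num_ctx` removes the remaining run-to-run variation that otherwise comes from the context window size.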
And later in the notebook, the evaluation scores should then of course be reproducible too.
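Roughly speaking, the scoring is just repeated calls to the same now-deterministic query, so something like the following sketch (field names and prompt wording are illustrative placeholders, not the notebook's exact code) returns the same average on every run:

```python
# Illustrative sketch only: "instruction", "output", and "response" are
# placeholder field names, and query_model is the request helper from above.
def score_entries(entries):
    scores = []
    for entry in entries:
        prompt = (
            f"Given the instruction `{entry['instruction']}` "
            f"and the correct output `{entry['output']}`, "
            f"score the model response `{entry['response']}` "
            f"on a scale from 0 to 100. Respond with the integer only."
        )
        reply = query_model(prompt)
        try:
            scores.append(int(reply.strip()))
        except ValueError:
            continue  # skip replies that are not a plain number
    return sum(scores) / len(scores) if scores else 0.0
```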
What operating system are you using?
Windows
Where do you run your code?
Local (laptop, desktop)
Environment