Closed: Propheticus closed this issue 3 months ago.
Sorry, this started out as a bug report (Llama 3 not working via the server), but I chose a more positive approach... That did not change the label though 😞
Right, so I found that you can actually specify the stop token in the API call below the messages array.
"stop": ["<|eot_id|>"]
Too bad not all apps that use RESTful (OpenAI) API calls allow this to be set.
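For reference, a complete chat/completions request body with the stop parameter set might look like this (the model name is only an example, substitute whatever id your setup uses):

```json
{
  "model": "llama3-8b-instruct",
  "messages": [
    { "role": "user", "content": "Hello, who are you?" }
  ],
  "stop": ["<|eot_id|>"]
}
```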
Yes, it is quite confusing right now. We will work on the API server to have it communicate directly with model.json, so the stop token can be set there by default, and the chat/completion request settings can override it.
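A rough sketch of what that default could look like in model.json (the field names here are assumptions for illustration, not a confirmed schema):

```json
{
  "id": "llama3-8b-instruct",
  "parameters": {
    "stop": ["<|eot_id|>"],
    "temperature": 0.7,
    "max_tokens": 2048
  }
}
```

With something like that in place, the API server could fall back to these values whenever a request omits them, and an explicit `"stop"` in the request would take precedence.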
Will be addressed when the Jan x Cortex integration is done @Van-QA
We are working on a new version of Jan x Cortex, so it's recommended to try it via the nightly build: https://github.com/janhq/jan?tab=readme-ov-file#download. Feel free to get back to us if the issue remains.
To properly run Llama 3 models, you need to set the stop token <|eot_id|>. This is currently not configurable when running Jan in API server mode; the model is automatically loaded by llama.cpp with its default settings. This causes the model to not stop generating when it should: it places <|eot_id|>assistant\n\n in its output and continues generating several responses/turns.

Of course a fix for llama.cpp is already being made, and surely coming to Nitro (now Cortex). Still, having this configurable would be nice.
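For context, <|eot_id|> is the end-of-turn marker in the Llama 3 instruct prompt format, which looks roughly like this:

```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

{user_message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{assistant_reply}<|eot_id|>
```

Without <|eot_id|> registered as a stop token, the server never cuts generation at the end of the assistant turn, so the model just keeps writing the following turns itself.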