janhq / jan

Jan is an open-source alternative to ChatGPT that runs 100% offline on your computer, with multiple engine support (llama.cpp, TensorRT-LLM).
https://jan.ai/
GNU Affero General Public License v3.0

feat: Configurable EOS token when using the server #2758

Closed · Propheticus closed this 3 months ago

Propheticus commented 7 months ago

To properly run Llama 3 models, you need to set the stop token `<|eot_id|>`. This is currently not configurable when running Jan in API server mode. The model is automatically loaded by llama.cpp with:

llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128001 '<|end_of_text|>'

This causes the model not to stop generating when it should: it emits `<|eot_id|>assistant\n\n` in its output and continues generating several responses/turns.
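For reference, Llama 3's chat template ends each turn with `<|eot_id|>`, while `<|end_of_text|>` only terminates a whole document. A chat-formatted prompt looks roughly like this, so a server that stops only on token 128001 (`<|end_of_text|>`) never halts at the end of an assistant turn:

```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

What is the capital of France?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Paris.<|eot_id|>
```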

Of course, a fix for llama.cpp is already in the works and will surely land in Nitro/Cortex. Still, having this configurable would be nice.

Propheticus commented 7 months ago

Sorry, this started out as a bug report (Llama 3 not working via the server), but I chose a more positive approach... That did not change the label though 😞

Propheticus commented 7 months ago

Right, so I found that you can actually specify stop tokens in the API call, below the messages array: `"stop": ["<|eot_id|>"]`. Too bad not all apps that make RESTful (OpenAI-style) API calls allow this to be set.
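For example, a minimal sketch of this workaround against Jan's OpenAI-compatible endpoint (the localhost URL and the model id are assumptions here; adjust them to your local setup):

```python
# Client-side workaround: pass "stop" alongside the messages so generation
# halts at Llama 3's end-of-turn token instead of running on.
import requests

resp = requests.post(
    "http://localhost:1337/v1/chat/completions",  # assumed default Jan server address
    json={
        "model": "llama3-8b-instruct",            # placeholder model id
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "stop": ["<|eot_id|>"],                   # stop at end-of-turn, not just <|end_of_text|>
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```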

louis-jan commented 7 months ago

Yes, it is quite confusing right now. We will work on the API server so that it communicates directly with model.json; that way, stop tokens can be set by default per model, and the chat/completion request settings can override them.
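Something like the following model.json fragment is the idea, a sketch only (the field names are an assumption, not a confirmed schema, and the `//` annotations are not valid JSON):

```json
{
  "id": "llama3-8b-instruct",       // placeholder model id
  "parameters": {
    "temperature": 0.7,
    "stop": ["<|eot_id|>"]          // per-model default, overridable per request
  }
}
```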

louis-jan commented 4 months ago

Will be addressed when the Jan x Cortex integration is done @Van-QA

Van-QA commented 3 months ago

We are working on a new version of Jan x Cortex, so it's recommended to try it via the nightly build: https://github.com/janhq/jan?tab=readme-ov-file#download. Feel free to get back to us if the issue remains.