offgridtech opened this issue 2 months ago
@nguyenhoangthuan99 Can you look into this:
tokenizer.cpp
?

@offgridtech I am transferring this issue to the cortex.cpp repo. We should be working on it; ETA 2 weeks.
I think Mistral Nemo is the first model for which we can run this pipeline automatically to add new model support from Hugging Face:
mistral-nemo
under cortexso
. cc @0xSage @dan-homebrew for help creating it; my account doesn't have permission to do so.

Mistral-nemo is now supported at cortexso. 10 quantization levels are available. All models are created and uploaded automatically through CI.
You can try
mistral-nemo
with cortex-nightly.
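For reference, pulling and running the model with the nightly build looks roughly like this. This is a sketch: the exact subcommand names (`pull`, `run`) and the `mistral-nemo` model ID are assumed from the cortex CLI conventions mentioned above; verify against `cortex --help` on your install.

```shell
# Pull the mistral-nemo model (served from the cortexso Hugging Face org).
# Subcommand names assumed; check `cortex --help` if they differ in your build.
cortex pull mistral-nemo

# Start the model and open an interactive chat session.
cortex run mistral-nemo
```

If the pre-tokenizer fix has landed in the bundled llama.cpp engine, the model should load instead of failing at startup.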
Current behavior
I saw a number of reports on Hugging Face and the llama.cpp GitHub about pre-tokenizers causing issues when the quantized Mistral Nemo model was first released, but it seemed everything was cleared up over the last few days by a llama.cpp update. What worked for other people didn't work for Jan: I've tried several quant versions, and the model fails to start. KoboldCpp and LM Studio say they made updates and it's fixed now, so I'm guessing you need to do the same. Thanks.
More information here:
https://github.com/ggerganov/llama.cpp/pull/8579
https://github.com/ggerganov/llama.cpp/pull/8604
Minimum reproduction step
Load the model; it doesn't start. Other models, like Llama 3.1, start fine.
Expected behavior
The model starts
Screenshots / Logs
This log looks like the pre-tokenizer issue described in the llama.cpp PRs above.
Jan version
v0.5.2
In which operating systems have you tested?
Environment details
AppImage on Linux