Open LrsChrSch opened 3 months ago
Small update here:
Using model revision '2024-05-08' to train a model with this same workflow works perfectly fine. It seems like ollama cannot handle the newer revisions of this model.
The change we made to support higher resolution images hasn't been ported to llama.cpp/ollama yet - https://github.com/vikhyat/moondream/commit/ffbf8228aca7138fb55cee2119237d433f8431e2
@LrsChrSch Can you push the model to ollama? The latest version over there doesn't seem to work.
Hey there! A few months ago I had an issue about not being able to quantize the model in order to run it with ollama.
I figured it out back then and all was good.
Today I wanted to train a different model with the 'new' (3 months old) finetuning script and the new model revision. Finetuning worked like a charm, and I got a model out of it that I was happy with.
Now to add it to ollama: I used the same approach as last time, which is running create-gguf.py with the model.safetensors file and the path to the tokenizer directory (I git cloned the huggingface repo and pointed it there).
The resulting GGUFs also look fine to me: one is about 910 KB and the other 2770 KB.
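In case it helps anyone double-check their own output beyond file size, the fixed part of a GGUF header (magic `GGUF`, then a little-endian uint32 version, uint64 tensor count, and uint64 metadata-KV count, for spec version 2 and later) can be read with a short Python sketch — the filename is a placeholder:

```python
import struct

def read_gguf_header(path):
    """Parse the fixed-size GGUF header: magic, version, tensor count, metadata KV count."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        # GGUF v2+ header after the magic: uint32 version, uint64 n_tensors,
        # uint64 n_kv, all little-endian.
        version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
    return {"version": version, "n_tensors": n_tensors, "n_kv": n_kv}

# Usage (placeholder filename):
# print(read_gguf_header("moondream-text-model.gguf"))
```

Comparing `n_tensors` between the two files is a quick way to see whether they really contain different models.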
My Modelfile now looks like this (I didn't want to quantize to Q4_0 directly, and I also saw that ollama has a --quantize feature built in):
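For reference, a minimal sketch of the kind of Modelfile I mean — the filenames are placeholders for the two GGUFs produced by create-gguf.py, and I'm going from memory on the two-FROM pattern I've seen in community imports of text-model-plus-projector GGUFs, so double-check it against the official moondream:v2 Modelfile:

```
FROM ./moondream-text-model.gguf
FROM ./moondream-mmproj.gguf
```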
Using
`ollama create [modelname]`
with this works just fine. Running the model using
`ollama run [modelname]`
prints the error:
`Error: llama runner process has terminated: GGML_ASSERT(ggml_can_mul_mat(a, b)) failed`
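For what it's worth, that assert is ggml's matrix-multiply shape check: from my reading of ggml.c, `ggml_can_mul_mat(t0, t1)` requires the two tensors' first dimensions (`ne[0]`) to match, and `t1`'s outer dims to be integer multiples of `t0`'s. A rough Python paraphrase of the check (not the real implementation):

```python
def can_mul_mat(ne0, ne1):
    # ne0/ne1: 4-element dim tuples, mirroring ggml's tensor->ne arrays.
    # The shared inner dimension must match, and t1's batch dims (ne[2], ne[3])
    # must be integer multiples of t0's so they can broadcast.
    return (ne0[0] == ne1[0]
            and ne1[2] % ne0[2] == 0
            and ne1[3] % ne0[3] == 0)
```

So the error means that somewhere in the graph a weight matrix and its input ended up with mismatched inner dimensions — which would be consistent with the wrong model being loaded in one of the two slots.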
Displaying the model info using
`ollama show [modelname]`
shows this:
This is different from the official moondream:v2 Modelfile.
Notice the different parameter sizes. It seems like ollama reads the same model in twice, once as the projector and once as the text model.
I don't know if this is an issue with create-gguf.py (it does seem to do what it's supposed to) or an issue with ollama.
Hope you can help, and thank you so much in advance! If you need more info, I'll be happy to provide it. I'll also try poking around a bit more to see if I can find the issue myself.