Open mruckman1 opened 1 day ago
Vision support was merged recently (https://github.com/ollama/ollama/pull/6963), but 0.3.14 doesn't include it.
What does "vision support" mean? Does it enabling "submitting multiple images for inference" or "video inference"? Or is it just the support for this particular model?
AFAIK, video and multiple images are still an open issue: #3184
Vision support for llama3.2. llama3.2 doesn't do video, and doesn't work reliably with multiple images.
Does this mean that llama3.2-vision can't be used in the current version of Ollama?
I'm also getting the same error when attempting to run the model.
Version 0.4.0 will support llama3.2-vision.
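For anyone wanting to try it once a build with vision support is installed, here is a minimal sketch of passing an image, either in the prompt or through the REST API (the file path and prompt are illustrative):

$ ollama run llama3.2-vision "What is in this image? ./photo.jpg"

or, with the image base64-encoded:

$ curl http://localhost:11434/api/chat -d '{
    "model": "llama3.2-vision",
    "messages": [
      {"role": "user", "content": "What is in this image?", "images": ["<base64-encoded image>"]}
    ]
  }'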
Thank you for the hard work. Could this change also be upstreamed to the llama.cpp repo? How can we convert the model from HF to GGUF with the llama vision structure?
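Not an answer for the vision parts, but for architectures llama.cpp already supports, the usual HF-to-GGUF flow is the repo's convert_hf_to_gguf.py script; as far as I know it doesn't handle the mllama/vision structure of llama3.2-vision yet, so this only sketches the general flow (paths and output type are illustrative):

$ python convert_hf_to_gguf.py /path/to/hf-model --outfile model.gguf --outtype f16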
@rick-github thanks for the clarification! Also, any plans for making it run on the GPU? Llama3.2 runs on my GPU (GTX1660Ti), but llama3.2-vision runs on CPU only.
It can run on the GPU, but it needs more RAM than the text-only versions, so it has likely exceeded the limit of your GPU.
It should run on GPU if it fits:
$ ollama ps
NAME                        ID              SIZE     PROCESSOR    UNTIL
x/llama3.2-vision:latest    25e973636a29    11 GB    100% GPU     Forever
If you can provide server logs perhaps we can see why it's not working for you.
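For reference, the server logs can usually be found here, though locations vary by install:

macOS:   ~/.ollama/logs/server.log
Linux:   journalctl -u ollama (systemd installs)
Windows: %LOCALAPPDATA%\Ollama\server.log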
@jessegross Thanks for pointing that out. That sounds correct; my GPU is quite old and has only 4 GB of VRAM.
@rick-github Thanks for the support, this is my server.log https://gist.github.com/silasalves/f2bdfc195618f19ecd557b945cab32b9
I think this is the important part?
time=2024-10-22T14:22:10.644-04:00 level=INFO source=llama-server.go:72 msg="system memory" total="31.9 GiB" free="13.6 GiB" free_swap="19.0 GiB"
time=2024-10-22T14:22:10.649-04:00 level=INFO source=memory.go:346 msg="offload to cuda" projector.weights="1.8 GiB" projector.graph="2.8 GiB" layers.requested=-1 layers.model=41 layers.offload=0 layers.split="" memory.available="[4.1 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.9 GiB" memory.required.partial="0 B" memory.required.kv="320.0 MiB" memory.required.allocations="[0 B]" memory.weights.total="5.2 GiB" memory.weights.repeating="4.8 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="213.3 MiB" memory.graph.partial="213.3 MiB"
Yep, too big for your card.
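If you want to experiment anyway, you can force a number of layers onto the GPU with the num_gpu parameter from the ollama run REPL; a sketch with an illustrative value (with 4 GB this will likely still be slow or fail to load):

$ ollama run x/llama3.2-vision
>>> /set parameter num_gpu 10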
What is the issue?
Running "ollama run x/llama3.2-vision" on a MacBook fails. Expected: Ollama downloads the model without error.
OS: macOS
GPU: Apple
CPU: Apple
Ollama version: 0.3.14