
Llama3.2-vision Run Error #7300

Open mruckman1 opened 1 day ago

mruckman1 commented 1 day ago

What is the issue?

  1. Updated Ollama this morning.
  2. Ran ollama run x/llama3.2-vision on a MacBook.
  3. Got the output below:

pulling manifest pulling 652e85aa1e14... 100% ▕████████████████▏ 6.0 GB
pulling 622429e8d318... 100% ▕████████████████▏ 1.9 GB
pulling 962e0f69a367... 100% ▕████████████████▏ 163 B
pulling dc49c86b8ebb... 100% ▕████████████████▏ 30 B
pulling 6a50468ba2a8... 100% ▕████████████████▏ 498 B
verifying sha256 digest
writing manifest
success
Error: llama runner process has terminated: error:Missing required key: clip.has_text_encoder

Expected: the model downloads and runs without error.

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.3.14

rick-github commented 1 day ago

Vision support was merged recently (https://github.com/ollama/ollama/pull/6963), but 0.3.14 doesn't include it.
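
(For anyone unsure which build they're on, the CLI reports its version; a quick check, assuming the standard flag:

$ ollama -v
ollama version is 0.3.14
)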

silasalves commented 1 day ago

What does "vision support" mean? Does it enable "submitting multiple images for inference" or "video inference"? Or is it just support for this particular model?

AFAIK, video and multiple images are still an open issue (#3184).

rick-github commented 1 day ago

Vision support for llama3.2. llama3.2 doesn't do video, and doesn't work reliably with multiple images.
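
For reference, the REST API takes a single image as base64 in the images field of a chat message. A minimal sketch against the documented /api/chat endpoint, with the image data left as a placeholder:

$ curl http://localhost:11434/api/chat -d '{
  "model": "x/llama3.2-vision",
  "messages": [
    {
      "role": "user",
      "content": "What is in this image?",
      "images": ["<base64-encoded image data>"]
    }
  ]
}'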

pavan-otthi123 commented 17 hours ago

Does this mean that llama3.2-vision can't be used in the current version of Ollama?

I'm also getting the same error when attempting to run the model.

rick-github commented 14 hours ago

Version 0.4.0 will support llama3.2-vision.
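
Once on 0.4.0, the usual CLI flow should work; a sketch, with ./photo.jpg as a placeholder path (the CLI picks up image file paths included in the prompt):

$ ollama run x/llama3.2-vision "Describe this image: ./photo.jpg"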

Animaxx commented 6 hours ago

Thank you for the hard work! Could we also get this change into the llama.cpp repo? How can we convert the model from HF to GGUF with the llama vision structure?

silasalves commented 4 hours ago

@rick-github thanks for the clarification! Also, any plans to make it run on the GPU? Llama3.2 runs on my GPU (GTX 1660 Ti), but llama3.2-vision runs on CPU only.

jessegross commented 4 hours ago

> @rick-github thanks for the clarification! Also, any plans to make it run on the GPU? Llama3.2 runs on my GPU (GTX 1660 Ti), but llama3.2-vision runs on CPU only.

It can run on the GPU, but it needs more RAM than the text-only versions, so it has likely exceeded the limit of your GPU.

rick-github commented 4 hours ago

It should run on GPU if it fits:

$ ollama ps
NAME                            ID              SIZE    PROCESSOR       UNTIL   
x/llama3.2-vision:latest        25e973636a29    11 GB   100% GPU        Forever

If you can provide server logs perhaps we can see why it's not working for you.
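
(If it helps, on a default install the server log lives at a standard path; one way to grab it, assuming macOS:

$ cat ~/.ollama/logs/server.log

On Windows it's under %LOCALAPPDATA%\Ollama\server.log.)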

silasalves commented 3 hours ago

@jessegross Thanks for pointing that out. That sounds correct, my GPU is quite old and has only 4GB RAM.

@rick-github Thanks for the support, this is my server.log https://gist.github.com/silasalves/f2bdfc195618f19ecd557b945cab32b9

I think this is the important part?

time=2024-10-22T14:22:10.644-04:00 level=INFO source=llama-server.go:72 msg="system memory" total="31.9 GiB" free="13.6 GiB" free_swap="19.0 GiB"
time=2024-10-22T14:22:10.649-04:00 level=INFO source=memory.go:346 msg="offload to cuda" projector.weights="1.8 GiB" projector.graph="2.8 GiB" layers.requested=-1 layers.model=41 layers.offload=0 layers.split="" memory.available="[4.1 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.9 GiB" memory.required.partial="0 B" memory.required.kv="320.0 MiB" memory.required.allocations="[0 B]" memory.weights.total="5.2 GiB" memory.weights.repeating="4.8 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="213.3 MiB" memory.graph.partial="213.3 MiB"

rick-github commented 3 hours ago

Yep, too big for your card: the log shows memory.required.full="5.9 GiB" against memory.available="[4.1 GiB]", so layers.offload=0 and the model ran entirely on CPU.