vikhyat / moondream

tiny vision language model
https://moondream.ai
Apache License 2.0

cannot run moondream in Ollama (ERROR source=routes.go:120 msg="error loading llama server" error="llama runner process no longer running: 3221225477 ") #89

Closed · prithvi151080 closed this issue 6 months ago

prithvi151080 commented 6 months ago

I downloaded the moondream model from the official Ollama library (https://ollama.com/library/moondream), but when I try to run the model in Ollama I get this error:

```
ERROR source=routes.go:120 msg="error loading llama server" error="llama runner process no longer running: 3221225477 "
```


Below is the entire Ollama server log file:

```
time=2024-04-29T11:07:39.775+05:30 level=INFO source=images.go:817 msg="total blobs: 6"
time=2024-04-29T11:07:39.778+05:30 level=INFO source=images.go:824 msg="total unused blobs removed: 0"
time=2024-04-29T11:07:39.779+05:30 level=INFO source=routes.go:1143 msg="Listening on 127.0.0.1:11434 (version 0.1.32)"
time=2024-04-29T11:07:39.791+05:30 level=INFO source=payload.go:28 msg="extracting embedded files" dir=C:\Users\LENOVO\AppData\Local\Temp\ollama3526940464\runners
time=2024-04-29T11:07:40.021+05:30 level=INFO source=payload.go:41 msg="Dynamic LLM libraries [cpu_avx2 cuda_v11.3 rocm_v5.7 cpu cpu_avx]"
[GIN] 2024/04/29 - 11:07:40 | 200 | 0s | 127.0.0.1 | HEAD "/"
[GIN] 2024/04/29 - 11:07:40 | 200 | 2.1732ms | 127.0.0.1 | POST "/api/show"
[GIN] 2024/04/29 - 11:07:40 | 200 | 1.6291ms | 127.0.0.1 | POST "/api/show"
time=2024-04-29T11:07:40.857+05:30 level=INFO source=gpu.go:121 msg="Detecting GPU type"
time=2024-04-29T11:07:40.857+05:30 level=INFO source=gpu.go:268 msg="Searching for GPU management library cudart64.dll"
time=2024-04-29T11:07:40.869+05:30 level=INFO source=gpu.go:314 msg="Discovered GPU libraries: [C:\Users\LENOVO\AppData\Local\Programs\Ollama\cudart64_110.dll]"
time=2024-04-29T11:07:41.992+05:30 level=INFO source=gpu.go:126 msg="Nvidia GPU detected via cudart"
time=2024-04-29T11:07:41.993+05:30 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-04-29T11:07:42.065+05:30 level=INFO source=gpu.go:202 msg="[cudart] CUDART CUDA Compute Capability detected: 8.6"
time=2024-04-29T11:07:42.082+05:30 level=INFO source=gpu.go:121 msg="Detecting GPU type"
time=2024-04-29T11:07:42.082+05:30 level=INFO source=gpu.go:268 msg="Searching for GPU management library cudart64.dll"
time=2024-04-29T11:07:42.092+05:30 level=INFO source=gpu.go:314 msg="Discovered GPU libraries: [C:\Users\LENOVO\AppData\Local\Programs\Ollama\cudart64_110.dll]"
time=2024-04-29T11:07:42.093+05:30 level=INFO source=gpu.go:126 msg="Nvidia GPU detected via cudart"
time=2024-04-29T11:07:42.093+05:30 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-04-29T11:07:42.142+05:30 level=INFO source=gpu.go:202 msg="[cudart] CUDART CUDA Compute Capability detected: 8.6"
time=2024-04-29T11:07:42.168+05:30 level=INFO source=server.go:127 msg="offload to gpu" reallayers=25 layers=25 required="2588.9 MiB" used="2588.9 MiB" available="3304.2 MiB" kv="384.0 MiB" fulloffload="148.0 MiB" partialoffload="190.0 MiB"
time=2024-04-29T11:07:42.168+05:30 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-04-29T11:07:42.177+05:30 level=INFO source=server.go:264 msg="starting llama server" cmd="C:\Users\LENOVO\AppData\Local\Temp\ollama3526940464\runners\cuda_v11.3\ollama_llama_server.exe --model C:\Users\LENOVO\.ollama\models\blobs\sha256-e554c6b9de016673fd2c732e0342967727e9659ca5f853a4947cc96263fa602b --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 25 --mmproj C:\Users\LENOVO\.ollama\models\blobs\sha256-4cc1cb3660d87ff56432ebeb7884ad35d67c48c7b9f6b2856f305e39c38eed8f --port 53596"
time=2024-04-29T11:07:42.229+05:30 level=INFO source=server.go:389 msg="waiting for llama runner to start responding"
{"function":"server_params_parse","level":"INFO","line":2603,"msg":"logging to file is disabled.","tid":"21912","timestamp":1714369062}
{"build":2679,"commit":"7593639","function":"wmain","level":"INFO","line":2820,"msg":"build info","tid":"21912","timestamp":1714369062}
{"function":"wmain","level":"INFO","line":2827,"msg":"system info","n_threads":8,"n_threads_batch":-1,"system_info":"AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | ","tid":"21912","timestamp":1714369062,"total_threads":16}
{"function":"load_model","level":"INFO","line":395,"msg":"Multi Modal Mode Enabled","tid":"21912","timestamp":1714369062}
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3050 Laptop GPU, compute capability 8.6, VMM: yes
key clip.vision.image_grid_pinpoints not found in file
key clip.vision.mm_patch_merge_type not found in file
key clip.vision.image_crop_resolution not found in file
clip_model_load: failed to load vision model tensors
time=2024-04-29T11:07:43.382+05:30 level=ERROR source=routes.go:120 msg="error loading llama server" error="llama runner process no longer running: 3221225477 "
```
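For context (my note, not from the log itself): the exit status `3221225477` in the final error line is a Windows NTSTATUS value. Read as an unsigned 32-bit integer it is `0xC0000005`, i.e. `STATUS_ACCESS_VIOLATION`, meaning the llama runner process crashed with a memory access fault (apparently right after failing to load the vision model tensors) rather than exiting cleanly. A quick Python check:

```python
# Decode the llama runner's exit status from the Ollama log.
# 3221225477 as an unsigned 32-bit value is the Windows NTSTATUS
# code 0xC0000005 (STATUS_ACCESS_VIOLATION): the process died on
# a memory access fault, not a normal error exit.
exit_status = 3221225477
print(hex(exit_status))  # -> 0xc0000005

STATUS_ACCESS_VIOLATION = 0xC0000005
print(exit_status == STATUS_ACCESS_VIOLATION)  # -> True
```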

I am able to run other models in Ollama (phi3, mxbai, tinyllama, etc.) without any issue, but moondream fails to start.

OS: Windows
GPU: Nvidia RTX 3050

Any help would be highly appreciated.

prithvi151080 commented 6 months ago

I had to update Ollama to 0.1.33, and that resolved the problem.
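For anyone hitting the same crash, a minimal sketch of how to verify the fix after upgrading (a hypothetical session; only the version number 0.1.33 comes from this thread):

```
ollama --version      # should report 0.1.33 or later
ollama pull moondream # refresh the model blobs if needed
ollama run moondream  # should now start without the runner crash
```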