ollama / ollama

Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.
https://ollama.com
MIT License

Error: no suitable llama servers found - with full /tmp #6985

Closed · Dalibor-P closed 1 month ago

Dalibor-P commented 1 month ago

What is the issue?

Running ollama run smollm:135m (or any other model) fails with: Error: no suitable llama servers found.

I'm on Fedora Linux. Ollama 0.3.4 worked; after updating to 0.3.12 with curl -fsSL https://ollama.com/install.sh | sh, the error started showing up. I tried:

  1. Uninstalling Ollama following the instructions at https://github.com/ollama/ollama/blob/main/docs/linux.md.
  2. Reinstalling 0.3.12.
  3. Uninstalling Ollama again.
  4. Installing version 0.3.11 (pinned install command shown after this list).
  5. Uninstalling Ollama again.
  6. Installing version 0.3.4, which is so far the only version that worked.
  7. Updating to 0.3.12 again.
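
For reference, a specific version can be installed by passing OLLAMA_VERSION to the install script, as described in docs/linux.md, e.g.:

curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.3.11 sh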

This is the log obtained with journalctl -e -u ollama; it covers the last working start under 0.3.4 followed by the failing restart on 0.3.12. I can't make much of it, but I did notice the no space left on device error, which can't be right: I still have over 100 GB free on my drive.

Sep 26 18:06:21 pc-186.home systemd[1]: Started ollama.service - Ollama Service.
Sep 26 18:06:21 pc-186.home ollama[3886]: Couldn't find '/usr/share/ollama/.ollama/id_ed25519'. Generating new private key.
Sep 26 18:06:21 pc-186.home ollama[3886]: Your new public key is:
Sep 26 18:06:21 pc-186.home ollama[3886]: ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAINqSHircCEpRreiUVAPOAMJD3guwm3DiQsS3KajMHF9I
Sep 26 18:06:21 pc-186.home ollama[3886]: 2024/09/26 18:06:21 routes.go:1108: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
Sep 26 18:06:21 pc-186.home ollama[3886]: time=2024-09-26T18:06:21.408+02:00 level=INFO source=images.go:781 msg="total blobs: 0"
Sep 26 18:06:21 pc-186.home ollama[3886]: time=2024-09-26T18:06:21.409+02:00 level=INFO source=images.go:788 msg="total unused blobs removed: 0"
Sep 26 18:06:21 pc-186.home ollama[3886]: time=2024-09-26T18:06:21.409+02:00 level=INFO source=routes.go:1155 msg="Listening on 127.0.0.1:11434 (version 0.3.4)"
Sep 26 18:06:21 pc-186.home ollama[3886]: time=2024-09-26T18:06:21.410+02:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama3058390423/runners
Sep 26 18:06:40 pc-186.home ollama[3886]: time=2024-09-26T18:06:40.395+02:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx cpu_avx2 cuda_v11 rocm_v60102 cpu]"
Sep 26 18:06:40 pc-186.home ollama[3886]: time=2024-09-26T18:06:40.406+02:00 level=INFO source=gpu.go:204 msg="looking for compatible GPUs"
Sep 26 18:06:40 pc-186.home ollama[3886]: time=2024-09-26T18:06:40.456+02:00 level=INFO source=gpu.go:347 msg="no compatible GPUs were discovered"
Sep 26 18:06:40 pc-186.home ollama[3886]: time=2024-09-26T18:06:40.456+02:00 level=INFO source=types.go:105 msg="inference compute" id=0 library=cpu compute="" driver=0.0 name="" total="3.7 GiB" available="508.2 MiB"
Sep 26 18:06:40 pc-186.home ollama[3886]: [GIN] 2024/09/26 - 18:06:40 | 200 |     148.487µs |       127.0.0.1 | GET      "/api/version"
Sep 26 18:07:05 pc-186.home ollama[3886]: [GIN] 2024/09/26 - 18:07:05 | 200 |     949.948µs |       127.0.0.1 | HEAD     "/"
Sep 26 18:07:05 pc-186.home ollama[3886]: [GIN] 2024/09/26 - 18:07:05 | 404 |    1.936316ms |       127.0.0.1 | POST     "/api/show"
Sep 26 18:07:07 pc-186.home ollama[3886]: time=2024-09-26T18:07:07.522+02:00 level=INFO source=download.go:175 msg="downloading eb2c714d40d4 in 1 91 MB part(s)"
Sep 26 18:08:21 pc-186.home ollama[3886]: time=2024-09-26T18:08:21.923+02:00 level=INFO source=download.go:175 msg="downloading 62fbfd9ed093 in 1 182 B part(s)"
Sep 26 18:08:24 pc-186.home ollama[3886]: time=2024-09-26T18:08:24.658+02:00 level=INFO source=download.go:175 msg="downloading cfc7749b96f6 in 1 11 KB part(s)"
Sep 26 18:08:27 pc-186.home ollama[3886]: time=2024-09-26T18:08:27.118+02:00 level=INFO source=download.go:175 msg="downloading ca7a9654b546 in 1 89 B part(s)"
Sep 26 18:08:29 pc-186.home ollama[3886]: time=2024-09-26T18:08:29.550+02:00 level=INFO source=download.go:175 msg="downloading f590523c855b in 1 488 B part(s)"
Sep 26 18:08:31 pc-186.home ollama[3886]: [GIN] 2024/09/26 - 18:08:31 | 200 |         1m26s |       127.0.0.1 | POST     "/api/pull"
Sep 26 18:08:31 pc-186.home ollama[3886]: [GIN] 2024/09/26 - 18:08:31 | 200 |   26.391589ms |       127.0.0.1 | POST     "/api/show"
Sep 26 18:08:31 pc-186.home ollama[3886]: time=2024-09-26T18:08:31.320+02:00 level=INFO source=memory.go:309 msg="offload to cpu" layers.requested=-1 layers.model=31 layers.offload=0 layers.split="" memory.available="[355.6 MiB]" memory.required.full="442.1 MiB" memory.required.partial="0 B" memory.required.kv="180.0 MiB" memory.required.allocations="[350.2 MiB]" memory.weights.total="237.1 MiB" memory.weights.repeating="208.4 MiB" memory.weights.nonrepeating="28.7 MiB" memory.graph.full="164.5 MiB" memory.graph.partial="168.4 MiB"
Sep 26 18:08:31 pc-186.home ollama[3886]: time=2024-09-26T18:08:31.325+02:00 level=INFO source=server.go:392 msg="starting llama server" cmd="/tmp/ollama3058390423/runners/cpu_avx2/ollama_llama_server --model /usr/share/ollama/.ollama/models/blobs/sha256-eb2c714d40d4b35ba4b8ee98475a06d51d8080a17d2d2a75a23665985c739b94 --ctx-size 8192 --batch-size 512 --embedding --log-disable --no-mmap --parallel 4 --port 46259"
Sep 26 18:08:31 pc-186.home ollama[3886]: time=2024-09-26T18:08:31.326+02:00 level=INFO source=sched.go:445 msg="loaded runners" count=1
Sep 26 18:08:31 pc-186.home ollama[3886]: time=2024-09-26T18:08:31.326+02:00 level=INFO source=server.go:592 msg="waiting for llama runner to start responding"
Sep 26 18:08:31 pc-186.home ollama[3886]: time=2024-09-26T18:08:31.327+02:00 level=INFO source=server.go:626 msg="waiting for server to become available" status="llm server error"
Sep 26 18:08:31 pc-186.home ollama[4066]: INFO [main] build info | build=1 commit="1e6f655" tid="140182396393344" timestamp=1727366911
Sep 26 18:08:31 pc-186.home ollama[4066]: INFO [main] system info | n_threads=2 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="140182396393344" timestamp=1727366911 total_threads=4
Sep 26 18:08:31 pc-186.home ollama[4066]: INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="6" port="46259" tid="140182396393344" timestamp=1727366911
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: loaded meta data with 39 key-value pairs and 272 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-eb2c714d40d4b35ba4b8ee98475a06d51d8080a17d2d2a75a23665985c739b94 (version GGUF V3 (latest))
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv   0:                       general.architecture str              = llama
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv   1:                               general.type str              = model
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv   2:                               general.name str              = SmolLM 135M
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv   3:                       general.organization str              = HuggingFaceTB
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv   4:                           general.finetune str              = Instruct
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv   5:                           general.basename str              = SmolLM
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv   6:                         general.size_label str              = 135M
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv   7:                            general.license str              = apache-2.0
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv   9:                  general.base_model.0.name str              = SmolLM 135M
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv  10:          general.base_model.0.organization str              = HuggingFaceTB
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/HuggingFaceTB/...
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv  12:                               general.tags arr[str,3]       = ["alignment-handbook", "trl", "sft"]
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv  13:                          general.languages arr[str,1]       = ["en"]
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv  14:                           general.datasets arr[str,4]       = ["Magpie-Align/Magpie-Pro-300K-Filter...
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv  15:                          llama.block_count u32              = 30
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv  16:                       llama.context_length u32              = 2048
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv  17:                     llama.embedding_length u32              = 576
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv  18:                  llama.feed_forward_length u32              = 1536
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv  19:                 llama.attention.head_count u32              = 9
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv  20:              llama.attention.head_count_kv u32              = 3
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv  21:                       llama.rope.freq_base f32              = 10000.000000
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv  22:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv  23:                          general.file_type u32              = 2
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv  24:                           llama.vocab_size u32              = 49152
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv  25:                 llama.rope.dimension_count u32              = 64
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv  26:            tokenizer.ggml.add_space_prefix bool             = false
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv  27:               tokenizer.ggml.add_bos_token bool             = false
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv  28:                       tokenizer.ggml.model str              = gpt2
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv  29:                         tokenizer.ggml.pre str              = smollm
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv  30:                      tokenizer.ggml.tokens arr[str,49152]   = ["<|endoftext|>", "<|im_start|>", "<|...
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv  31:                  tokenizer.ggml.token_type arr[i32,49152]   = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv  32:                      tokenizer.ggml.merges arr[str,48900]   = ["Ġ t", "Ġ a", "i n", "h e", "Ġ Ġ...
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv  33:                tokenizer.ggml.bos_token_id u32              = 1
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv  34:                tokenizer.ggml.eos_token_id u32              = 2
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv  35:            tokenizer.ggml.unknown_token_id u32              = 0
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv  36:            tokenizer.ggml.padding_token_id u32              = 2
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv  37:                    tokenizer.chat_template str              = {% for message in messages %}{{'<|im_...
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - kv  38:               general.quantization_version u32              = 2
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - type  f32:   61 tensors
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - type q4_0:  210 tensors
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_model_loader: - type q8_0:    1 tensors
Sep 26 18:08:31 pc-186.home ollama[3886]: time=2024-09-26T18:08:31.579+02:00 level=INFO source=server.go:626 msg="waiting for server to become available" status="llm server loading model"
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_vocab: special tokens cache size = 17
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_vocab: token to piece cache size = 0.3170 MB
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: format           = GGUF V3 (latest)
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: arch             = llama
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: vocab type       = BPE
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: n_vocab          = 49152
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: n_merges         = 48900
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: vocab_only       = 0
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: n_ctx_train      = 2048
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: n_embd           = 576
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: n_layer          = 30
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: n_head           = 9
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: n_head_kv        = 3
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: n_rot            = 64
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: n_swa            = 0
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: n_embd_head_k    = 64
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: n_embd_head_v    = 64
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: n_gqa            = 3
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: n_embd_k_gqa     = 192
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: n_embd_v_gqa     = 192
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: f_logit_scale    = 0.0e+00
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: n_ff             = 1536
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: n_expert         = 0
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: n_expert_used    = 0
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: causal attn      = 1
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: pooling type     = 0
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: rope type        = 0
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: rope scaling     = linear
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: freq_base_train  = 10000.0
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: freq_scale_train = 1
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: n_ctx_orig_yarn  = 2048
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: rope_finetuned   = unknown
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: ssm_d_conv       = 0
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: ssm_d_inner      = 0
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: ssm_d_state      = 0
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: ssm_dt_rank      = 0
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: model type       = ?B
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: model ftype      = Q4_0
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: model params     = 134.52 M
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: model size       = 85.77 MiB (5.35 BPW)
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: general.name     = SmolLM 135M
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: BOS token        = 1 '<|im_start|>'
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: EOS token        = 2 '<|im_end|>'
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: UNK token        = 0 '<|endoftext|>'
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: PAD token        = 2 '<|im_end|>'
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: LF token         = 143 'Ä'
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: EOT token        = 0 '<|endoftext|>'
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_print_meta: max token length = 162
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_tensors: ggml ctx size =    0.13 MiB
Sep 26 18:08:31 pc-186.home ollama[3886]: llm_load_tensors:        CPU buffer size =   114.46 MiB
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_new_context_with_model: n_ctx      = 8192
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_new_context_with_model: n_batch    = 512
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_new_context_with_model: n_ubatch   = 512
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_new_context_with_model: flash_attn = 0
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_new_context_with_model: freq_base  = 10000.0
Sep 26 18:08:31 pc-186.home ollama[3886]: llama_new_context_with_model: freq_scale = 1
Sep 26 18:08:32 pc-186.home ollama[3886]: llama_kv_cache_init:        CPU KV buffer size =   180.00 MiB
Sep 26 18:08:32 pc-186.home ollama[3886]: llama_new_context_with_model: KV self size  =  180.00 MiB, K (f16):   90.00 MiB, V (f16):   90.00 MiB
Sep 26 18:08:32 pc-186.home ollama[3886]: llama_new_context_with_model:        CPU  output buffer size =     0.76 MiB
Sep 26 18:08:32 pc-186.home ollama[3886]: llama_new_context_with_model:        CPU compute buffer size =   164.51 MiB
Sep 26 18:08:32 pc-186.home ollama[3886]: llama_new_context_with_model: graph nodes  = 966
Sep 26 18:08:32 pc-186.home ollama[3886]: llama_new_context_with_model: graph splits = 1
Sep 26 18:08:32 pc-186.home ollama[4066]: INFO [main] model loaded | tid="140182396393344" timestamp=1727366912
Sep 26 18:08:32 pc-186.home ollama[3886]: time=2024-09-26T18:08:32.336+02:00 level=INFO source=server.go:631 msg="llama runner started in 1.01 seconds"
Sep 26 18:08:32 pc-186.home ollama[3886]: [GIN] 2024/09/26 - 18:08:32 | 200 |  1.050238178s |       127.0.0.1 | POST     "/api/chat"
Sep 26 18:08:48 pc-186.home ollama[3886]: [GIN] 2024/09/26 - 18:08:48 | 200 |  749.519122ms |       127.0.0.1 | POST     "/api/chat"
Sep 26 18:27:56 pc-186.home systemd[1]: Stopping ollama.service - Ollama Service...
Sep 26 18:27:56 pc-186.home systemd[1]: ollama.service: Deactivated successfully.
Sep 26 18:27:56 pc-186.home systemd[1]: Stopped ollama.service - Ollama Service.
Sep 26 18:27:56 pc-186.home systemd[1]: ollama.service: Consumed 41.756s CPU time.
Sep 26 18:27:56 pc-186.home systemd[1]: Started ollama.service - Ollama Service.
Sep 26 18:27:56 pc-186.home ollama[5626]: 2024/09/26 18:27:56 routes.go:1153: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Sep 26 18:27:56 pc-186.home ollama[5626]: time=2024-09-26T18:27:56.867+02:00 level=INFO source=images.go:753 msg="total blobs: 5"
Sep 26 18:27:56 pc-186.home ollama[5626]: time=2024-09-26T18:27:56.868+02:00 level=INFO source=images.go:760 msg="total unused blobs removed: 0"
Sep 26 18:27:56 pc-186.home ollama[5626]: time=2024-09-26T18:27:56.869+02:00 level=INFO source=routes.go:1200 msg="Listening on 127.0.0.1:11434 (version 0.3.12)"
Sep 26 18:27:56 pc-186.home ollama[5626]: time=2024-09-26T18:27:56.870+02:00 level=INFO source=common.go:135 msg="extracting embedded files" dir=/tmp/ollama3893034885/runners
Sep 26 18:28:16 pc-186.home ollama[5626]: time=2024-09-26T18:28:16.153+02:00 level=ERROR source=common.go:214 msg="failed to extract files" error="copy payload linux/amd64/cuda_v12/libggml.so: write /tmp/ollama3893034885/runners/cuda_v12/libggml.so: no space left on device"
Sep 26 18:28:16 pc-186.home ollama[5626]: time=2024-09-26T18:28:16.919+02:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners=[]
Sep 26 18:28:16 pc-186.home ollama[5626]: time=2024-09-26T18:28:16.933+02:00 level=INFO source=gpu.go:199 msg="looking for compatible GPUs"
Sep 26 18:28:17 pc-186.home ollama[5626]: time=2024-09-26T18:28:17.066+02:00 level=INFO source=gpu.go:347 msg="no compatible GPUs were discovered"
Sep 26 18:28:17 pc-186.home ollama[5626]: time=2024-09-26T18:28:17.066+02:00 level=INFO source=types.go:107 msg="inference compute" id=0 library=cpu variant=avx2 compute="" driver=0.0 name="" total="3.7 GiB" available="1.3 GiB"
Sep 26 18:28:17 pc-186.home ollama[5626]: [GIN] 2024/09/26 - 18:28:17 | 200 |   50.394948ms |       127.0.0.1 | GET      "/api/version"
Sep 26 18:28:21 pc-186.home ollama[5626]: [GIN] 2024/09/26 - 18:28:21 | 200 |      41.696µs |       127.0.0.1 | HEAD     "/"
Sep 26 18:28:21 pc-186.home ollama[5626]: [GIN] 2024/09/26 - 18:28:21 | 200 |    38.24683ms |       127.0.0.1 | POST     "/api/show"
Sep 26 18:28:21 pc-186.home ollama[5626]: time=2024-09-26T18:28:21.118+02:00 level=INFO source=server.go:103 msg="system memory" total="3.7 GiB" free="1.4 GiB" free_swap="17.3 GiB"
Sep 26 18:28:21 pc-186.home ollama[5626]: time=2024-09-26T18:28:21.119+02:00 level=INFO source=memory.go:326 msg="offload to cpu" layers.requested=-1 layers.model=31 layers.offload=0 layers.split="" memory.available="[1.4 GiB]" memory.gpu_overhead="0 B" memory.required.full="438.2 MiB" memory.required.partial="0 B" memory.required.kv="180.0 MiB" memory.required.allocations="[438.2 MiB]" memory.weights.total="237.1 MiB" memory.weights.repeating="208.4 MiB" memory.weights.nonrepeating="28.7 MiB" memory.graph.full="164.5 MiB" memory.graph.partial="168.4 MiB"
Sep 26 18:28:45 pc-186.home ollama[5626]: time=2024-09-26T18:28:45.591+02:00 level=ERROR source=common.go:214 msg="failed to extract files" error="copy payload linux/amd64/cuda_v12/libggml.so: write /tmp/ollama3893034885/runners/cuda_v12/libggml.so: no space left on device"
Sep 26 18:28:46 pc-186.home ollama[5626]: time=2024-09-26T18:28:46.496+02:00 level=INFO source=sched.go:428 msg="NewLlamaServer failed" model=/usr/share/ollama/.ollama/models/blobs/sha256-eb2c714d40d4b35ba4b8ee98475a06d51d8080a17d2d2a75a23665985c739b94 error="no suitable llama servers found"
Sep 26 18:28:46 pc-186.home ollama[5626]: [GIN] 2024/09/26 - 18:28:46 | 500 |  25.42173315s |       127.0.0.1 | POST     "/api/generate"

OS

Linux

GPU

Intel

CPU

Intel

Ollama version

0.3.12

dhiltgen commented 1 month ago

no space left on device

You appear to have run out of storage space in /tmp/

If you can't clean up sufficient space in /tmp for Ollama to start up cleanly, you can use OLLAMA_TMPDIR to specify an alternate location for extraction of temporary files.
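
For example (the /var/tmp path below is just an illustration; any disk-backed directory writable by the user running the server will do):

# check how much space the filesystem backing /tmp actually has
df -h /tmp

# point extraction at a disk-backed location for a one-off foreground run
mkdir -p /var/tmp/ollama
OLLAMA_TMPDIR=/var/tmp/ollama ollama serve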

dtischler commented 1 month ago

I ran into this issue as well, and resolved it with a quick:

mount -o remount,size=3G /tmp
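
Worth noting: a remount like this needs root and only lasts until the next reboot. On Fedora, /tmp is a tmpfs capped at half of RAM by default, so on a small-memory machine the runner payloads that newer Ollama builds extract can fill it. To make the larger size persistent, one option (a sketch; adjust the size to your RAM) is an /etc/fstab entry overriding the default tmp.mount:

tmpfs /tmp tmpfs size=3G,mode=1777 0 0
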
hast0011 commented 1 month ago

no space left on device

You appear to have run out of storage space in /tmp/

If you can't clean up sufficient space in /tmp for Ollama to start up cleanly, you can use OLLAMA_TMPDIR to specify an alternate location for extraction of temporary files.

Can you please explain how to do that when Ollama runs as a service set up by the provided install script? I managed to export OLLAMA_TMPDIR=new_path and run ollama serve manually, but that presumably doesn't help for the service. How do I set it up there?

dhiltgen commented 4 weeks ago

@hast0011 docs on how to configure the Linux server with systemd are here: https://github.com/ollama/ollama/blob/main/docs/faq.md#setting-environment-variables-on-linux
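
In short, following that FAQ (the directory below is just an example; it must exist and be writable by the ollama user):

# open an override editor for the service
sudo systemctl edit ollama.service

# in the editor, add:
[Service]
Environment="OLLAMA_TMPDIR=/var/tmp/ollama"

# then apply and restart
sudo systemctl daemon-reload
sudo systemctl restart ollama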