intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc
Apache License 2.0

Can't run Ollama using llm-cpp on 12th-gen iGPU under Linux #12120

Closed user7z closed 1 month ago

user7z commented 1 month ago
I start serving with this script:

```bash
export OLLAMA_NUM_GPU=999
export no_proxy=localhost,127.0.0.1
export ZES_ENABLE_SYSMAN=1
source /opt/intel/oneapi/setvars.sh
./ollama serve
```

Here is the serve log:

`:: initializing oneAPI environment ... ollama-lunch: BASH_VERSION = 5.2.32(1)-release args: Using "$@" for setvars.sh arguments: :: advisor -- latest :: ccl -- latest :: compiler -- latest :: dal -- latest :: debugger -- latest :: dev-utilities -- latest :: dnnl -- latest :: dpcpp-ct -- latest :: dpl -- latest :: ipp -- latest :: ippcp -- latest :: mkl -- latest :: mpi -- latest :: tbb -- latest :: vtune -- latest :: oneAPI environment initialized ::

2024/09/25 15:24:58 routes.go:1125: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/user/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost: https://localhost: http://127.0.0.1 https://127.0.0.1 http://127.0.0.1: https://127.0.0.1: http://0.0.0.0 https://0.0.0.0 http://0.0.0.0: https://0.0.0.0: app:// file:// tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]" time=2024-09-25T15:24:58.093Z level=INFO source=images.go:753 msg="total blobs: 5" time=2024-09-25T15:24:58.093Z level=INFO source=images.go:760 msg="total unused blobs removed: 0" [GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.

[GIN-debug] POST /api/pull --> github.com/ollama/ollama/server.(Server).PullModelHandler-fm (5 handlers) [GIN-debug] POST /api/generate --> github.com/ollama/ollama/server.(Server).GenerateHandler-fm (5 handlers) [GIN-debug] POST /api/chat --> github.com/ollama/ollama/server.(Server).ChatHandler-fm (5 handlers) [GIN-debug] POST /api/embed --> github.com/ollama/ollama/server.(Server).EmbedHandler-fm (5 handlers) [GIN-debug] POST /api/embeddings --> github.com/ollama/ollama/server.(Server).EmbeddingsHandler-fm (5 handlers) [GIN-debug] POST /api/create --> github.com/ollama/ollama/server.(Server).CreateModelHandler-fm (5 handlers) [GIN-debug] POST /api/push --> github.com/ollama/ollama/server.(Server).PushModelHandler-fm (5 handlers) [GIN-debug] POST /api/copy --> github.com/ollama/ollama/server.(Server).CopyModelHandler-fm (5 handlers) [GIN-debug] DELETE /api/delete --> github.com/ollama/ollama/server.(Server).DeleteModelHandler-fm (5 handlers) [GIN-debug] POST /api/show --> github.com/ollama/ollama/server.(Server).ShowModelHandler-fm (5 handlers) [GIN-debug] POST /api/blobs/:digest --> github.com/ollama/ollama/server.(Server).CreateBlobHandler-fm (5 handlers) [GIN-debug] HEAD /api/blobs/:digest --> github.com/ollama/ollama/server.(Server).HeadBlobHandler-fm (5 handlers) [GIN-debug] GET /api/ps --> github.com/ollama/ollama/server.(Server).ProcessHandler-fm (5 handlers) [GIN-debug] POST /v1/chat/completions --> github.com/ollama/ollama/server.(Server).ChatHandler-fm (6 handlers) [GIN-debug] POST /v1/completions --> github.com/ollama/ollama/server.(Server).GenerateHandler-fm (6 handlers) [GIN-debug] POST /v1/embeddings --> github.com/ollama/ollama/server.(Server).EmbedHandler-fm (6 handlers) [GIN-debug] GET /v1/models --> github.com/ollama/ollama/server.(Server).ListModelsHandler-fm (6 handlers) [GIN-debug] GET /v1/models/:model --> github.com/ollama/ollama/server.(Server).ShowModelHandler-fm (6 handlers) [GIN-debug] GET / --> github.com/ollama/ollama/server.(Server).GenerateRoutes.func1 (5 handlers) [GIN-debug] GET /api/tags --> github.com/ollama/ollama/server.(Server).ListModelsHandler-fm (5 handlers) [GIN-debug] GET /api/version --> github.com/ollama/ollama/server.(Server).GenerateRoutes.func2 (5 handlers) [GIN-debug] HEAD / --> github.com/ollama/ollama/server.(Server).GenerateRoutes.func1 (5 handlers) [GIN-debug] HEAD /api/tags --> github.com/ollama/ollama/server.(Server).ListModelsHandler-fm (5 handlers) [GIN-debug] HEAD /api/version --> github.com/ollama/ollama/server.(Server).GenerateRoutes.func2 (5 handlers) time=2024-09-25T15:24:58.093Z level=INFO source=routes.go:1172 msg="Listening on 127.0.0.1:11434 (version 0.3.6-ipexllm-20240925)" time=2024-09-25T15:24:58.099Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama719758347/runners time=2024-09-25T15:24:58.232Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2]"`

When I try to run `ollama run llama3.1:latest`,

I get an error with this log:

`[GIN] 2024/09/25 - 15:28:23 200 47.421µs 127.0.0.1 HEAD "/" [GIN] 2024/09/25 - 15:28:23 200 22.833375ms 127.0.0.1 POST "/api/show" time=2024-09-25T15:28:23.254Z level=INFO source=gpu.go:168 msg="looking for compatible GPUs" time=2024-09-25T15:28:23.254Z level=WARN source=gpu.go:560 msg="unable to locate gpu dependency libraries" time=2024-09-25T15:28:23.254Z level=WARN source=gpu.go:560 msg="unable to locate gpu dependency libraries" time=2024-09-25T15:28:23.274Z level=WARN source=gpu.go:560 msg="unable to locate gpu dependency libraries" time=2024-09-25T15:28:23.283Z level=INFO source=gpu.go:280 msg="no compatible GPUs were discovered" time=2024-09-25T15:28:23.330Z level=INFO source=memory.go:309 msg="offload to cpu" layers.requested=-1 layers.model=33 layers.offload=0 layers.split="" memory.available="[25.1 GiB]" memory.required.full="5.8 GiB" memory.required.partial="0 B" memory.required.kv="1.0 GiB" memory.required.allocations="[5.8 GiB]" memory.weights.total="4.7 GiB" memory.weights.repeating="4.3 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="560.0 MiB" memory.graph.partial="677.5 MiB" time=2024-09-25T15:28:23.332Z level=INFO source=server.go:395 msg="starting llama server" cmd="/tmp/ollama719758347/runners/cpu_avx2/ollama_llama_server --model /home/user/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 999 --no-mmap --parallel 4 --port 43871" time=2024-09-25T15:28:23.334Z level=INFO source=sched.go:450 msg="loaded runners" count=1 time=2024-09-25T15:28:23.334Z level=INFO source=server.go:595 msg="waiting for llama runner to start responding" time=2024-09-25T15:28:23.334Z level=INFO source=server.go:629 msg="waiting for server to become available" status="llm server error" INFO [main] build info build=1 commit="7cec8b8" tid="137492451709312" timestamp=1727278103 INFO [main] system info n_threads=2 n_threads_batch=-1 system_info="AVX = 0 AVX_VNNI = 0 AVX2 = 0 AVX512 = 0 AVX512_VBMI = 0 AVX512_VNNI = 0 AVX512_BF16 = 0 FMA = 0 NEON = 0 SVE = 0 ARM_FMA = 0 F16C = 0 FP16_VA = 0 WASM_SIMD = 0 BLAS = 1 SSE3 = 0 SSSE3 = 0 VSX = 0 MATMUL_INT8 = 0 LLAMAFILE = 1 " tid="137492451709312" timestamp=1727278103 total_threads=12 INFO [main] HTTP server listening hostname="127.0.0.1" n_threads_http="11" port="43871" tid="137492451709312" timestamp=1727278103 llama_model_loader: loaded meta data with 29 key-value pairs and 292 tensors from /home/user/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. llama_model_loader: - kv 0: general.architecture str = llama llama_model_loader: - kv 1: general.type str = model llama_model_loader: - kv 2: general.name str = Meta Llama 3.1 8B Instruct llama_model_loader: - kv 3: general.finetune str = Instruct llama_model_loader: - kv 4: general.basename str = Meta-Llama-3.1 llama_model_loader: - kv 5: general.size_label str = 8B llama_model_loader: - kv 6: general.license str = llama3.1 llama_model_loader: - kv 7: general.tags arr[str,6] = ["facebook", "meta", "pytorch", "llam... llama_model_loader: - kv 8: general.languages arr[str,8] = ["en", "de", "fr", "it", "pt", "hi", ... 
llama_model_loader: - kv 9: llama.block_count u32 = 32 llama_model_loader: - kv 10: llama.context_length u32 = 131072 llama_model_loader: - kv 11: llama.embedding_length u32 = 4096 llama_model_loader: - kv 12: llama.feed_forward_length u32 = 14336 llama_model_loader: - kv 13: llama.attention.head_count u32 = 32 llama_model_loader: - kv 14: llama.attention.head_count_kv u32 = 8 llama_model_loader: - kv 15: llama.rope.freq_base f32 = 500000.000000 llama_model_loader: - kv 16: llama.attention.layer_norm_rms_epsilon f32 = 0.000010 llama_model_loader: - kv 17: general.file_type u32 = 2 llama_model_loader: - kv 18: llama.vocab_size u32 = 128256 llama_model_loader: - kv 19: llama.rope.dimension_count u32 = 128 llama_model_loader: - kv 20: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 21: tokenizer.ggml.pre str = llama-bpe llama_model_loader: - kv 22: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ... llama_model_loader: - kv 23: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 24: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "... llama_model_loader: - kv 25: tokenizer.ggml.bos_token_id u32 = 128000 llama_model_loader: - kv 26: tokenizer.ggml.eos_token_id u32 = 128009 llama_model_loader: - kv 27: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ... llama_model_loader: - kv 28: general.quantization_version u32 = 2 llama_model_loader: - type f32: 66 tensors llama_model_loader: - type q4_0: 225 tensors llama_model_loader: - type q6_K: 1 tensors time=2024-09-25T15:28:23.586Z level=INFO source=server.go:629 msg="waiting for server to become available" status="llm server loading model" llm_load_vocab: special tokens cache size = 256 llm_load_vocab: token to piece cache size = 0.7999 MB llm_load_print_meta: format = GGUF V3 (latest) llm_load_print_meta: arch = llama llm_load_print_meta: vocab type = BPE llm_load_print_meta: n_vocab = 128256 llm_load_print_meta: n_merges = 280147 llm_load_print_meta: vocab_only = 0 llm_load_print_meta: n_ctx_train = 131072 llm_load_print_meta: n_embd = 4096 llm_load_print_meta: n_layer = 32 llm_load_print_meta: n_head = 32 llm_load_print_meta: n_head_kv = 8 llm_load_print_meta: n_rot = 128 llm_load_print_meta: n_swa = 0 llm_load_print_meta: n_embd_head_k = 128 llm_load_print_meta: n_embd_head_v = 128 llm_load_print_meta: n_gqa = 4 llm_load_print_meta: n_embd_k_gqa = 1024 llm_load_print_meta: n_embd_v_gqa = 1024 llm_load_print_meta: f_norm_eps = 0.0e+00 llm_load_print_meta: f_norm_rms_eps = 1.0e-05 llm_load_print_meta: f_clamp_kqv = 0.0e+00 llm_load_print_meta: f_max_alibi_bias = 0.0e+00 llm_load_print_meta: f_logit_scale = 0.0e+00 llm_load_print_meta: n_ff = 14336 llm_load_print_meta: n_expert = 0 llm_load_print_meta: n_expert_used = 0 llm_load_print_meta: causal attn = 1 llm_load_print_meta: pooling type = 0 llm_load_print_meta: rope type = 0 llm_load_print_meta: rope scaling = linear llm_load_print_meta: freq_base_train = 500000.0 llm_load_print_meta: freq_scale_train = 1 llm_load_print_meta: n_ctx_orig_yarn = 131072 llm_load_print_meta: rope_finetuned = unknown llm_load_print_meta: ssm_d_conv = 0 llm_load_print_meta: ssm_d_inner = 0 llm_load_print_meta: ssm_d_state = 0 llm_load_print_meta: ssm_dt_rank = 0 llm_load_print_meta: ssm_dt_b_c_rms = 0 llm_load_print_meta: model type = 8B llm_load_print_meta: model ftype = Q4_0 llm_load_print_meta: model params = 8.03 B llm_load_print_meta: model size = 4.33 GiB (4.64 BPW) 
llm_load_print_meta: general.name = Meta Llama 3.1 8B Instruct llm_load_print_meta: BOS token = 128000 '< begin_of_text >' llm_load_print_meta: EOS token = 128009 '< eot_id >' llm_load_print_meta: LF token = 128 'Ä' llm_load_print_meta: EOT token = 128009 '< eot_id >' llm_load_print_meta: max token length = 256 ZE_LOADER_DEBUG_TRACE:Using Loader Library Path: ZE_LOADER_DEBUG_TRACE:Tracing Layer Library Path: libze_tracing_layer.so.1 ggml_sycl_init: GGML_SYCL_FORCE_MMQ: no ggml_sycl_init: SYCL_USE_XMX: yes ggml_sycl_init: found 1 SYCL devices: llm_load_tensors: ggml ctx size = 0.27 MiB llm_load_tensors: offloading 32 repeating layers to GPU llm_load_tensors: offloading non-repeating layers to GPU llm_load_tensors: offloaded 33/33 layers to GPU llm_load_tensors: SYCL0 buffer size = 4156.00 MiB llm_load_tensors: SYCL_Host buffer size = 281.81 MiB llama_new_context_with_model: n_ctx = 8192 llama_new_context_with_model: n_batch = 512 llama_new_context_with_model: n_ubatch = 512 llama_new_context_with_model: flash_attn = 0 llama_new_context_with_model: freq_base = 500000.0 llama_new_context_with_model: freq_scale = 1 [SYCL] call ggml_check_sycl ggml_check_sycl: GGML_SYCL_DEBUG: 0 ggml_check_sycl: GGML_SYCL_F16: no found 1 SYCL devices: Max Max Global compute Max work sub mem ID Device Type Name Version units group group size Driver version
0 [level_zero:gpu:0] Intel Iris Xe Graphics 1.5 80 512 32 30843M 1.3.30872

llama_kv_cache_init: SYCL0 KV buffer size = 1024.00 MiB llama_new_context_with_model: KV self size = 1024.00 MiB, K (f16): 512.00 MiB, V (f16): 512.00 MiB llama_new_context_with_model: SYCL_Host output buffer size = 2.02 MiB [1727278109] warming up the model with an empty run llama_new_context_with_model: SYCL0 compute buffer size = 576.00 MiB llama_new_context_with_model: SYCL_Host compute buffer size = 24.01 MiB llama_new_context_with_model: graph nodes = 1062 llama_new_context_with_model: graph splits = 2 ollama_llama_server: /home/runner/_work/llm.cpp/llm.cpp/llm.cpp/bigdl-core-xe/llama_backend/sdp_xmx_kernel.cpp:429: auto ggml_sycl_op_sdp_xmx_casual(fp16 , fp16 , fp16 , fp16 , fp16 , float , size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, float *, float, bool, sycl::queue &)::(anonymous class)::operator()() const: Assertion `false' failed. time=2024-09-25T15:28:29.632Z level=INFO source=server.go:629 msg="waiting for server to become available" status="llm server error" time=2024-09-25T15:28:29.883Z level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: signal: aborted (core dumped)" [GIN] 2024/09/25 - 15:28:29 | 500 | 6.649310166s | 127.0.0.1 | POST "/api/generate"

`

lzivan commented 1 month ago

Hi, we are trying to reproduce your issue.

casperfrx commented 1 month ago

Running into the same issue here. open-webui:git-3ad003b used to work fine, but after trying the latest releases, none of them work anymore. Reverting back to 3ad003b now also runs into sysctl issues. I can't figure out for the life of me what it could be.

lzivan commented 1 month ago

Hi @user7z, which version of oneAPI are you using?

It works fine in my testing on Linux with ipex-llm[cpp] == 2.2.0b20240925 and oneAPI 2024.0:

(screenshot attached)
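For anyone comparing setups, a quick way to check which versions are actually in use is sketched below; it assumes a pip-based ipex-llm[cpp] install and the default /opt/intel/oneapi location.

```bash
# Installed ipex-llm package version (pip-based install assumed)
pip show ipex-llm

# oneAPI toolkit versions present under the default install prefix
ls /opt/intel/oneapi/compiler/

# After sourcing the oneAPI environment, confirm the iGPU is visible to SYCL
source /opt/intel/oneapi/setvars.sh
sycl-ls
```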

user7z commented 1 month ago

@lzivan, I am using the latest ipex-llm. For oneAPI, I tried 2024.1.0 and also 2024.2.1-1, on Linux kernel 6.10.10.

lzivan commented 1 month ago

Hi @user7z, ipex-llm[cpp] currently supports oneAPI 2024.0 on Linux. Please install oneAPI 2024.0 and try again.

Here is the guide for installing oneAPI 2024.0 on Linux: https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/install_linux_gpu.md#install-oneapi

Feel free to let us know if you still have any problems.

user7z commented 1 month ago

Hi @lzivan, I installed intel-oneapi-basekit 2024.0.0.49564-3. Since my package manager is not apt, I was lucky that the distro keeps an archive of old packages; however, it gives me the same core dump. Here is our distribution's packaging_script.

Is it strictly necessary to be on a Debian-based distribution to accelerate Ollama on an Intel platform? I think the package manager shouldn't be a problem, and if packages need to be laid out in a certain way, please let me know so we can adjust what we can. Also, is there a specific version of llm-cpp that is known to work? I think that is necessary to identify the source of the issue here.

lzivan commented 1 month ago

Hi @user7z, regarding the installation of Intel oneAPI, we support a few more installation methods.

Follow this link: https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Overview/install_gpu.md#linux

Under step 2, you can see there are also a PIP installer and an offline installer. You can try one of those to install oneAPI.

For the llm-cpp version, I'm still trying to test it out. I will get back to you once I have figured it out.
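For reference, the PIP-installer route mentioned under step 2 installs the oneAPI 2024.0 runtime wheels into the same Python environment; the package names and version pins below are illustrative assumptions, so follow the linked install_gpu.md for the authoritative list.

```bash
# Illustrative sketch only -- the exact packages and pins are listed in the linked guide
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
```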

user7z commented 1 month ago

@lzivan, is there any progress?

lzivan commented 1 month ago

Hi @user7z, did you reinstall oneAPI based on my previous link?

We successfully ran this on our Arc Linux machine, and will try to reproduce the issue on our Iris Linux machine.

user7z commented 1 month ago

@lzivan, yes. I think it's obvious that oneAPI is not the problem; we are talking about an Iris Xe machine here, as I mentioned.

lzivan commented 1 month ago

Hi @user7z, sorry for the late response. We have reproduced this issue and will fix it soon. In the meantime, you can try ipex-llm version 2.2.0b20240917 via `pip install --pre --upgrade ipex-llm[cpp]==2.2.0b20240917` and run `init-ollama` again. It is the latest version we have tested on our Iris Linux machine that does not cause the core-dump problem.

We will also update here when this issue is fixed :)

Oscilloscope98 commented 1 month ago

Hi @user7z,

We have resolved this issue :) You could give the latest ipex-llm[cpp] a try via `pip install --pre --upgrade ipex-llm[cpp]` and run `init-ollama` again.

You could also refer to our QuickStart for more information regarding Ollama with ipex-llm acceleration on Intel GPU.

Please let us know if you run into any further problems :)
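Put together, the suggested fix amounts to the two commands below (assuming, as in the quickstart, that `init-ollama` was originally run in the directory holding the `ollama` symlink):

```bash
# Upgrade to the latest nightly ipex-llm[cpp] build (quotes avoid shell globbing on the brackets)
pip install --pre --upgrade "ipex-llm[cpp]"

# Re-create the local ollama symlinks so they pick up the updated backend,
# then restart the server with the launch script shown at the top of this issue.
init-ollama
```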

user7z commented 1 month ago

Thank you @lzivan @Oscilloscope98, I now have it working under Arch Linux using Alder Lake integrated graphics, and the results are very promising. If you can, please also offer a systemd service script with the guide, so one doesn't need to run a script by hand each time. Thank you again.
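Until the guide ships one, here is a minimal sketch of a user-level systemd unit built from the launch script at the top of this issue; the unit name `ipex-ollama.service`, the `~/ollama-ipex` working directory, and the oneAPI path are assumptions to adapt to your own setup.

```bash
# Hypothetical unit name and paths -- point WorkingDirectory at wherever init-ollama created your symlinks.
mkdir -p ~/.config/systemd/user
cat > ~/.config/systemd/user/ipex-ollama.service <<'EOF'
[Unit]
Description=Ollama server accelerated with ipex-llm
After=network-online.target

[Service]
Environment=OLLAMA_NUM_GPU=999
Environment=no_proxy=localhost,127.0.0.1
Environment=ZES_ENABLE_SYSMAN=1
WorkingDirectory=%h/ollama-ipex
ExecStart=/usr/bin/bash -c 'source /opt/intel/oneapi/setvars.sh && exec ./ollama serve'
Restart=on-failure

[Install]
WantedBy=default.target
EOF

systemctl --user daemon-reload
systemctl --user enable --now ipex-ollama.service
```

If the service should start at boot without an interactive login, also enable lingering with `loginctl enable-linger "$USER"`.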

user7z commented 1 month ago

Update: using it with llama3.2-1b gives amazing results in conjunction with the Page Assist browser extension for Ollama. Run as a systemd service, it is faster than ChatGPT; it feels just like using ChatGPT, but locally and without the privacy concerns. It is dramatically fast with 1B models; I don't recommend it for larger ones, and I think memory is the cause. Lunar Lake, with memory above 8000 MT/s, should give even more performance. For me, it covers everything I need for day-to-day tasks. Thank you all for making this happen; you make these silicon devices more useful.
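For anyone reproducing this setup, pulling the 1B model through the ipex-llm-provided binary looks roughly like the following (model tag as published on the Ollama registry; Page Assist then only needs to point at the default endpoint, http://127.0.0.1:11434):

```bash
# With the serve process (or the systemd unit sketched above) already running:
./ollama pull llama3.2:1b
./ollama run llama3.2:1b "Summarize what ipex-llm does in one sentence."
```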

moophlo commented 1 month ago

I'm using the iGPU on an Intel Core i9-9900K and I'm getting this error in the container console (running in a Kubernetes pod):

ollama_llama_server: /home/runner/_work/llm.cpp/llm.cpp/llm.cpp/bigdl-core-xe/llama_backend/sdp_kernel.cpp:343: void sdp_fp16_casual_kernel(const void *, const void *, const void *, void *, const size_t, const size_t, const size_t, const size_t, const size_t, const size_t, const size_t, const size_t, const size_t, const size_t, const size_t, const size_t, const size_t, const size_t, const size_t, const size_t, const size_t, float *, float, sycl::queue &) [GS = 32, HD = 64]: Assertion `(context_length-seq_len)%GS==0 && "ubatch must be set as the times of GS\n"' failed.

On the host, syslog shows:

systemd-coredump[34488]: Process 33194 (ollama_llama_se) of user 0 dumped core.#012#012Module /tmp/ollama2041968349/runners/cpu_avx2/ollama_llama_server without build-id.#012Module /tmp/ollama2041968349/runners/cpu_avx2/ollama_llama_server#012Module /opt/intel/oneapi/compiler/2024.2/lib/libcommon_clang.so.2024.18.7.0 without build-id.#012Module /opt/intel/oneapi/compiler/2024.2/lib/libcommon_clang.so.2024.18.7.0#012Module /opt/intel/oneapi/compiler/2024.2/lib/libintelocl.so.2024.18.7.0 without build-id.#012Module /opt/intel/oneapi/compiler/2024.2/lib/libintelocl.so.2024.18.7.0#012Module /opt/intel/oneapi/compiler/2024.0/lib/libintelocl.so.2023.16.12.0 without build-id.#012Module /opt/intel/oneapi/compiler/2024.0/lib/libintelocl.so.2023.16.12.0#012Module /opt/intel/oneapi/compiler/2024.0/lib/libonnxruntime.1.12.22.721.so without build-id.#012Module /opt/intel/oneapi/compiler/2024.0/lib/libonnxruntime.1.12.22.721.so#012Module /opt/intel/oneapi/compiler/2024.0/lib/libcommon_clang.so.2023.16.12.0 without build-id.#012Module /opt/intel/oneapi/compiler/2024.0/lib/libcommon_clang.so.2023.16.12.0#012Module /opt/intel/oneapi/compiler/2024.0/lib/libintelocl_emu.so.2023.16.12.0 without build-id.#012Module /opt/intel/oneapi/compiler/2024.0/lib/libintelocl_emu.so.2023.16.12.0#012Module /opt/intel/oneapi/compiler/2024.0/lib/libur_adapter_level_zero.so.0 without build-id.#012Module /opt/intel/oneapi/compiler/2024.0/lib/libur_adapter_level_zero.so.0#012Module /opt/intel/oneapi/compiler/2024.0/lib/libur_loader.so.0 without build-id.#012Module /opt/intel/oneapi/compiler/2024.0/lib/libur_loader.so.0#012Module /opt/intel/oneapi/compiler/2024.0/lib/libpi_level_zero.so without build-id.#012Module /opt/intel/oneapi/compiler/2024.0/lib/libpi_level_zero.so#012Module /opt/intel/oneapi/compiler/2024.0/lib/libsycl.so.7.0.0 without build-id.#012Module /opt/intel/oneapi/compiler/2024.0/lib/libsycl.so.7.0.0#012Module /opt/intel/oneapi/dnnl/2024.0/lib/libdnnl.so.3.3 without build-id.#012Module /opt/intel/oneapi/dnnl/2024.0/lib/libdnnl.so.3.3#012Module /opt/intel/oneapi/mkl/2024.0/lib/libmkl_sycl_blas.so.4 without build-id.#012Module /opt/intel/oneapi/mkl/2024.0/lib/libmkl_sycl_blas.so.4#012Module /opt/intel/oneapi/compiler/2024.0/opt/oclfpga/host/linux64/lib/libOpenCL.so.1 without build-id.#012Module /opt/intel/oneapi/compiler/2024.0/opt/oclfpga/host/linux64/lib/libOpenCL.so.1#012Module /opt/intel/oneapi/compiler/2024.0/lib/libintlc.so.5 without build-id.#012Module /opt/intel/oneapi/compiler/2024.0/lib/libintlc.so.5#012Module /opt/intel/oneapi/compiler/2024.0/lib/libimf.so without build-id.#012Module /opt/intel/oneapi/compiler/2024.0/lib/libimf.so#012Module /opt/intel/oneapi/compiler/2024.0/lib/libsvml.so without build-id.#012Module /opt/intel/oneapi/compiler/2024.0/lib/libsvml.so#012Module /tmp/ollama2041968349/runners/cpu_avx2/libggml.so without build-id.#012Module /tmp/ollama2041968349/runners/cpu_avx2/libggml.so#012Module /opt/intel/oneapi/compiler/2024.0/lib/libirng.so without build-id.#012Module /opt/intel/oneapi/compiler/2024.0/lib/libirng.so#012Module /tmp/ollama2041968349/runners/cpu_avx2/libllama.so without build-id.#012Module /tmp/ollama2041968349/runners/cpu_avx2/libllama.so#012Module /opt/intel/oneapi/compiler/2024.0/lib/libpi_unified_runtime.so without build-id.#012Module /opt/intel/oneapi/compiler/2024.0/lib/libpi_unified_runtime.so#012Stack trace of thread 67:#012#0  0x000079a68086d9fc n/a (/usr/lib/x86_64-linux-gnu/libc.so.6 + 0x969fc)#012ELF object binary architecture: AMD x86-64

sgwhat commented 1 month ago

Hi @moophlo, could you please open a new issue for us? We will keep tracking your issue there.