Hi, we are trying to reproduce your issue.
Running into the same issue here. open-webui:git-3ad003b used to work fine, but after trying the latest releases, none of them work anymore. Reverting back to 3ad003b now also runs into sysctl issues. I can't figure out for the life of me what it could be.
Hi @user7z, which version of oneAPI are you using?
I'm testing fine on Linux with ipex-llm[cpp]==2.2.0b20240925 and oneAPI 2024.0.
@lzivan, I am using the latest ipex-llm. For oneAPI I tried 2024.1.0 and also 2024.2.1-1, on Linux kernel 6.10.10.
Hi @user7z, `ipex-llm[cpp]` currently supports oneAPI 2024.0 on Linux. You may install oneAPI 2024.0 and try again.
Here is the guide regarding installing oneAPI 2024.0 on Linux: https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/install_linux_gpu.md#install-oneapi
Feel free to let us know if you still have any problems.
Hi @lzivan, I installed intel-oneapi-basekit 2024.0.0.49564-3. Since my package manager is not apt, I was lucky that the distro has an archive of old packages. However, it gives me the same core dump. Here is our distribution's packaging_script.
Is it strictly necessary to be on a Debian-based distribution to accelerate Ollama on an Intel platform? I don't think the package manager should be a problem, and if packages need to be laid out in a certain way, please let me know so we can modify what we can. Also, is there a version of llm-cpp that is known to work? I think that is necessary to identify the source of the issue here.
Hi @user7z, regarding the installation of Intel oneAPI, we have some more installation methods.
Follow this link: https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Overview/install_gpu.md#linux
Under step 2, you can see there are also a PIP installer and an Offline installer. You could give one of those a try to install your oneAPI.
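If apt is not an option, the PIP installer route from that page essentially installs the oneAPI runtime wheels into the same Python environment. A minimal sketch, with version pins as I remember them from the linked guide (verify them there before relying on this):

```bash
# Install the oneAPI 2024.0 runtime components as pip packages
# (package names are real PyPI packages; the exact pins below are taken
#  from memory of the ipex-llm install guide and should be double-checked)
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
```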
For the llm-cpp version, I'm still trying to test it out. Will get back to you once I figure it out.
@lzivan, is there any progress?
Hi @user7z, did you reinstall your oneAPI based on my previous link?
We successfully ran this on our Arc Linux machine, and will try to reproduce it on our Iris Linux machine.
@lzivan yes, I think it's obvious that oneAPI is not the problem; we're talking here about an Iris Xe machine, as I mentioned.
Hi @user7z, sorry for the late response. We have reproduced this issue and will fix it soon. For now, you can try ipex-llm version 2.2.0b20240917 via `pip install --pre --upgrade ipex-llm[cpp]==2.2.0b20240917` and run `init-ollama` again. It's the latest version we tested on our Iris Linux machine that doesn't cause the core-dump problem.
We will also update here when this issue is fixed :)
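Putting the suggested workaround together as commands (run inside the Python environment where ipex-llm is installed, and from the directory where the Ollama symlinks were created; quoting the extra avoids shell globbing on the brackets):

```bash
pip install --pre --upgrade "ipex-llm[cpp]==2.2.0b20240917"  # pin the last known-good build for Iris iGPUs
init-ollama                                                  # re-create the ollama symlinks against that build
```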
Hi @user7z,
We have resolved this issue :) You could try the latest `ipex-llm[cpp]` via `pip install --pre --upgrade ipex-llm[cpp]` and run `init-ollama` again.
You could also refer to our QuickStart for more information regarding Ollama with `ipex-llm` acceleration on Intel GPU.
Please let us know if you run into any further problems :)
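For reference, the upgrade plus a quick check of which build actually got installed (plain pip commands, nothing assumed beyond what the comment above already names):

```bash
pip install --pre --upgrade "ipex-llm[cpp]"  # pull the latest nightly build containing the fix
pip show ipex-llm | grep -i '^version'       # confirm which version is now installed
init-ollama                                  # refresh the ollama symlinks, as described in the QuickStart
```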
Thank you @lzivan @Oscilloscope98, I now have it working under Arch Linux using Alder Lake integrated graphics, and the results are very promising. Thank you, and if you could, please offer a systemd service script with the guide, so one wouldn't need to run a script by hand each time. Thank you again.
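Not part of the official guide, but a minimal sketch of the kind of systemd user service being asked for here, assuming oneAPI lives under /opt/intel/oneapi and the init-ollama symlinks were created in ~/ollama-ipex (both paths are illustrative and may differ on your system):

```bash
# Write a user-level unit that sources oneAPI and starts the ipex-llm ollama binary
mkdir -p ~/.config/systemd/user
cat > ~/.config/systemd/user/ollama-ipex.service <<'EOF'
[Unit]
Description=Ollama server accelerated with ipex-llm (sketch)
After=network-online.target

[Service]
# Same environment variables as the manual serve script in this thread
Environment=OLLAMA_NUM_GPU=999
Environment=ZES_ENABLE_SYSMAN=1
Environment=no_proxy=localhost,127.0.0.1
# %h expands to the user's home directory; adjust to wherever init-ollama was run
WorkingDirectory=%h/ollama-ipex
ExecStart=/bin/bash -lc 'source /opt/intel/oneapi/setvars.sh && exec ./ollama serve'
Restart=on-failure

[Install]
WantedBy=default.target
EOF

# Reload user units and start the service at login
systemctl --user daemon-reload
systemctl --user enable --now ollama-ipex.service
```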
Update: using it with llama3.2-1b gives amazing results in conjunction with a browser extension for Ollama (Page Assist). With a systemd service it is faster than ChatGPT; it is just like using ChatGPT, but locally and without the privacy concerns. It is dramatically fast with 1B models; I don't recommend it for larger ones, and I think the cause is memory speed. Lunar Lake, with more than 8000 MT/s, should give more performance. For me, I have everything I need for day-to-day tasks. Thank you guys for making this happen; you make these silicon devices more useful. Thank you all.
I'm using the iGPU of an Intel Core i9-9900K and I'm getting this error in the container console (running in a Kubernetes pod):
ollama_llama_server: /home/runner/_work/llm.cpp/llm.cpp/llm.cpp/bigdl-core-xe/llama_backend/sdp_kernel.cpp:343: void sdp_fp16_casual_kernel(const void *, const void *, const void *, void *, const size_t, const size_t, const size_t, const size_t, const size_t, const size_t, const size_t, const size_t, const size_t, const size_t, const size_t, const size_t, const size_t, const size_t, const size_t, const size_t, const size_t, float *, float, sycl::queue &) [GS = 32, HD = 64]: Assertion `(context_length-seq_len)%GS==0 && "ubatch must be set as the times of GS\n"' failed.
On the host from syslog I'm getting:
systemd-coredump[34488]: Process 33194 (ollama_llama_se) of user 0 dumped core.#012#012Module /tmp/ollama2041968349/runners/cpu_avx2/ollama_llama_server without build-id.#012Module /tmp/ollama2041968349/runners/cpu_avx2/ollama_llama_server#012Module /opt/intel/oneapi/compiler/2024.2/lib/libcommon_clang.so.2024.18.7.0 without build-id.#012Module /opt/intel/oneapi/compiler/2024.2/lib/libcommon_clang.so.2024.18.7.0#012Module /opt/intel/oneapi/compiler/2024.2/lib/libintelocl.so.2024.18.7.0 without build-id.#012Module /opt/intel/oneapi/compiler/2024.2/lib/libintelocl.so.2024.18.7.0#012Module /opt/intel/oneapi/compiler/2024.0/lib/libintelocl.so.2023.16.12.0 without build-id.#012Module /opt/intel/oneapi/compiler/2024.0/lib/libintelocl.so.2023.16.12.0#012Module /opt/intel/oneapi/compiler/2024.0/lib/libonnxruntime.1.12.22.721.so without build-id.#012Module /opt/intel/oneapi/compiler/2024.0/lib/libonnxruntime.1.12.22.721.so#012Module /opt/intel/oneapi/compiler/2024.0/lib/libcommon_clang.so.2023.16.12.0 without build-id.#012Module /opt/intel/oneapi/compiler/2024.0/lib/libcommon_clang.so.2023.16.12.0#012Module /opt/intel/oneapi/compiler/2024.0/lib/libintelocl_emu.so.2023.16.12.0 without build-id.#012Module /opt/intel/oneapi/compiler/2024.0/lib/libintelocl_emu.so.2023.16.12.0#012Module /opt/intel/oneapi/compiler/2024.0/lib/libur_adapter_level_zero.so.0 without build-id.#012Module /opt/intel/oneapi/compiler/2024.0/lib/libur_adapter_level_zero.so.0#012Module /opt/intel/oneapi/compiler/2024.0/lib/libur_loader.so.0 without build-id.#012Module /opt/intel/oneapi/compiler/2024.0/lib/libur_loader.so.0#012Module /opt/intel/oneapi/compiler/2024.0/lib/libpi_level_zero.so without build-id.#012Module /opt/intel/oneapi/compiler/2024.0/lib/libpi_level_zero.so#012Module /opt/intel/oneapi/compiler/2024.0/lib/libsycl.so.7.0.0 without build-id.#012Module /opt/intel/oneapi/compiler/2024.0/lib/libsycl.so.7.0.0#012Module /opt/intel/oneapi/dnnl/2024.0/lib/libdnnl.so.3.3 without build-id.#012Module /opt/intel/oneapi/dnnl/2024.0/lib/libdnnl.so.3.3#012Module /opt/intel/oneapi/mkl/2024.0/lib/libmkl_sycl_blas.so.4 without build-id.#012Module /opt/intel/oneapi/mkl/2024.0/lib/libmkl_sycl_blas.so.4#012Module /opt/intel/oneapi/compiler/2024.0/opt/oclfpga/host/linux64/lib/libOpenCL.so.1 without build-id.#012Module /opt/intel/oneapi/compiler/2024.0/opt/oclfpga/host/linux64/lib/libOpenCL.so.1#012Module /opt/intel/oneapi/compiler/2024.0/lib/libintlc.so.5 without build-id.#012Module /opt/intel/oneapi/compiler/2024.0/lib/libintlc.so.5#012Module /opt/intel/oneapi/compiler/2024.0/lib/libimf.so without build-id.#012Module /opt/intel/oneapi/compiler/2024.0/lib/libimf.so#012Module /opt/intel/oneapi/compiler/2024.0/lib/libsvml.so without build-id.#012Module /opt/intel/oneapi/compiler/2024.0/lib/libsvml.so#012Module /tmp/ollama2041968349/runners/cpu_avx2/libggml.so without build-id.#012Module /tmp/ollama2041968349/runners/cpu_avx2/libggml.so#012Module /opt/intel/oneapi/compiler/2024.0/lib/libirng.so without build-id.#012Module /opt/intel/oneapi/compiler/2024.0/lib/libirng.so#012Module /tmp/ollama2041968349/runners/cpu_avx2/libllama.so without build-id.#012Module /tmp/ollama2041968349/runners/cpu_avx2/libllama.so#012Module /opt/intel/oneapi/compiler/2024.0/lib/libpi_unified_runtime.so without build-id.#012Module /opt/intel/oneapi/compiler/2024.0/lib/libpi_unified_runtime.so#012Stack trace of thread 67:#012#0 0x000079a68086d9fc n/a (/usr/lib/x86_64-linux-gnu/libc.so.6 + 0x969fc)#012ELF object binary architecture: AMD x86-64
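As a side note, when systemd-coredump reports a dump like the one above, the captured stack trace is usually easier to read via coredumpctl than via syslog (standard systemd tooling, not specific to ipex-llm or Ollama):

```bash
coredumpctl list ollama_llama_server   # find the most recent dump from the runner process
coredumpctl info ollama_llama_server   # print metadata and the captured stack trace
```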
Hi @moophlo, could you please open a new issue for us? We will keep tracking your issue there.
I start serving with this script:

```bash
export OLLAMA_NUM_GPU=999
export no_proxy=localhost,127.0.0.1
export ZES_ENABLE_SYSMAN=1
source /opt/intel/oneapi/setvars.sh
./ollama serve
```
Here is the serve log:

```
:: initializing oneAPI environment ...
ollama-lunch: BASH_VERSION = 5.2.32(1)-release
args: Using "$@" for setvars.sh arguments:
:: advisor -- latest
:: ccl -- latest
:: compiler -- latest
:: dal -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: dnnl -- latest
:: dpcpp-ct -- latest
:: dpl -- latest
:: ipp -- latest
:: ippcp -- latest
:: mkl -- latest
:: mpi -- latest
:: tbb -- latest
:: vtune -- latest
:: oneAPI environment initialized ::
2024/09/25 15:24:58 routes.go:1125: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/user/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost: https://localhost: http://127.0.0.1 https://127.0.0.1 http://127.0.0.1: https://127.0.0.1: http://0.0.0.0 https://0.0.0.0 http://0.0.0.0: https://0.0.0.0: app:// file:// tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-09-25T15:24:58.093Z level=INFO source=images.go:753 msg="total blobs: 5"
time=2024-09-25T15:24:58.093Z level=INFO source=images.go:760 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
[GIN-debug] POST /api/pull --> github.com/ollama/ollama/server.(Server).PullModelHandler-fm (5 handlers)
[GIN-debug] POST /api/generate --> github.com/ollama/ollama/server.(Server).GenerateHandler-fm (5 handlers)
[GIN-debug] POST /api/chat --> github.com/ollama/ollama/server.(Server).ChatHandler-fm (5 handlers)
[GIN-debug] POST /api/embed --> github.com/ollama/ollama/server.(Server).EmbedHandler-fm (5 handlers)
[GIN-debug] POST /api/embeddings --> github.com/ollama/ollama/server.(Server).EmbeddingsHandler-fm (5 handlers)
[GIN-debug] POST /api/create --> github.com/ollama/ollama/server.(Server).CreateModelHandler-fm (5 handlers)
[GIN-debug] POST /api/push --> github.com/ollama/ollama/server.(Server).PushModelHandler-fm (5 handlers)
[GIN-debug] POST /api/copy --> github.com/ollama/ollama/server.(Server).CopyModelHandler-fm (5 handlers)
[GIN-debug] DELETE /api/delete --> github.com/ollama/ollama/server.(Server).DeleteModelHandler-fm (5 handlers)
[GIN-debug] POST /api/show --> github.com/ollama/ollama/server.(Server).ShowModelHandler-fm (5 handlers)
[GIN-debug] POST /api/blobs/:digest --> github.com/ollama/ollama/server.(Server).CreateBlobHandler-fm (5 handlers)
[GIN-debug] HEAD /api/blobs/:digest --> github.com/ollama/ollama/server.(Server).HeadBlobHandler-fm (5 handlers)
[GIN-debug] GET /api/ps --> github.com/ollama/ollama/server.(Server).ProcessHandler-fm (5 handlers)
[GIN-debug] POST /v1/chat/completions --> github.com/ollama/ollama/server.(Server).ChatHandler-fm (6 handlers)
[GIN-debug] POST /v1/completions --> github.com/ollama/ollama/server.(Server).GenerateHandler-fm (6 handlers)
[GIN-debug] POST /v1/embeddings --> github.com/ollama/ollama/server.(Server).EmbedHandler-fm (6 handlers)
[GIN-debug] GET /v1/models --> github.com/ollama/ollama/server.(Server).ListModelsHandler-fm (6 handlers)
[GIN-debug] GET /v1/models/:model --> github.com/ollama/ollama/server.(Server).ShowModelHandler-fm (6 handlers)
[GIN-debug] GET / --> github.com/ollama/ollama/server.(Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] GET /api/tags --> github.com/ollama/ollama/server.(Server).ListModelsHandler-fm (5 handlers)
[GIN-debug] GET /api/version --> github.com/ollama/ollama/server.(Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD / --> github.com/ollama/ollama/server.(Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] HEAD /api/tags --> github.com/ollama/ollama/server.(Server).ListModelsHandler-fm (5 handlers)
[GIN-debug] HEAD /api/version --> github.com/ollama/ollama/server.(Server).GenerateRoutes.func2 (5 handlers)
time=2024-09-25T15:24:58.093Z level=INFO source=routes.go:1172 msg="Listening on 127.0.0.1:11434 (version 0.3.6-ipexllm-20240925)"
time=2024-09-25T15:24:58.099Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama719758347/runners
time=2024-09-25T15:24:58.232Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2]"
```
When I try `ollama run llama3.1:latest`, I get an error with this log:

```
llama_kv_cache_init: SYCL0 KV buffer size = 1024.00 MiB
llama_new_context_with_model: KV self size = 1024.00 MiB, K (f16): 512.00 MiB, V (f16): 512.00 MiB
llama_new_context_with_model: SYCL_Host output buffer size = 2.02 MiB
[1727278109] warming up the model with an empty run
llama_new_context_with_model: SYCL0 compute buffer size = 576.00 MiB
llama_new_context_with_model: SYCL_Host compute buffer size = 24.01 MiB
llama_new_context_with_model: graph nodes = 1062
llama_new_context_with_model: graph splits = 2
ollama_llama_server: /home/runner/_work/llm.cpp/llm.cpp/llm.cpp/bigdl-core-xe/llama_backend/sdp_xmx_kernel.cpp:429: auto ggml_sycl_op_sdp_xmx_casual(fp16 , fp16 , fp16 , fp16 , fp16 , float , size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, float *, float, bool, sycl::queue &)::(anonymous class)::operator()() const: Assertion `false' failed.
time=2024-09-25T15:28:29.632Z level=INFO source=server.go:629 msg="waiting for server to become available" status="llm server error"
time=2024-09-25T15:28:29.883Z level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: signal: aborted (core dumped)"
[GIN] 2024/09/25 - 15:28:29 | 500 | 6.649310166s | 127.0.0.1 | POST "/api/generate"
```
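One quick sanity check before digging into a core dump like the one above is to confirm the iGPU is actually visible to the SYCL runtime. sycl-ls ships with the oneAPI basekit; the exact device strings vary by platform, so treat this as a sketch:

```bash
source /opt/intel/oneapi/setvars.sh   # load the oneAPI environment first
sycl-ls                               # should list the Intel iGPU as a Level-Zero / OpenCL GPU device
```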