tkarna opened this issue 2 months ago
Hi, I think we have fixed this in the latest PR. Could you try ipex-llm[cpp] >= 2.2.0b20240924 tomorrow?
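A quick way to confirm that the upgraded build is actually the one being picked up (the pip command in the comment is the usual pre-release install pattern and is an assumption, not quoted from this thread):

```python
# Assumed upgrade command (pre-release builds need --pre):
#   pip install --pre --upgrade "ipex-llm[cpp]>=2.2.0b20240924"
# Verify the installed ipex-llm version before re-running the workload.
from importlib.metadata import version
from packaging.version import Version

installed = Version(version("ipex-llm"))
print("ipex-llm version:", installed)
assert installed >= Version("2.2.0b20240924"), "still on an older build"
```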
Thanks, I can confirm that the simple example works now. However, when running a larger LangChain agents workflow I'm still getting an error:
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/llm/llama.cpp/ggml/src/ggml-backend.c:96: GGML_ASSERT(base != NULL && "backend buffer base cannot be NULL") failed
I'll see if I can make a small reproducer.
I still have this issue using Ollama and Open WebUI with llama3.1 as of 2.2.0b20240927.
ollama_llama_server: /home/runner/_work/llm.cpp/llm.cpp/llm.cpp/bigdl-core-xe/llama_backend/sdp_xmx_kernel.cpp:429: auto ggml_sycl_op_sdp_xmx_casual(fp16 *, fp16 *, fp16 *, fp16 *, fp16 *, float *, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, float *, float, bool, sycl::queue &)::(anonymous class)::operator()() const: Assertion `false' failed.
time=2024-09-27T18:26:03.643-04:00 level=INFO source=server.go:629 msg="waiting for server to become available" status="llm server error"
time=2024-09-27T18:26:03.893-04:00 level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: signal: aborted (core dumped)"
After updating ipex-llm, running llama3.1 through LangChain and Ollama no longer works. A simple reproducer:
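A minimal sketch of this kind of call, assuming the langchain_community Ollama wrapper, a llama3.1 model already pulled into Ollama, and the default local endpoint (the model name, base_url, and prompt here are illustrative assumptions):

```python
# Hypothetical reproducer: routes a single completion request from LangChain
# to a locally running Ollama server serving llama3.1.
from langchain_community.llms import Ollama

llm = Ollama(model="llama3.1", base_url="http://localhost:11434")
print(llm.invoke("Briefly explain what a tide gauge measures."))
```

Any LangChain chain that routes a completion through the Ollama server should exercise the same code path.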
The last known working ipex-llm version is 2.2.0b20240826. Tested on Ubuntu 22.04, oneAPI 2024.02 (intel-basekit 2024.2.1-98) with two Intel(R) Data Center GPU Max 1100 GPUs.
Error message: