Open richiejp opened 2 months ago
It appears that your model loading process encountered an issue while trying to run on the CPU. The error message indicates a segmentation fault, which is a type of error that typically results from an attempt to access memory that doesn't belong to the process.
Possible reasons for this error could be hardware issues, conflicts between different processes for the CPU's resources, or incompatibilities between the software and the CPU architecture.
To troubleshoot the issue, you might want to try the following steps:
If you need further assistance, provide more details about your system'
Hi @richiejp thanks again for your time.
This seems to be an upstream (OpenVINO) error.
The kernel log points to an access to a memory region that is not allowed, such as a null or invalid pointer.
The instruction at de4000 in libopenvino_intel_gpu_plugin.so is
de4000: 48 8b 78 08 mov 0x8(%rax),%rdi
That loads the 8-byte value stored at address %rax + 8 into %rdi, so a null or invalid pointer in %rax would fault here.
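For anyone who wants to check what those four bytes encode, here is a minimal sketch that decodes them by hand. It is not a general disassembler: it assumes a REX.W prefix, opcode 0x8b, and a ModRM byte with an 8-bit displacement and no SIB byte, which is all this one instruction needs. The function name `decode_mov_load` is made up for illustration.

```python
# Hand-decode a `REX.W 8B /r` instruction with a disp8 memory operand.
# Assumption: only the simple form quoted in this thread is supported.
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_load(code: bytes) -> str:
    if len(code) != 4:
        raise ValueError("expected exactly 4 bytes")
    rex, opcode, modrm, disp8 = code
    # 0x48 is REX with W=1 (64-bit operand) and no register extension bits.
    if rex != 0x48:
        raise ValueError("only a plain REX.W prefix (0x48) is handled")
    # 0x8b is MOV r64, r/m64: the destination is a register.
    if opcode != 0x8B:
        raise ValueError("only opcode 0x8b is handled")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    # mod=01 means a memory operand [base + disp8]; rm=100 would need a SIB byte.
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only [reg + disp8] memory operands are handled")
    disp = disp8 - 256 if disp8 >= 128 else disp8  # sign-extend disp8
    # AT&T syntax: source (memory) first, destination (register) second.
    return f"mov {disp:#x}(%{REGS64[rm]}),%{REGS64[reg]}"


print(decode_mov_load(bytes.fromhex("488b7808")))
```

The ModRM byte 0x78 has mod=01, which selects a memory operand, so the source is a memory read through %rax rather than a plain register.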
All of the above is just to say that, sadly, I have no clue, because this is an error in the underlying plugin, which is transparent to LocalAI.
May I ask you to test OpenVINO inference directly, outside of LocalAI? If you don't know how to do that, I have a quick and dirty Gradio test app in the qa_gradio.py file (Python requirements are in requirements.txt). It's not meant for distribution; I made it just to see whether it was worth investing time in OpenVINO, but it could be useful for understanding whether the problem is LocalAI related or not.
Another check is to make sure you have the latest Arc driver from Intel.
OK, I'll see if I can get it running outside LocalAI. As for drivers, I had some issues installing the out-of-tree driver, so that could take a while; otherwise I have to wait for a kernel update from Ubuntu/Dell.
I'd like to understand whether it's LocalAI specific or not, so that we can open an issue upstream if needed.
Yup, it also core dumped: segfault at 1e ip 0000783592b24063 sp 00007833977ec280 error 4 in libopenvino_intel_gpu_plugin.so[783591f83000+de4000] likely on CPU 6 (core 12, socket 0) [ +0.000010] Code: ff e8 81 2c c7 ff 48 8b b5 38 ce ff ff 4c 89 ff 80 8d 37 ce ff ff 80 e8 6b 2c c7 ff 48 8b 85 c0 da ff ff 80 8d 3f ce ff ff 80 <80> 38 00 0f 85 fc 0c 00 00 48 8b 85 f8 da ff ff 80 38 00 74 67 48
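When filing this upstream, it may help to pull the individual fields out of that kernel line. The sketch below assumes the line follows the usual Linux x86 layout, `segfault at ADDR ip IP sp SP error CODE in NAME[BASE+SIZE]`, where the bracketed pair is the start address and size of the mapping containing the instruction pointer, and where the low bits of the page-fault error code mean: bit 0 page present, bit 1 write access, bit 2 user mode. The helper name `parse_segfault` is hypothetical.

```python
import re

# Assumed layout: "segfault at ADDR ip IP sp SP error CODE in NAME[BASE+SIZE]".
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line: str) -> dict:
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    err = int(m["err"], 16)
    return {
        "lib": m["lib"],
        "fault_addr": int(m["addr"], 16),  # address the process tried to access
        "ip": ip,
        "base": base,                      # start of the mapping containing ip
        "size": int(m["size"], 16),        # size of that mapping
        "offset": ip - base,               # faulting instruction, relative to the mapping
        "page_present": bool(err & 1),
        "access": "write" if err & 2 else "read",
        "user_mode": bool(err & 4),
    }


line = (
    "segfault at 1e ip 0000783592b24063 sp 00007833977ec280 error 4 "
    "in libopenvino_intel_gpu_plugin.so[783591f83000+de4000]"
)
info = parse_segfault(line)
print(info["lib"], hex(info["offset"]), info["access"], hex(info["fault_addr"]))
```

Under those assumptions, the line describes a user-mode read of an unmapped page at address 0x1e, with the instruction pointer 0xba1063 bytes into a mapping of size 0xde4000. The offset is relative to the mapping, which is not necessarily the same as an offset into the .so file.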
Thank you, it's segfaulting at the same instruction address de4000, so it is definitely an upstream bug or environment issue.
Can you open an issue at https://github.com/openvinotoolkit/openvino and link it here so I can follow it?
I'm sorry for not being of more help 😞 but I don't have the resources to investigate more.
C.C @fakezeta
LocalAI version:
quay.io/go-skynet/local-ai@sha256:4e4e427433285b056f32bfaa313ec0e75aeacb5b5c8c273953f9d2242fb55a60 This is still the version without the AUTO GPU changes. I'll try updating when I get a chance.
Environment, CPU architecture, OS, and Version:
Same as #2208, but using just the Arc dGPU
Describe the bug
libopenvino_intel_gpu_plugin.so segfaults during inference. It seems to be when the number of tokens produced is above some amount because it tends to fail in the same place, but sometimes it succeeds as well. I don't know how many tokens are being produced or if it is related to the context size.
To Reproduce
Ask it to summarize the output of, e.g., lscpu, or explain 50 lines of a Makefile.
Expected behavior
Not to segfault.
Logs
From the kernel log
LocalAI log after a previous crash, hence why it is restarting the process:
Additional context
Similar requests succeed on iGPU.