Closed: JFronny closed this issue 1 year ago.
This seems to be caused by recent changes in llama.cpp: I tried a build at https://github.com/ggerganov/llama.cpp/commit/feea179e9f9921e96e8fb1b8855d4a8f83682455 and that worked fine.
The previous version 1.1.2 was compatible with llama.cpp b1204. I just released 1.1.3, which is compatible with b1256 and hopefully solves your issue. I will also release version 2.0 of the binding very soon, which removes these compatibility problems.
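For reference, a minimal sketch of the dependency bump, assuming a Maven build (the coordinate is the one reported below, `de.kherud:llama`):

```xml
<!-- Bump the binding from 1.1.2 to 1.1.3, which tracks llama.cpp b1256 -->
<dependency>
    <groupId>de.kherud</groupId>
    <artifactId>llama</artifactId>
    <version>1.1.3</version>
</dependency>
```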
As you said, it seems to work now. Thanks!
The example from the README seems to segfault whenever it tries to tokenize input (my prompt is `How are you?`). I have not modified the code except for setting `NGpuLayers` to 30, due to it previously running out of VRAM, and changing the `modelPath` to my local path (file downloaded from TheBloke). I am using `de.kherud:llama:1.1.2`. The hs_err*.log is attached: hs_err_pid43263.log. llama.cpp was built with:
Trying the same prompt and model with the `server` binary built by llama.cpp works fine btw, so I don't think that is the issue.
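For context, a rough sketch of the setup described above. The class and method names here (`LlamaModel`, `ModelParameters`, `setNGpuLayers`, `generate`) are assumptions for illustration and may not match the 1.1.2 README exactly; only the changes mentioned above (30 GPU layers, local model path) are taken from the report.

```java
import de.kherud.llama.LlamaModel;
import de.kherud.llama.ModelParameters;

public class Example {
    public static void main(String... args) {
        // Assumed API names, for illustration only.
        // Offload 30 layers to the GPU; the default previously ran out of VRAM.
        ModelParameters params = new ModelParameters().setNGpuLayers(30);

        // Local GGUF file downloaded from TheBloke (path is a placeholder).
        String modelPath = "/path/to/model.gguf";

        try (LlamaModel model = new LlamaModel(modelPath, params)) {
            // The crash reportedly happens as soon as this prompt is tokenized.
            for (String token : model.generate("How are you?")) {
                System.out.print(token);
            }
        }
    }
}
```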