Here is the log file: hs_err_pid20857.log
Damn, I didn't test thoroughly enough with CUDA, but I can reproduce the problem, thanks for reporting. It seems to be related to `input_prefix` and `input_suffix` being set, but I haven't found the reason yet. The strings are correctly transferred to C++ and tokenized equivalently. I think I've tracked the segmentation fault down to this line: https://github.com/kherud/java-llama.cpp/blob/6d500b5d654ce0a0fe8bced81d8f47b0de04fef8/src/main/cpp/server.hpp#L2266
It turns out this is a bug in llama.cpp after all, and I've created an issue there (see https://github.com/ggerganov/llama.cpp/issues/6672).
It didn't produce a crash for you because the `/infill` endpoint has to be used instead of `/completion`.
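For anyone reproducing this against a standalone llama.cpp server: `/completion` only reads `prompt` and ignores the infill fields, while `/infill` consumes `input_prefix` and `input_suffix`, so only the latter reaches the crashing code. A minimal sketch with Java's built-in HTTP client, assuming a server listening on localhost:8080:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class InfillRequest {
    public static void main(String[] args) throws Exception {
        // /infill consumes input_prefix/input_suffix; /completion would
        // read "prompt" instead and never hit the infill code path.
        String body = """
                {
                  "input_prefix": "def remove_non_ascii(s: str) -> str:\\n",
                  "input_suffix": "\\n    return result\\n",
                  "n_predict": 32
                }""";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/infill"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```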
The problem only seems to occur with models that don't support infilling, which unfortunately is the case for the model used in the unit tests. As far as I can tell, "supporting infilling" means the model's vocabulary defines the special fill-in-the-middle tokens (e.g. CodeLlama's prefix/suffix/middle tokens), which the server inserts around `input_prefix` and `input_suffix`. Everything works correctly with models that support infilling (e.g. codellama).
I changed the model used for testing to codellama, so there shouldn't be a segmentation fault anymore. However, I'm leaving this issue open until the underlying bug is fixed within llama.cpp.
I think I've diagnosed the issue and pointed to the tag that fixed it in the related thread: https://github.com/ggerganov/llama.cpp/issues/6672
Hello!
I really appreciate that you have upgraded this project!
However, there are still two tests that cannot pass: `testGenerateInfill` and `testCompleteInfillCustom`. The output looks something like this:

I have built with the command `cmake .. -DBUILD_SHARED_LIBS=ON -DLLAMA_CUDA=ON -DLLAMA_CURL=ON`. Also, I have tested vanilla llama.cpp at tag b2619, with the same build args above and the same inference args (shown below), and it worked without a crash:
Anyway, all the other Java tests pass.
Thanks!