kherud / java-llama.cpp

Java Bindings for llama.cpp - A Port of Facebook's LLaMA model in C/C++
MIT License

Java tests fail when CUDA is enabled on version 3.0.0 #54

Closed: RFYoung closed this issue 4 months ago

RFYoung commented 5 months ago

Hello!

I really appreciate that you have upgraded this project!

However, two tests still fail: testGenerateInfill and testCompleteInfillCustom. The output looks like this:

{"tid":"130286006306496","timestamp":1712589265,"level":"INFO","function":"update_slots","line":1772,"msg":"all slots are idle"}
{"tid":"130286006306496","timestamp":1712589265,"level":"INFO","function":"launch_slot_with_task","line":1066,"msg":"slot is processing task","id_slot":0,"id_task":21}
{"tid":"130286006306496","timestamp":1712589265,"level":"INFO","function":"update_slots","line":2082,"msg":"kv cache rm [p0, end)","id_slot":0,"id_task":21,"p0":0}
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000767f016b3d0f, pid=20857, tid=20912
#
# JRE version: OpenJDK Runtime Environment (22.0+36) (build 22+36)
# Java VM: OpenJDK 64-Bit Server VM (22+36, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [libllama.so+0x125d0f]  dequantize_row_q4_K+0x4f
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h" (or dumping to /home/yys/java-llama.cpp/core.20857)
#
# An error report file with more information is saved as:
# /home/yys/java-llama.cpp/hs_err_pid20857.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#

I have built with the command cmake .. -DBUILD_SHARED_LIBS=ON -DLLAMA_CUDA=ON -DLLAMA_CURL=ON.

I have also tested vanilla llama.cpp at tag b2619 with the same build arguments above and the same inference arguments (shown below), and it ran without crashing:

./server -m PATH_TO_LLAMA_CHAT -ngl 43 --embeddings

and

curl --request POST \
    --url http://localhost:8080/completion \
    --header "Content-Type: application/json" \
    --data '{
        "n_predict": 10,
        "input_prefix": "def remove_non_ascii(s: str) -> str:\n    \"\"\" ",
        "logit_bias": [[2, 2.0]],
        "stop": ["\"\"\""],
        "seed": 42,
        "input_suffix": "\n    return result\n",
        "temperature": 0.95,
        "prompt": ""
}'
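
For anyone who prefers to reproduce this from Java rather than curl, below is a minimal plain-JDK (java.net.http) sketch of the same request. It assumes the server started with the ./server command above is listening on localhost:8080; the class name CompletionRepro is just a placeholder.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CompletionRepro {
    public static void main(String[] args) throws Exception {
        // Same payload as the curl command above; assumes the server launched
        // with "./server -m PATH_TO_LLAMA_CHAT -ngl 43 --embeddings" is
        // listening on localhost:8080.
        String body = """
                {
                  "n_predict": 10,
                  "input_prefix": "def remove_non_ascii(s: str) -> str:\\n    \\"\\"\\" ",
                  "logit_bias": [[2, 2.0]],
                  "stop": ["\\"\\"\\""],
                  "seed": 42,
                  "input_suffix": "\\n    return result\\n",
                  "temperature": 0.95,
                  "prompt": ""
                }
                """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/completion"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        // Print the raw JSON response from the server.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```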

In any case, all of the other Java tests pass.

Thanks!

RFYoung commented 5 months ago

Here is the log file: hs_err_pid20857.log

kherud commented 5 months ago

Damn, I didn't test thoroughly enough with CUDA, but I can reproduce the problems, thanks for reporting. It seems to be related to input_prefix and input_suffix being set, but I haven't found the reason yet. The strings are correctly transferred to C++ and tokenized equivalently. I think I've tracked the segmentation fault down to this line: https://github.com/kherud/java-llama.cpp/blob/6d500b5d654ce0a0fe8bced81d8f47b0de04fef8/src/main/cpp/server.hpp#L2266

kherud commented 5 months ago

It turns out this is a bug in llama.cpp after all, and I've created an issue there (see https://github.com/ggerganov/llama.cpp/issues/6672).

It didn't crash for you because your curl command targets /completion; to hit this code path, the /infill endpoint has to be used instead.

The problem only seems to occur with models that don't support infilling, which unfortunately is the case for the model used in the unit tests. However, everything works correctly with models that support infilling (e.g. codellama).
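
For reference, a rough sketch of what an infill-style call through the bindings against a model that does support infilling (e.g. codellama) might look like is shown below. The class and method names (ModelParameters, InferenceParameters, setModelFilePath, setNGpuLayers, setInputPrefix, setInputSuffix, complete) are assumptions modeled on the 3.x builder-style API, and the model path is hypothetical; check the project README for the exact API.

```java
// CAUTION: class and method names below are assumptions modeled on the
// 3.x builder-style API of java-llama.cpp; check the project README for
// the exact names before relying on this sketch.
import de.kherud.llama.InferenceParameters;
import de.kherud.llama.LlamaModel;
import de.kherud.llama.ModelParameters;

public class InfillSketch {
    public static void main(String[] args) {
        ModelParameters modelParams = new ModelParameters()
                .setModelFilePath("/path/to/codellama.Q4_K_M.gguf") // hypothetical model path
                .setNGpuLayers(43);

        // Same prefix/suffix as the payload from the failing tests above.
        InferenceParameters inferParams = new InferenceParameters("")
                .setInputPrefix("def remove_non_ascii(s: str) -> str:\n    \"\"\" ")
                .setInputSuffix("\n    return result\n")
                .setTemperature(0.95f)
                .setSeed(42);

        try (LlamaModel model = new LlamaModel(modelParams)) {
            System.out.println(model.complete(inferParams));
        }
    }
}
```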

kherud commented 4 months ago

I changed the model that is used for testing to codellama, so there shouldn't be a segmentation fault anymore. However, I'm still leaving this issue open until the underlying issue is fixed within llama.cpp.

josh-ramer commented 4 months ago

I think I've diagnosed the issue and pointed to the tag that fixed it in the related thread: https://github.com/ggerganov/llama.cpp/issues/6672