kherud / java-llama.cpp

Java Bindings for llama.cpp - A Port of Facebook's LLaMA model in C/C++
MIT License

"Failed to load native library" when using custom llama.cpp with GPU acceleration build #53

Closed PauloIVM closed 5 months ago

PauloIVM commented 5 months ago

Hello. I'm trying to run this lib on an Ubuntu OS, using GPU acceleration with an Nvidia 1660 Super.

I was able to run the java-llama.cpp lib with a custom llama.cpp when I built llama.cpp like this:

mkdir build
cd build
cmake .. -DBUILD_SHARED_LIBS=ON
cmake --build . --config Release
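
For reference, a quick way to check that this build produced the shared library and to point the binding at it from the command line (this uses the same de.kherud.llama.lib.path property as the System.setProperty call in the code further below; <your-classpath> is a placeholder):

ls build/libllama.so
java -Dde.kherud.llama.lib.path=/home/pauloivm/Documentos/software-development/open-source-clones/llama.cpp/build -cp <your-classpath> org.example.Main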

Naturally, as I didn't pass the "-DLLAMA_CUDA=ON" argument (or "-DLLAMA_CUBLAS=ON", which I also tried with the b1645 release), GPU acceleration was not used. But it worked correctly while pointing to my custom llama.cpp.

If I just rebuild llama.cpp according to the script below, then the code using the java-llama.cpp lib starts to break:

mkdir build
cd build
cmake .. -DLLAMA_CUBLAS=ON -DBUILD_SHARED_LIBS=ON
cmake --build . --config Release
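
As a side note, one way to see which CUDA runtime the resulting library was linked against (standard Linux tooling, not something specific to this project) is:

ldd build/libllama.so | grep -i cuda

If libcudart shows up as "not found" here, the dynamic loader will also fail when the JVM later tries to load libllama.so.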

I even tried to compile llama.cpp at this release, which is the one used by java-llama.cpp, using -DLLAMA_CUBLAS=ON instead of -DLLAMA_CUDA=ON. But the same error remains.

So this is my code. It's just test code, very similar to the example in the java-llama.cpp readme:

package org.example;
import de.kherud.llama.InferenceParameters;
import de.kherud.llama.LlamaModel;
import de.kherud.llama.ModelParameters;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class Main {
    public static void main(String[] args) throws IOException {
        // Point the binding at the custom llama.cpp build instead of the bundled library
        System.setProperty("de.kherud.llama.lib.path", "/home/pauloivm/Documentos/software-development/open-source-clones/llama.cpp/build");
        System.setProperty("java.library.path", "/home/pauloivm/Documentos/software-development/open-source-clones/llama.cpp/build");
        // Silence native logging
        LlamaModel.setLogger((level, message) -> {});
        // Offload 20 layers to the GPU
        ModelParameters modelParams = new ModelParameters()
                .setMainGpu(1)
                .setNGpuLayers(20);
        InferenceParameters inferParams = new InferenceParameters()
                .setTemperature(0.7f)
                .setPenalizeNl(true)
                .setMirostat(InferenceParameters.MiroStat.V2);

        String modelPath = "/home/pauloivm/.cache/lm-studio/models/TheBloke/Wizard-Vicuna-7B-Uncensored-GGUF/Wizard-Vicuna-7B-Uncensored.Q4_K_M.gguf";
        String system = "This is a conversation between User and Llama, a friendly chatbot.\n" +
                "Llama is helpful, kind, honest, good at writing, and never fails to answer any " +
                "requests immediately and with precision.\n";
        BufferedReader reader = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        try (LlamaModel model = new LlamaModel(modelPath, modelParams)) {
            System.out.print(system);
            String prompt = system;
            while (true) {
                prompt += "\nUser: ";
                System.out.print("\nUser: ");
                String input = reader.readLine();
                prompt += input;
                System.out.print("Llama: ");
                prompt += "\nLlama: ";
                for (LlamaModel.Output output : model.generate(prompt, inferParams)) {
                    System.out.print(output);
                    prompt += output;
                }
            }
        }
    }
}

And when using a llama.cpp built with -DLLAMA_CUBLAS=ON, I get the error below (and the code then runs without my custom llama.cpp, falling back to the default llama.cpp bundled with this java-llama project, without GPU acceleration):

/home/pauloivm/.jdks/openjdk-22/bin/java -javaagent:/app/IDEA-C/lib/idea_rt.jar=45151:/app/IDEA-C/bin -Dfile.encoding=UTF-8 -Dsun.stdout.encoding=UTF-8 -Dsun.stderr.encoding=UTF-8 -classpath /home/pauloivm/IdeaProjects/java-llm-exemple/target/classes:/home/pauloivm/.m2/repository/de/kherud/llama/2.3.5/llama-2.3.5.jar:/home/pauloivm/.m2/repository/org/jetbrains/annotations/24.0.1/annotations-24.0.1.jar org.example.Main
/home/pauloivm/Documentos/software-development/open-source-clones/llama.cpp/build/libllama.so: libcudart.so.11.0: cannot open shared object file: No such file or directory
Failed to load native library: /home/pauloivm/Documentos/software-development/open-source-clones/llama.cpp/build/libllama.so. osinfo: Linux/x86_64
Extracted 'libllama.so' to '/tmp/libllama.so'
Extracted 'libjllama.so' to '/tmp/libjllama.so'
This is a conversation between User and Llama, a friendly chatbot.
Llama is helpful, kind, honest, good at writing, and never fails to answer any requests immediately and with precision.

User: 
Process finished with exit code 130 (interrupted by signal 2:SIGINT)

If I simply rebuild llama.cpp without the -DLLAMA_CUBLAS=ON argument, this error disappears and the code starts running using my custom lib.

Any idea why this error occurs? And how to solve it?
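
For context, the loader message above says that libcudart.so.11.0 (the CUDA runtime) cannot be found when libllama.so is loaded, which is why the binding falls back to its bundled, CPU-only library. A common workaround on Linux, assuming CUDA is installed under /usr/local/cuda (adjust to your install location), is to put the CUDA runtime directory on the loader path before starting the JVM; this is a general dynamic-linking sketch, not a fix confirmed in this thread:

# make the CUDA runtime visible to the dynamic loader
# (/usr/local/cuda/lib64 is an assumed install location, adjust to your system)
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
# then launch the Java program again with the same lib.path property as before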

kherud commented 5 months ago

Hi @PauloIVM, I'll release version 3.0 of this binding very soon, which will probably fix this problem. If you want, you can have an early look at the v3.0 branch. It's working on my Linux machine; I'm just struggling to get a CI GitHub workflow running, which I want to finish before merging. The API around ModelParameters and InferenceParameters changed slightly; have a look at src/test/java/examples in the branch.

PauloIVM commented 5 months ago

Thanks so much! I'll take a look at this branch :smile:

kherud commented 5 months ago

I just released version 3.0 which should hopefully solve your problems. Feel free to re-open if you still have issues.