kherud / java-llama.cpp

Java Bindings for llama.cpp - A Port of Facebook's LLaMA model in C/C++

llama_init_from_gpt_params: error: failed to apply lora adapter #15

Closed · mordesku closed this issue 7 months ago

mordesku commented 1 year ago

Hi,

I compiled the library with CUDA support on Linux. There is an issue with passing the loraAdapter parameter.

My model parameters look like this:

    ModelParameters modelParams = new ModelParameters.Builder()
            .setNGpuLayers(36)
            .setLoraAdapter("./models/ggml-adapter-model.bin")
            .build();

But in the logs, there is a null value:

2023-09-28T20:07:57.628+02:00  INFO 5896 --- [           main] LLAMA.CPP                                : llama_apply_lora_from_file_internal: applying lora adapter from '(null)' - please wait ...
2023-09-28T20:07:57.628+02:00  INFO 5896 --- [           main] LLAMA.CPP                                : llama_apply_lora_from_file_internal: failed to open '(null)'

Strangely, the same issue occurs when I'm not passing this parameter.

kherud commented 1 year ago

Thanks for reporting this. Honestly, I have never worked with LoRA adapters, but I will look into it now.

Is the file you are using publicly available somewhere? Also, does it currently work with the llama.cpp repository itself? The file format recently changed from GGML to GGUF.

mordesku commented 1 year ago

The problem with lora_adapter is that it is not empty according to the .empty() method on the C++ side (it seems to be null instead), even when I load the quantized GGUF model onto the GPU without passing this parameter, in which case no LoRA adapter should be used at all. The parameter is used in llama.cpp/common/common.cpp.

It seems they recently changed the code, but up until a couple of minutes ago it was: https://github.com/ggerganov/llama.cpp/blob/a5661d7e71d15b8dfc81bc0510ba912ebe85dfa3/common/common.cpp#L765C1-L776C6

    if (!params.lora_adapter.empty()) {
        int err = llama_model_apply_lora_from_file(model,
                                             params.lora_adapter.c_str(),
                                             params.lora_base.empty() ? NULL : params.lora_base.c_str(),
                                             params.n_threads);
        if (err != 0) {
            fprintf(stderr, "%s: error: failed to apply lora adapter\n", __func__);
            llama_free(lctx);
            llama_free_model(model);
            return std::make_tuple(nullptr, nullptr);
        }
    }

The condition check !params.lora_adapter.empty() was true even when the parameter was not passed. So it seems the problem isn't the lora_adapter itself, but the fact that the field contains a null instead of an empty string? Maybe setting it to "" would solve the issue. I will try that tomorrow morning.

On second thought, the value is not passed correctly at all, since the log prints (null) even when the adapter is configured.
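
To illustrate the idea, something like the following on the Java side would guarantee that the native code sees an empty string when no adapter is configured. This is just a sketch with made-up class names, not the actual binding code, and it would not explain why a configured path still ends up as (null):

    // Sketch only: a parameters builder that never hands a null adapter path
    // to the native layer. An unset adapter stays an empty string, so the C++
    // check !params.lora_adapter.empty() correctly skips LoRA loading.
    public final class ModelParametersSketch {

        private final String loraAdapter;

        private ModelParametersSketch(String loraAdapter) {
            this.loraAdapter = loraAdapter;
        }

        public String getLoraAdapter() {
            return loraAdapter;
        }

        public static final class Builder {
            private String loraAdapter = ""; // default to empty, never null

            public Builder setLoraAdapter(String path) {
                this.loraAdapter = path == null ? "" : path;
                return this;
            }

            public ModelParametersSketch build() {
                return new ModelParametersSketch(loraAdapter);
            }
        }
    }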

kkarski commented 1 year ago

I have a similar issue: I am passing NO LoRA adapters in my parameters and still get an error message:


    ....................................................................................................
    llama_new_context_with_model: kv self size  =  800.00 MB
    llama_new_context_with_model: compute buffer total size =   75.47 MB
    llama_new_context_with_model: VRAM scratch buffer: 74.00 MB
    llama_apply_lora_from_file_internal: applying lora adapter from '(null)' - please wait ...
    llama_apply_lora_from_file_internal: failed to open '(null)'
    llama_init_from_gpt_params: error: failed to apply lora adapter
    unable to load modelException in thread "main" de.kherud.llama.LlamaException: could not load model from given file path
        at de.kherud.llama.LlamaModel.loadModel(Native Method)
        at de.kherud.llama.LlamaModel.<init>(LlamaModel.java:54)

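For reference, my setup boils down to roughly the following. The exact LlamaModel constructor signature is an assumption on my part and may differ between versions, and the model path is just a placeholder:

    import de.kherud.llama.LlamaModel;
    import de.kherud.llama.ModelParameters;

    public class LoadWithoutLora {
        public static void main(String[] args) {
            // No setLoraAdapter(...) call anywhere, only GPU offloading.
            ModelParameters params = new ModelParameters.Builder()
                    .setNGpuLayers(36)
                    .build();

            // Assumed constructor shape: model path plus parameters.
            LlamaModel model = new LlamaModel("./models/model.gguf", params);
            System.out.println("model loaded");
        }
    }

Even without any LoRA-related setting, loading fails with the 'failed to apply lora adapter' error shown above.
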
kherud commented 7 months ago

I just released version 3.0 of the library and this problem should hopefully no longer occur. Feel free to re-open this issue if you still experience problems.