mybigday / llama.rn

React Native binding of llama.cpp
MIT License

Running initLlama with lora / lora_scaled set fails to load context and crashes app. #86

Open tom-lewis-code opened 1 week ago

tom-lewis-code commented 1 week ago

I'm successfully managing to call initLlama and run inference on it, without lora:

    await initLlama({
        model: 'file://' + file.uri,
        n_ctx: 1024, // Adjust based on model requirements
        n_batch: 1, // Adjust based on device capabilities
        n_threads: 4, // Adjust based on device capabilities
        n_gpu_layers: 1, // Enable if device supports GPU acceleration
        use_mmap: true,
        use_mlock: true,
      })

But if I add lora / lora_scaled, it fails to load and crashes without erroring.

      lora: 'file://' + file.lora,
      lora_scaled: 0,

Any help would be greatly appreciated; I'm running on Android. I'm loading the files in from assets/models, then moving them to DocumentDirectoryPath and referencing them from there. 🥸
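
For reference, the copy step looks roughly like this (a sketch assuming react-native-fs; copyFileAssets is Android-only, and the helper shape here is illustrative rather than my exact code):

    import RNFS from 'react-native-fs'

    // Copy a bundled GGUF out of the APK assets into a real file path
    // that the native loader can open. copyFileAssets is Android-only.
    const copyAssetModel = async (name: string): Promise<string> => {
      const dest = `${RNFS.DocumentDirectoryPath}/${name}`
      if (!(await RNFS.exists(dest))) {
        await RNFS.copyFileAssets(`models/${name}`, dest)
      }
      return dest
    }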

jhen0409 commented 6 days ago

The file:// prefix is unnecessary.

For model the prefix is stripped automatically, but there is no such handling for lora yet. This can be added later.
https://github.com/mybigday/llama.rn/blob/c1d15a30d6e8cc26dd9af144026c652008516d00/src/index.ts#L201
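
Roughly, the missing piece would look something like this (a sketch with illustrative names, not the actual llama.rn source):

    type InitParams = { model: string; lora?: string; [key: string]: unknown }

    // Strip a leading `file://`, as is already done for the model path,
    // and apply the same normalization to the lora path.
    const stripFileUri = (path: string): string =>
      path.startsWith('file://') ? path.slice('file://'.length) : path

    const normalizeParams = (params: InitParams): InitParams => ({
      ...params,
      model: stripFileUri(params.model),
      ...(params.lora ? { lora: stripFileUri(params.lora) } : {}),
    })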

tom-lewis-code commented 6 days ago

Thanks for incredibly quick help, really appreciate it!

I've tried different ways of running initLlama, e.g. with and without the file:// prefix on both model and lora, but I can't find a configuration that works without crashing the app. Sorry if I'm missing anything obvious here.

In this example, file.uri and file.lora are:

file.uri: /data/user/0/com.rnllamaexample/files/my-model.gguf
file.lora: /data/user/0/com.rnllamaexample/files/my-lora.gguf

This is my current setup for initLlama():

await initLlama({
    model: file.uri,
    lora: file.lora,
    lora_scaled: 1,
    n_ctx: 1024,
    n_batch: 1,
    n_threads: 4,
    n_gpu_layers: 1,
    use_mmap: true,
    use_mlock: true,
  })
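
In case it helps rule out a path problem, this is the kind of pre-flight check I can add before initLlama (a sketch, again assuming react-native-fs; assertReadable is just an illustrative helper):

    import RNFS from 'react-native-fs'

    // Stat both GGUF files so a bad path fails loudly in JS
    // instead of crashing inside the native loader.
    const assertReadable = async (path: string): Promise<void> => {
      const info = await RNFS.stat(path) // throws if the file is missing
      console.log(`${path}: ${info.size} bytes`)
    }

    await assertReadable(file.uri)
    await assertReadable(file.lora)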

Thanks again!

jhen0409 commented 5 days ago

Tested with bartowski/Meta-Llama-3.1-8B-Instruct-GGUF as the base model and grimjim/Llama-3-Instruct-abliteration-LoRA-8B (converted) as the lora adapter; no issue on my Android device (Pixel 6).

Could you share which model & lora you are using? Also, any Android hardware info may be helpful.