I didn't see any way to set the number of layers offloaded to the GPU for LLaMA. I don't know Rust, so this might be the wrong way to do things.

After enabling the `cuda` feature for `llm-chain-llama-sys` and then setting the `NumGpuLayers` option to any value above 0, CUDA acceleration works perfectly for me, and LLaMA models run roughly 5x faster with 20 layers offloaded.
It doesn't seem to break anything: setting the option with `cuda` disabled simply has no effect.
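For reference, enabling the feature looks roughly like this in a consumer's `Cargo.toml` (a sketch only — the version field is omitted and the exact dependency spec your project uses may differ):

```toml
[dependencies.llm-chain-llama-sys]
# pin whatever version your project already uses
features = ["cuda"]
```

With that in place, setting `NumGpuLayers` to e.g. 20 is what triggers the GPU offload described above.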