I didn't see any way to set the number of layers offloaded to the GPU for LLaMA. I don't know Rust, so this might be the wrong way to do things.

After enabling the `cuda` feature for `llm-chain-llama-sys` and then setting the `NumGpuLayers` option to any value above 0, CUDA acceleration works perfectly for me, and LLaMA models run roughly 5x faster with 20 layers offloaded.
It doesn't seem to break anything: setting the option with `cuda` disabled simply has no effect.
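For reference, enabling the feature looks roughly like this in a consumer's `Cargo.toml` (a sketch only — the version field is omitted and the exact dependency spec your project uses may differ):

```toml
[dependencies.llm-chain-llama-sys]
# pin whatever version your project already uses
features = ["cuda"]
```

With that in place, setting `NumGpuLayers` to e.g. 20 is what triggers the GPU offload described above.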