Currently, the plugin enables GPU acceleration for only a single layer. With `n_gpu_layers=-1`, llama.cpp will try to put the entire model onto the GPU.
(Side note: it took me ages to figure out why llm was so much slower than llama.cpp's `./main`.)
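For illustration, here is a minimal sketch of what passing the parameter through to the bindings could look like, assuming the llama-cpp-python `Llama` constructor (the model path is a placeholder, not from the original):

```python
from llama_cpp import Llama

# n_gpu_layers=-1 asks llama.cpp to offload every layer it can to the GPU;
# the default of 1 offloads only a single layer, which explains the slowdown.
llm = Llama(
    model_path="models/llama-2-7b.gguf",  # placeholder path
    n_gpu_layers=-1,
)
```

This mirrors llama.cpp's own `-ngl` / `--n-gpu-layers` flag on `./main`, which is why the CLI was so much faster out of the box.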