simonw / llm-llama-cpp

LLM plugin for running models using llama.cpp
Apache License 2.0

default n_gpu_layers to -1 #31

Open AviKav opened 7 months ago

AviKav commented 7 months ago

Currently, the plugin uses GPU acceleration, but only for a single layer. With n_gpu_layers=-1, llama.cpp will try to put the entire model onto the GPU.
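A minimal sketch of the convention being described (illustrative only, not the plugin's actual code; the helper name and layer counts are assumptions):

```python
def resolve_gpu_layers(n_gpu_layers: int, model_layer_count: int) -> int:
    """Mirror llama.cpp's convention: -1 means offload every layer,
    otherwise offload at most the requested number of layers."""
    if n_gpu_layers < 0:
        # -1 (or any negative value) requests full GPU offload
        return model_layer_count
    return min(n_gpu_layers, model_layer_count)

# With the current default of 1, only one layer lands on the GPU;
# with -1, the whole model does.
print(resolve_gpu_layers(1, 32))   # current default: 1 layer offloaded
print(resolve_gpu_layers(-1, 32))  # proposed default: all 32 layers
```

Defaulting to -1 would match the behavior users get from llama.cpp's own ./main when GPU offload is enabled.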

(Side note: It took me ages to figure out why llm was so much slower than llama.cpp's ./main)