You can already specify the GPU layers in the YAML model config file with gpu_layers: ...; would that cover it?
I think that would cover it indeed!
When I try using it, I get a panic error:
My configuration is:
```yaml
name: WizardLM-7B-uncensored.ggml-gpu
parameters:
  model: WizardLM-7B-uncensored.ggmlv3.q2_K
backend: llama
gpu_layers: 32
```
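For context, a cuBLAS-enabled source build and launch of LocalAI looks roughly like this; a sketch based on the README of that era, so verify the make target and flags against your checkout:

```sh
# build the LocalAI binary with cuBLAS acceleration (requires the CUDA toolkit)
make BUILD_TYPE=cublas build

# start the server, pointing it at the directory holding the YAML config above
./local-ai --models-path ./models
```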
See my walkthrough in #504. It's for Kubernetes, but it is easily translatable to other deployment methods.
Closing in favor of #556; a fix is on its way in https://github.com/go-skynet/LocalAI/pull/597.
Is your feature request related to a problem? Please describe.
Despite building with cuBLAS, LocalAI still appears to use only my CPU.

Describe the solution you'd like
Use the GPU for inferencing.

Describe alternatives you've considered
N/A / unaware of any alternatives.

Additional context
See https://github.com/ggerganov/llama.cpp/issues/1448
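A quick way to check whether offloading actually happens is to send a request to LocalAI's OpenAI-compatible endpoint while watching the GPU; a sketch, assuming the default port 8080 and the model name from the config above:

```sh
# test request against the model defined in the YAML config
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "WizardLM-7B-uncensored.ggml-gpu",
        "messages": [{"role": "user", "content": "Say hello"}]
      }'

# in a second terminal: GPU memory use should rise if layers were offloaded
watch -n 1 nvidia-smi
```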