jasonw247 opened this issue 9 months ago
Is llama.cpp actually using Metal? I tried this and noticed (only after enabling some debug logging) that the file `ggml-metal.metal` could not be found (it needs to be placed in the current working directory). After this, the basic example works just fine for me (and actually uses the GPU) with a Mixtral GGUF model.
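As a sanity check for the missing-shader problem, here is a minimal sketch of copying the file into the working directory before loading a model. The source path is an assumption; point it at wherever your vendored llama.cpp checkout lives:

```rust
use std::env;
use std::fs;
use std::path::Path;

fn ensure_metal_shader() -> std::io::Result<()> {
    // The Metal backend loads ggml-metal.metal at runtime, and (as noted
    // above) it is looked up in the current working directory.
    let target = env::current_dir()?.join("ggml-metal.metal");
    if !target.exists() {
        // Assumed source location: adjust to your vendored llama.cpp path.
        let source = Path::new("llama.cpp/ggml-metal.metal");
        fs::copy(source, &target)?;
        eprintln!("copied ggml-metal.metal to {}", target.display());
    }
    Ok(())
}
```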
I copied over the necessary Metal files, since I would otherwise get an error. After copying the files, I encountered the no-tokens-generated issue.
AFAIK it does use Metal: https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#metal-build
llama-cpp-python requires the user to specify `CMAKE_ARGS` during `pip install`: https://llama-cpp-python.readthedocs.io/en/latest/install/macos/

Do users need to do something similar during `cargo install` for this crate?
Reading through here, it seems like llama.cpp needs to be built with specific flags for Metal support to work: https://github.com/ggerganov/llama.cpp/pull/1642
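For comparison, the llama-cpp-python docs linked above enable Metal by exporting `CMAKE_ARGS="-DLLAMA_METAL=on"` before running `pip install`. If this crate builds the vendored llama.cpp from a build script, the equivalent would be defining that same CMake option there. A minimal sketch, assuming the `cmake` crate is used and `llama.cpp` is the vendored source path (both assumptions; this is not this crate's actual build script):

```rust
// build.rs — hypothetical sketch, not this crate's real build script.
fn main() {
    // Assumed path to the vendored llama.cpp sources.
    let mut cfg = cmake::Config::new("llama.cpp");

    // LLAMA_METAL is the CMake option that enables llama.cpp's Metal
    // backend (introduced in the PR linked above).
    if std::env::var("CARGO_CFG_TARGET_OS").as_deref() == Ok("macos") {
        cfg.define("LLAMA_METAL", "ON");
    }

    let dst = cfg.build();
    println!("cargo:rustc-link-search=native={}/lib", dst.display());
    println!("cargo:rustc-link-lib=static=llama");
}
```

A Metal build also needs the Apple frameworks linked (`Metal`, `Foundation`, and friends), so a real build script would additionally emit the corresponding `cargo:rustc-link-lib=framework=...` lines.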
I'm running the example script with a few different models. When not using Metal (not setting `n_gpu_layers`), the models generate tokens; when I set `n_gpu_layers`, no tokens are generated. Is this a known behavior?
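If it helps to pin down a repro, the only variable being toggled is the layer-offload count at model-load time. A hypothetical sketch of the two configurations being compared (the type and field names here are illustrative placeholders, not this crate's actual API):

```rust
// Hypothetical placeholder type: the real crate API may differ.
struct ModelParams {
    /// Number of transformer layers offloaded to the GPU (Metal on macOS).
    /// 0 keeps everything on the CPU.
    n_gpu_layers: u32,
}

fn main() {
    // Works: CPU-only, tokens are generated.
    let _cpu_only = ModelParams { n_gpu_layers: 0 };

    // Fails for me: layers offloaded to Metal, no tokens generated.
    // 32 is just an example offload count.
    let _metal = ModelParams { n_gpu_layers: 32 };

    // ...load the same GGUF model with each config and compare the output.
}
```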