I want to run the TinyLlama model, and I wonder if there is a way to run GGUF models with this crate. It seems much more common now for models to be distributed in the GGUF format rather than the GGML format, and converting from GGUF -> GGML seems non-trivial, at least with my limited skill set.
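For what it's worth, you can tell the two container formats apart by their leading magic bytes. This is a minimal sketch, assuming the magics documented in llama.cpp's file headers: GGUF files start with the ASCII bytes `GGUF`, while the older GGML-family files start with a little-endian magic such as `lmgg` ("ggml" reversed) or `tjgg` ("ggjt" reversed).

```python
def detect_format(path):
    """Guess a model file's container format from its first four bytes.

    Assumption: GGUF files begin with b"GGUF"; legacy GGML-family files
    begin with little-endian magics like b"lmgg" or b"tjgg".
    """
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic == b"GGUF":
        return "gguf"
    if magic in (b"lmgg", b"tjgg"):
        return "ggml"
    return "unknown"
```

So before attempting any conversion, a quick check like `detect_format("tinyllama.gguf")` tells you which loader the crate would actually need.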
Hi, the underlying code is a wrapper around llama.cpp, so check the requirements of that codebase. If the bundled llama.cpp code needs to be upgraded, a PR would be appreciated.