You can now select a GGUF / llamafile model from the webui and set various llamafile settings before loading the model. By default it runs on port 8080 with 0 GPU layers offloaded, so inference is entirely on the CPU. There is an option to enable GPU processing, along with an option to set how many layers to offload.
Unfortunately the 'Stop Llamafile' button doesn't work yet, so for now you have to manually close the spawned terminal window to kill the server.
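A rough sketch of what the loader does under the hood, and how the stop button could work by keeping a handle to the child process instead of relying on the spawned terminal window. The wrapper functions and paths here are hypothetical; `--server`, `--port`, `-m`, and `-ngl` are real llamafile (llama.cpp server) flags, and the defaults mirror the ones above (port 8080, 0 GPU layers).

```python
import subprocess

def start_llamafile(llamafile_path, model_path, port=8080, gpu_layers=0):
    """Spawn a llamafile server as a child process.

    Hypothetical wrapper: llamafile_path and model_path are placeholders.
    Defaults match the webui: port 8080, 0 GPU layers (CPU only).
    """
    cmd = [
        llamafile_path,
        "--server",
        "-m", model_path,
        "--port", str(port),
        "-ngl", str(gpu_layers),  # number of layers offloaded to the GPU
    ]
    # Keeping the Popen handle lets us stop the server programmatically
    # rather than closing the terminal window by hand.
    return subprocess.Popen(cmd)

def stop_llamafile(proc, timeout=5):
    """Terminate the server process, escalating to kill if it hangs."""
    proc.terminate()
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
```

With this shape, a working 'Stop Llamafile' button just calls `stop_llamafile(proc)` on the stored handle.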
Next up is the same for ollama and HuggingFace transformers.