
llama.cpp backend is broken #3198

Open highghlow opened 1 month ago

highghlow commented 1 month ago

LocalAI version: latest-aio-gpu-nvidia-cuda-12

Environment, CPU architecture, OS, and Version:

$ uname -a
Linux server 6.1.0-21-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03) x86_64 GNU/Linux

$ lsmod | grep nvidia
nvidia_uvm           1540096  0
nvidia_drm             77824  0                      
drm_kms_helper        208896  1 nvidia_drm                                  
nvidia_modeset       1314816  2 nvidia_drm                                  
video                  65536  1 nvidia_modeset                              
nvidia              56778752  19 nvidia_uvm,nvidia_modeset
drm                   614400  4 drm_kms_helper,nvidia,nvidia_drm

docker-compose.yml
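
For reference, a minimal compose service for the latest-aio-gpu-nvidia-cuda-12 image typically looks like the sketch below (illustrative only, not the attached file; the service name, port mapping, and container volume path are assumptions):

services:
  localai:
    image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    ports:
      - "8080:8080"
    volumes:
      - ./models:/build/models   # assumed container path; adjust to match your image
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]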

Describe the bug

All models that use llama.cpp as the backend fail to return a response.

To Reproduce

  1. Replicate my setup
  2. Chat with the pre-installed llava model from the WebUI
  3. Observe that no response appears in the WebUI
  4. Observe the backend-fallback errors in the logs (a command-line reproduction sketch follows this list)
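
The same failure can also be triggered without the WebUI, against the OpenAI-compatible API (a sketch; the port assumes the default 8080 mapping, and the model name should match whatever the WebUI lists for llava):

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llava",
    "messages": [{"role": "user", "content": "Hello, can you see this?"}]
  }'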

Expected behavior

I should've received a response

Logs

Here is LocalAI running from start to finish (while I chat with llava from the WebUI): localai-log.txt

Additional context

I wiped /models and ran LocalAI once before recording the log.

From what I can see, the model loads successfully in llama.cpp, but LocalAI does not recognize this and tries a series of other backends, ultimately falling back to stablediffusion.
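
If the problem is really the automatic backend guessing, pinning the backend in the model definition might avoid the fallback. A sketch, assuming the model YAML under /models is editable and that the llama.cpp backend is registered as llama-cpp in this build (the file name and model file name are placeholders):

# /models/llava.yaml (hypothetical file name)
name: llava
backend: llama-cpp            # older builds may register this as "llama"
parameters:
  model: llava.gguf           # placeholder; use the real gguf file name under /models

This would only sidestep the backend auto-detection; it does not explain why LocalAI misses the successful llama.cpp load in the first place.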

xxfogs commented 2 days ago

Any updates?