
llama.cpp backend is broken #3198

Open highghlow opened 1 month ago

highghlow commented 1 month ago

LocalAI version: latest-aio-gpu-nvidia-cuda-12

Environment, CPU architecture, OS, and Version:

$ uname -a
Linux server 6.1.0-21-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03) x86_64 GNU/Linux

$ lsmod | grep nvidia
nvidia_uvm           1540096  0
nvidia_drm             77824  0                      
drm_kms_helper        208896  1 nvidia_drm                                  
nvidia_modeset       1314816  2 nvidia_drm                                  
video                  65536  1 nvidia_modeset                              
nvidia              56778752  19 nvidia_uvm,nvidia_modeset
drm                   614400  4 drm_kms_helper,nvidia,nvidia_drm

docker-compose.yml
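
For reference, a minimal compose service for the latest-aio-gpu-nvidia-cuda-12 image typically looks like the sketch below (illustrative only, not the attached file; the service name, port mapping, and container volume path are assumptions):

services:
  localai:
    image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    ports:
      - "8080:8080"
    volumes:
      - ./models:/build/models   # assumed container path; adjust to match your image
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]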

Describe the bug

All models that use llama.cpp as the backend fail to return a response.

To Reproduce

  1. Replicate my setup
  2. Chat with the pre-installed llava model from the WebUI
  3. Observe that no response appears in the WebUI
  4. Observe the backend-fallback errors in the logs (a command-line reproduction sketch follows this list)
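
The same failure can also be triggered without the WebUI, against the OpenAI-compatible API (a sketch; the port assumes the default 8080 mapping, and the model name should match whatever the WebUI lists for llava):

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llava",
    "messages": [{"role": "user", "content": "Hello, can you see this?"}]
  }'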

Expected behavior

I should've received a response

Logs

Here is LocalAI running from start to finish (while I chat with llava from the WebUI): localai-log.txt

Additional context

I wiped /models and ran LocalAI once before recording the log.

From what I can see, the model loads successfully in llama.cpp, but LocalAI does not recognize this and tries a series of other backends, ultimately falling back to stablediffusion.
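
If the problem is really the automatic backend guessing, pinning the backend in the model definition might avoid the fallback. A sketch, assuming the model YAML under /models is editable and that the llama.cpp backend is registered as llama-cpp in this build (the file name and model file name are placeholders):

# /models/llava.yaml (hypothetical file name)
name: llava
backend: llama-cpp            # older builds may register this as "llama"
parameters:
  model: llava.gguf           # placeholder; use the real gguf file name under /models

This would only sidestep the backend auto-detection; it does not explain why LocalAI misses the successful llama.cpp load in the first place.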

xxfogs commented 2 days ago

Any updates?