:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed inference
LocalAI version:
localai/localai:latest-gpu-nvidia-cuda-12 LocalAI version: v2.22.1 (015835dba2854572d50e167b7cade05af41ed214)
Environment, CPU architecture, OS, and Version:
Linux localai3 6.8.12-2-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-2 (2024-09-05T10:03Z) x86_64 GNU/Linux (Proxmox LXC, Debian); AMD EPYC 7302P (16 cores allocated), 64 GB RAM
Describe the bug
When testing distributed inference, I select a model (Qwen 2.5 14B) and send a chat message. The model loads on both instances (main and worker), but then no response is ever returned and the model unloads on the worker (observed with nvitop).
To Reproduce
The description above should reproduce the issue; I tried a few times.
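For reference, the request that triggers the hang is an ordinary OpenAI-style chat completion. A minimal sketch (the endpoint address and the model name `qwen2.5-14b` are assumptions from my setup):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumed address of the main LocalAI instance


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request for LocalAI."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def send_chat(req: urllib.request.Request) -> dict:
    """Send the request; with the worker attached, this never completes."""
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (against a running instance):
#   send_chat(build_chat_request("qwen2.5-14b", "Hello"))
```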
Expected behavior
The model should not unload and the chat should complete.
Logs
Worker logs:
Main logs:
Additional context
This worked in the last version, though I'm not sure which one that was at this point (~2 weeks ago). The model loads and works fine without the worker.