janhq / cortex.cpp

Local AI API Platform
https://cortex.so
Apache License 2.0

bug: GPU Performance Regression with Vulkan in v0.5.8 #1738

Open imtuyethan opened 2 weeks ago

imtuyethan commented 2 weeks ago

Jan version

0.5.8

Describe the Bug

https://discord.com/channels/1107178041848909847/1306758623325851689

• GPU: AMD Radeon RX 6800 XT
• Driver: AMD proprietary driver (version 2.0.317)
• Vulkan API version: 1.3.292
• Model tested: Mistral-7b
• Performance: ~25 tokens/sec (down from 60 tokens/sec in v0.5.7)
• GPU layers: 37/37 layers offloaded to GPU (confirmed by cortex.log)
• Abnormal behavior: high CPU usage (~90%) despite GPU offloading

A performance regression is observed on AMD GPUs (specifically the RX 6800 XT): GPU utilization is lower than expected, and CPU usage is abnormally high (~90%) compared to previous versions. Although the GPU layers are offloaded correctly, generation is significantly slower (25 t/s in v0.5.8 vs. 60 t/s in v0.5.7 with Mistral-7b-v0.3).
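To put the reported numbers in perspective, the short sketch below computes the relative throughput loss from the two figures in this report (~60 t/s on v0.5.7, ~25 t/s on v0.5.8). The helper names `tokens_per_second` and `relative_slowdown` are hypothetical and are not part of Jan or cortex.cpp; this is plain arithmetic on the reported values only.

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Throughput as generated tokens divided by wall-clock seconds."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return tokens / seconds


def relative_slowdown(old_tps: float, new_tps: float) -> float:
    """Fraction of throughput lost going from old_tps to new_tps."""
    return (old_tps - new_tps) / old_tps


# Figures reported in this issue: ~60 t/s on v0.5.7, ~25 t/s on v0.5.8.
loss = relative_slowdown(60.0, 25.0)
print(f"throughput loss: {loss:.0%}")        # → throughput loss: 58%
print(f"speedup needed: {60.0 / 25.0:.1f}x")  # → speedup needed: 2.4x
```

In other words, v0.5.8 delivers well under half of the v0.5.7 throughput on the same hardware, which is why this reads as a regression rather than run-to-run noise.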

Steps to Reproduce

  1. Install Jan v0.5.8 on a system with an AMD GPU
  2. Enable Vulkan support
  3. Load the Mistral model with ngl (number of GPU layers) set to 100
  4. Observe:
    • High CPU usage (~90%)
    • Lower tokens/sec compared to v0.5.7
    • GPU not fully utilized despite cortex.log showing layers offloaded
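When comparing v0.5.7 and v0.5.8 while following the steps above, it helps to measure tokens/sec the same way on both versions. The sketch below is a generic timing harness, assuming you can obtain an iterator that yields streamed tokens from whatever client you use; it does not show Jan's or cortex.cpp's actual API. `fake_stream` is a hypothetical stand-in for a real streamed response, and the measured time includes the wait before the first token.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_throughput(token_stream: Iterable[str]) -> Tuple[int, float]:
    """Consume a token stream and return (token_count, tokens_per_second)."""
    start = time.perf_counter()
    count = 0
    for _ in token_stream:
        count += 1
    elapsed = time.perf_counter() - start
    return count, (count / elapsed if elapsed > 0 else float("inf"))


def fake_stream(n: int, delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a streamed model response."""
    for i in range(n):
        time.sleep(delay_s)  # simulate per-token generation latency
        yield f"tok{i}"


if __name__ == "__main__":
    n, tps = measure_throughput(fake_stream(20, 0.01))
    print(f"{n} tokens at {tps:.1f} tokens/sec")
```

Running the same prompt and token budget through this harness on each version keeps the comparison apples-to-apples; CPU usage still has to be observed separately (for example in the OS task manager).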

Additional Context

The issue persists after a factory reset.

Screenshots / Logs

cortex.log app.log

What is your OS?

louis-jan commented 2 weeks ago

Qwen 32GB is a large model relative to this device's specifications. Vulkan mode is also not stable enough to reliably reach that speed. We need to reproduce the issue to confirm that it is a bug.