bug: GPU Performance Regression with Vulkan in v0.5.8

Jan version

0.5.8

Describe the Bug

https://discord.com/channels/1107178041848909847/1306758623325851689

- GPU: AMD Radeon RX 6800 XT
- Driver: AMD proprietary driver (version 2.0.317)
- Vulkan API version: 1.3.292
- Model tested: Mistral-7b
- Performance: ~25 tokens/sec (down from 60 tokens/sec in v0.5.7)
- GPU layers: 37/37 layers offloaded to GPU (confirmed by cortex.log)
- Abnormal behavior: High CPU usage (~90%) despite GPU offloading

A performance regression is observed with AMD GPUs (specifically the RX 6800 XT): GPU utilization is lower than expected and CPU usage is abnormally high (~90%) compared to previous versions. Although the GPU layers are offloaded correctly, generation is significantly slower: 25 tokens/sec in v0.5.8 versus 60 tokens/sec in v0.5.7 with Mistral-7b-v0.3.
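To put the two figures in perspective, here is a quick calculation of the size of the regression. It uses only the throughput numbers reported above; the function name `regression_percent` is made up for this illustration.

```python
def regression_percent(before_tps: float, after_tps: float) -> float:
    """Return the throughput drop as a percentage of the earlier figure."""
    return (before_tps - after_tps) / before_tps * 100.0


# Figures reported in this issue: 60 tokens/sec (v0.5.7), ~25 tokens/sec (v0.5.8).
before_tps = 60.0
after_tps = 25.0

drop = regression_percent(before_tps, after_tps)
slowdown = before_tps / after_tps

print(f"throughput drop: {drop:.1f}%")      # throughput drop: 58.3%
print(f"slowdown factor: {slowdown:.1f}x")  # slowdown factor: 2.4x
```

So v0.5.8 runs at well under half the previous speed on this setup.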

Steps to Reproduce

Install Jan v0.5.8 on a system with an AMD GPU
Enable Vulkan support
Load a Mistral model with ngl set to 100
Observe:
- High CPU usage (~90%)
- Lower tokens/sec compared to v0.5.7
- GPU not fully utilized despite cortex.log showing layers offloaded
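For anyone trying to reproduce this, below is a minimal sketch of how the offload count could be checked from the log text. It assumes cortex.log contains a llama.cpp-style line of the form `offloaded N/M layers to GPU`; the exact wording in your log may differ, and the sample line in the code is illustrative rather than copied from the attached log. The helper name `offload_status` is hypothetical.

```python
import re

# Assumption: the log reports offloading in the llama.cpp style,
# e.g. "... offloaded 37/37 layers to GPU". Adjust the pattern if your log differs.
OFFLOAD_RE = re.compile(r"offloaded\s+(\d+)/(\d+)\s+layers\s+to\s+GPU")


def offload_status(log_text: str):
    """Return (offloaded, total) from the first matching log line, or None."""
    match = OFFLOAD_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Illustrative line only, not taken from the attached cortex.log.
sample = "llm_load_tensors: offloaded 37/37 layers to GPU"
status = offload_status(sample)

if status is None:
    print("no offload line found")
else:
    offloaded, total = status
    print(f"{offloaded}/{total} layers offloaded")
```

To run it against a real log, read the file contents (for example with `pathlib.Path(...).read_text()`) and pass them to `offload_status`. If this reports every layer offloaded while CPU usage still sits around 90%, that matches the behavior described here.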

Additional Context

Issue persists after factory reset

Screenshots / Logs

cortex.log app.log

What is your OS?

[ ] MacOS
[X] Windows
[ ] Linux

janhq / cortex.cpp

bug: GPU Performance Regression with Vulkan in v0.5.8 #1738