lmstudio-ai / lmstudio-bug-tracker

Bug tracking for the LM Studio desktop application

Codestral 22B v0.1 Q4K does not work with large context length (LM Studio 0.2.29) #68

Open benja0x40 opened 3 months ago

benja0x40 commented 3 months ago

After updating LM Studio to 0.2.29, it seems that Codestral 22B v0.1 Q4K no longer works with large context lengths. With a context length of 8192, Codestral works fine and LM Studio uses 100% of the GPU. Increasing the context length to 16384 drops GPU usage to 40-50% and produces nonsense token generation.

I was able to reproduce the same behaviour with the latest update of GPT4All. Could this be a bug introduced in recent llama.cpp builds?
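To help isolate whether the regression comes from llama.cpp itself rather than the desktop frontends, one could try reproducing directly with the llama.cpp CLI. This is a hedged sketch: the model path, GGUF filename, and prompt are placeholders, and `-ngl 99` simply offloads all layers to the GPU (Metal on Apple Silicon).

```shell
# Hypothetical reproduction sketch using llama.cpp's llama-cli.
# Adjust the model path to your local Codestral GGUF file.

# Context length 8192 — works correctly per the report:
./llama-cli -m ./models/Codestral-22B-v0.1-Q4_K.gguf \
  -c 8192 -ngl 99 -p "Write a hello world in Python."

# Context length 16384 — reportedly produces nonsense output:
./llama-cli -m ./models/Codestral-22B-v0.1-Q4_K.gguf \
  -c 16384 -ngl 99 -p "Write a hello world in Python."
```

If the 16384 run also degrades here, that would point at llama.cpp rather than LM Studio or GPT4All.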

Attached are screenshots showing examples of the nonsense outputs and the inference parameters I used in both LM Studio and GPT4All. All runs were done on a MacBook Air M2 with 24 GB RAM under macOS Sonoma 14.5.

[Screenshots attached (2024-07-30): nonsense outputs and inference parameters in LM Studio and GPT4All]