After updating LM Studio to 0.2.29, it seems that Codestral 22B v0.1 Q4K no longer works with large context lengths.
With a context length of 8192, Codestral works fine and LM Studio uses 100% of the GPU.
Increasing the context length to 16384 leads to only 40-50% GPU usage and nonsense token generation.
I was able to reproduce the same behaviour with the latest update of GPT4All.
Could this be a bug introduced in recent llama.cpp builds?
Attached are screenshots showing examples of the nonsense output and the inference parameters I used in both LM Studio and GPT4All. All runs were done on a MacBook Air M2 with 24 GB RAM under macOS Sonoma 14.5.
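To help isolate whether the regression comes from llama.cpp itself rather than from LM Studio or GPT4All, here is a minimal sketch using the llama-cpp-python bindings that I would expect to show the same behaviour if the bug is at the llama.cpp level. The GGUF filename, prompt, and generation settings are placeholders, not the exact parameters from my runs (those are in the attached screenshots).

```python
from llama_cpp import Llama

# Placeholder GGUF filename; any Codestral 22B v0.1 Q4_K quant should do.
llm = Llama(
    model_path="Codestral-22B-v0.1-Q4_K.gguf",
    n_ctx=16384,       # 8192 works fine for me; 16384 triggers the issue
    n_gpu_layers=-1,   # offload all layers to the M2's Metal GPU
)

out = llm("Write a Python function that reverses a string.", max_tokens=128)
# If the regression is in llama.cpp, I would expect nonsense text here at
# n_ctx=16384 and a coherent answer at n_ctx=8192.
print(out["choices"][0]["text"])
```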