When I load Qwen Coder 2.5 32B Q4 MLX (8k context) it uses about 17.3GB of RAM. After a while it's consuming over 40GB. LM Studio doesn't report the memory usage ever going down until I eject the model and reload it. Then it goes back to 17GB.
This is happening on LM Studio 0.3.5 with v0.0.14 of the MLX runtime, on an M4 Max MacBook Pro.
Please let me know if there is anything else you need from me to debug this issue.