thisIsLoading opened this issue 6 months ago
Issue persists; just verified with the latest version.
I have the same issue with 0.3.5 and models like Phi-3-mini-128k-instruct: I'm not able to go beyond 30-40k tokens even though the model supports up to 128k. Is there a difference in how this is calculated for LM Studio?
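If it helps, my guess is that the limit is KV-cache memory rather than LM Studio calculating the context differently: the cache grows linearly with context length. Here is a back-of-the-envelope sketch; the Phi-3-mini hyperparameters (32 layers, 32 KV heads, head_dim 96) are my assumption, so check the GGUF metadata for the real values:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bytes_per_elem: int = 2) -> int:
    """Total bytes for the K and V tensors across all layers (fp16 by default)."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem

# Assumed Phi-3-mini shape: 32 layers, 32 KV heads (no GQA), head_dim 96.
for n_ctx in (40_000, 128_000):
    gib = kv_cache_bytes(32, 32, 96, n_ctx) / 2**30
    print(f"{n_ctx:>7} tokens -> ~{gib:.1f} GiB KV cache")
```

With those numbers, the full 128k context would need roughly 90+ GiB for the fp16 cache alone, which would explain topping out around 30-40k on a typical machine.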
When trying to utilize the full context size for this model, https://huggingface.co/vsevolodl/Llama-3-70B-Instruct-Gradient-1048k-GGUF, I get an out-of-RAM(?) error like this:
So it claims that the RAM is exhausted, when in fact htop only reports about 10 GB of RAM usage, and LM Studio itself (at the top right) reports 48 GB of RAM being used (although I believe this might include the VRAM in use).
I try to fully offload to the GPU.
I also noticed a slowdown during the loading process: it loads slower and slower until the above error pops up, but I don't know whether that is expected. Maybe the progress bar is just a little optimistic, and towards the end it realizes there is still a ways to go to load the rest of the model.
The model works with context sizes of up to 56k; everything larger ends with the above error.
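Plugging this model's shape into the same KV-cache arithmetic as above (my assumptions: 80 layers, 8 KV heads via GQA, head_dim 128, fp16 cache; verify against the GGUF metadata) gives numbers consistent with that cutoff:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    # K and V, one pair of tensors per layer, fp16 by default
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem

# Assumed Llama-3-70B shape: 80 layers, 8 KV heads (GQA), head_dim 128.
for n_ctx in (56_000, 1_048_576):
    gib = kv_cache_bytes(80, 8, 128, n_ctx) / 2**30
    print(f"{n_ctx:>9} tokens -> ~{gib:.0f} GiB KV cache")
```

Roughly 17 GiB at 56k is plausible on top of the quantized weights; roughly 320 GiB at the full 1048k is not, which would line up with the error.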
I can use larger models than this with no issues (although they only have an 8k context size). Right now I tested https://huggingface.co/lmstudio-community/Meta-Llama-3-120B-Instruct-GGUF/ fully offloaded and it works like a charm (kind of; it could run faster, but it's doing OK).