lmstudio-ai / lmstudio-bug-tracker

Bug tracking for the LM Studio desktop application

LM Studio 0.2.22 running out of memory with context sizes larger than 56k (model supports 1024k) #14

Open thisIsLoading opened 1 month ago

thisIsLoading commented 1 month ago

When trying to use the full context size of this model (https://huggingface.co/vsevolodl/Llama-3-70B-Instruct-Gradient-1048k-GGUF), I get what looks like an out-of-RAM error:

```json
{
  "title": "Failed to load model",
  "cause": "",
  "errorData": {
    "n_ctx": 1048576,
    "n_batch": 512,
    "n_gpu_layers": 81
  },
  "data": {
    "memory": {
      "ram_capacity": "314.65 GB",
      "ram_unused": "316.65 KB"
    },
    "gpu": {
      "type": "NvidiaCuda",
      "vram_recommended_capacity": "141.90 GB",
      "vram_unused": "130.46 GB"
    },
    "os": {
      "platform": "linux",
      "version": "5.15.0-106-generic",
      "supports_avx2": true
    },
    "app": {
      "version": "0.2.22",
      "downloadsDir": "/home/loading/.cache/lm-studio/models"
    },
    "model": {}
  }
}
```

So the error claims that RAM is almost entirely used ("ram_unused": "316.65 KB" out of "314.65 GB"), when in fact htop only reports about 10 GB of RAM in use, and LM Studio itself (at the top right) reports 48 GB used (although I believe that figure might include VRAM).
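Just a guess at where the discrepancy could come from: on Linux, MemFree can be nearly zero on a long-running box because the page cache absorbs idle RAM, while MemAvailable (what a new allocation can actually get) stays large. Here is a minimal sketch comparing the two, assuming the report is derived from /proc/meminfo, which I haven't verified:

```python
# Compare Linux's two headline memory figures from /proc/meminfo.
# MemFree is often tiny (the page cache absorbs idle RAM), while
# MemAvailable estimates what a new allocation can actually claim.
def meminfo():
    fields = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            fields[key] = int(value.strip().split()[0])  # values are in KiB
    return fields

m = meminfo()
print(f"MemFree:      {m['MemFree'] / 1024**2:.2f} GiB")
print(f"MemAvailable: {m['MemAvailable'] / 1024**2:.2f} GiB")
```

If the check behind "ram_unused" looks at something like MemFree rather than MemAvailable (pure speculation on my part), a near-zero reading next to a mostly idle htop would at least be consistent.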

I am trying to fully offload the model to the GPU.

I also noticed the loading process slowing down: it loads slower and slower until the above error pops up. I don't know whether that is expected; maybe the progress bar is just somewhat optimistic, and toward the end it realizes there is still a long way to go to load the rest of the model.

The model works with context sizes of up to 56k; anything larger ends with the above error.
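For what it's worth, a back-of-envelope KV-cache estimate suggests the full context simply doesn't fit in memory. This is a minimal sketch assuming stock Llama-3-70B architecture numbers (80 layers, 8 KV heads via GQA, head dim 128) and an fp16 cache, which is llama.cpp's default; I haven't confirmed those numbers for the Gradient fine-tune:

```python
# Rough size of the KV cache: 2 (keys + values) x layers x tokens
# x KV heads x head dim x bytes per element. The architecture numbers
# below are assumptions based on stock Llama-3-70B, not confirmed
# for the Gradient 1048k fine-tune.
def kv_cache_bytes(n_ctx, n_layers=80, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

for ctx in (56 * 1024, 1048576):
    print(f"{ctx:>9} tokens -> {kv_cache_bytes(ctx) / 1024**3:.1f} GiB KV cache")
```

If those assumptions hold, that's roughly 17.5 GiB of cache at 56k, which fits next to the weights, but roughly 320 GiB at 1,048,576 tokens, which is more than the 141.90 GB of VRAM and even more than the 314.65 GB of system RAM. So a failure at very large contexts would be expected regardless of the odd ram_unused reading.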

I can use larger models than this one with no issues (although they only have an 8k context size). I just tested https://huggingface.co/lmstudio-community/Meta-Llama-3-120B-Instruct-GGUF/ fully offloaded and it works like a charm (more or less; it could run faster, but it's doing okay).