lmstudio-ai / lmstudio-bug-tracker

Bug tracking for the LM Studio desktop application

LM Studio 0.2.22 running out of memory with context sizes larger than 56k (model supports 1024k) #14

Open thisIsLoading opened 1 month ago

thisIsLoading commented 1 month ago

When trying to use the full context size of this model (https://huggingface.co/vsevolodl/Llama-3-70B-Instruct-Gradient-1048k-GGUF), I get what looks like an out-of-RAM error:

```json
{
  "title": "Failed to load model",
  "cause": "",
  "errorData": {
    "n_ctx": 1048576,
    "n_batch": 512,
    "n_gpu_layers": 81
  },
  "data": {
    "memory": {
      "ram_capacity": "314.65 GB",
      "ram_unused": "316.65 KB"
    },
    "gpu": {
      "type": "NvidiaCuda",
      "vram_recommended_capacity": "141.90 GB",
      "vram_unused": "130.46 GB"
    },
    "os": {
      "platform": "linux",
      "version": "5.15.0-106-generic",
      "supports_avx2": true
    },
    "app": {
      "version": "0.2.22",
      "downloadsDir": "/home/loading/.cache/lm-studio/models"
    },
    "model": {}
  }
}
```

So the error claims that RAM is almost entirely used ("ram_unused": "316.65 KB" out of "314.65 GB"), when in fact htop only reports about 10 GB of RAM in use, and LM Studio itself (at the top right) reports 48 GB used (although I believe that figure might include VRAM).
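Just a guess at where the discrepancy could come from: on Linux, MemFree can be nearly zero on a long-running box because the page cache absorbs idle RAM, while MemAvailable (what a new allocation can actually get) stays large. Here is a minimal sketch comparing the two, assuming the report is derived from /proc/meminfo, which I haven't verified:

```python
# Compare Linux's two headline memory figures from /proc/meminfo.
# MemFree is often tiny (the page cache absorbs idle RAM), while
# MemAvailable estimates what a new allocation can actually claim.
def meminfo():
    fields = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            fields[key] = int(value.strip().split()[0])  # values are in KiB
    return fields

m = meminfo()
print(f"MemFree:      {m['MemFree'] / 1024**2:.2f} GiB")
print(f"MemAvailable: {m['MemAvailable'] / 1024**2:.2f} GiB")
```

If the check behind "ram_unused" looks at something like MemFree rather than MemAvailable (pure speculation on my part), a near-zero reading next to a mostly idle htop would at least be consistent.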

I am trying to fully offload the model to the GPU.

I also noticed the loading process slowing down: it loads slower and slower until the above error pops up. I don't know whether that is expected; maybe the progress bar is just somewhat optimistic, and toward the end it realizes there is still a long way to go to load the rest of the model.

The model works with context sizes of up to 56k; anything larger ends with the above error.
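For what it's worth, a back-of-envelope KV-cache estimate suggests the full context simply doesn't fit in memory. This is a minimal sketch assuming stock Llama-3-70B architecture numbers (80 layers, 8 KV heads via GQA, head dim 128) and an fp16 cache, which is llama.cpp's default; I haven't confirmed those numbers for the Gradient fine-tune:

```python
# Rough size of the KV cache: 2 (keys + values) x layers x tokens
# x KV heads x head dim x bytes per element. The architecture numbers
# below are assumptions based on stock Llama-3-70B, not confirmed
# for the Gradient 1048k fine-tune.
def kv_cache_bytes(n_ctx, n_layers=80, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

for ctx in (56 * 1024, 1048576):
    print(f"{ctx:>9} tokens -> {kv_cache_bytes(ctx) / 1024**3:.1f} GiB KV cache")
```

If those assumptions hold, that's roughly 17.5 GiB of cache at 56k, which fits next to the weights, but roughly 320 GiB at 1,048,576 tokens, which is more than the 141.90 GB of VRAM and even more than the 314.65 GB of system RAM. So a failure at very large contexts would be expected regardless of the odd ram_unused reading.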

I can use larger models than this one with no issues (although they only have an 8k context size). I just tested https://huggingface.co/lmstudio-community/Meta-Llama-3-120B-Instruct-GGUF/ fully offloaded and it works like a charm (more or less; it could run faster, but it's doing okay).