LM Studio Fails to Load Meta-Llama-3-120B-Instruct-Q2_K.gguf Model

aaron13100 commented 4 months ago

Description: LM Studio (version 0.2.27) fails to load the model file Meta-Llama-3-120B-Instruct-Q2_K.gguf on my machine, resulting in an error. However, the same model file loads and works correctly with oobabooga/text-generation-webui.

System Information:

OS: macOS 14.5 (Sonoma)
CPU: Apple M3 Max
RAM: 128GB
Model Card: Meta-Llama-3-120B-Instruct-GGUF
LM Studio Version: 0.2.27

Error Details:

Error Message:

    "cause": "(Exit code: 6). Please check settings and try loading the model again.",
    "suggestion": "",
    "data": {
      "memory": {
        "ram_capacity": "128.00 GB",
        "ram_unused": "10.00 GB"
      },
      "gpu": {
        "gpu_names": [
          "Apple Silicon"
        ],
        "vram_recommended_capacity": "96.00 GB",
        "vram_unused": "7.71 GB"
      },
      "os": {
        "platform": "darwin",
        "version": "14.5"
      },
      "app": {
        "version": "0.2.27",
        "downloadsDir": "/Users/user/.cache/lm-studio/models"
      },
      "model": {}
    },
    "title": "Error loading model."

Reproduction Steps:

1. Attempt to load the model file Meta-Llama-3-120B-Instruct-Q2_K.gguf in LM Studio.
2. Observe the error message with exit code 6.
3. Load the same model file in oobabooga/text-generation-webui.
4. Observe that the model loads successfully and functions correctly.

Additional Information:

Many other models load and work fine in LM Studio.
Quant size: Meta-Llama-3-120B-Instruct-Q2_K.gguf

Expected Behavior:

The Meta-Llama-3-120B-Instruct-Q2_K.gguf model should load in LM Studio without errors, as it does in oobabooga/text-generation-webui.

Darkwing371 commented 4 months ago

You are on a MacBook, right? Without a dedicated 3D graphics card, right?

In LM Studio, are you trying to offload the model to the GPU? If so, the model has to be smaller than the available VRAM in order to fit there. Try setting GPU offload to 0 or disabling it.

I suspect that oobabooga/text-generation-webui disables GPU offloading by default, which would explain why it works there out of the box.

Give it a try ... apart from that, I have no idea.
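For anyone who wants to sanity-check the "does it fit?" question, here is a minimal sketch that compares a model's size against the memory figures from the error report above. The helper names `parse_size` and `fits` are made up for illustration, the units are assumed to be binary (1 GB = 1024^3 bytes), which may not match how LM Studio reports them, and the 40 GB model size in the usage comment is a placeholder, not the real size of the Q2_K file.

```python
import re

# Assumption: sizes like "7.71 GB" use binary multiples.
_UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size(text):
    """Convert a string such as '7.71 GB' into a number of bytes."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([KMGT]B)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized size string: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit]


def fits(model_bytes, available_text, headroom=0.0):
    """Return True if the model, plus a fractional headroom, fits in the given memory."""
    return model_bytes * (1.0 + headroom) <= parse_size(available_text)


# Usage with the figures from the error report.
# The 40 GB model size is hypothetical; substitute the real file size,
# e.g. os.path.getsize(path_to_gguf).
hypothetical_model_bytes = 40 * 1024**3
print("fits in unused VRAM:     ", fits(hypothetical_model_bytes, "7.71 GB"))
print("fits in recommended VRAM:", fits(hypothetical_model_bytes, "96.00 GB"))
```

The point is only that the "unused" figures in the report (7.71 GB VRAM, 10.00 GB RAM) are far below the capacities, so a large model can fail a fit check against unused memory even on a 128GB machine.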

aaron13100 commented 4 months ago

Thanks for the message. I didn't even notice those settings before - I didn't realize you could scroll down below the system prompt. Turning off GPU offload didn't fix it.

When I turned off "Keep entire model in RAM" though, I was able to query the model and get a response.
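For reference, a rough equivalent of those two settings outside LM Studio, sketched with the llama-cpp-python bindings rather than anything LM Studio exposes. It assumes that "Keep entire model in RAM" corresponds to llama.cpp's `use_mlock` option and that GPU offload corresponds to `n_gpu_layers`; that mapping is an assumption, not something confirmed in this thread. `build_load_options` is a hypothetical helper and the model directory is a placeholder.

```python
def build_load_options(model_path, gpu_layers=0, keep_in_ram=False):
    """Build keyword arguments for llama_cpp.Llama (hypothetical helper).

    gpu_layers  -> n_gpu_layers: number of layers offloaded to the GPU (0 = none).
    keep_in_ram -> use_mlock: ask the OS to lock the model's pages in RAM.
                   Assumed to be what "Keep entire model in RAM" toggles.
    """
    return {
        "model_path": model_path,
        "n_gpu_layers": gpu_layers,
        "use_mmap": True,        # memory-map the file instead of reading it all up front
        "use_mlock": keep_in_ram,
    }


# The setting that worked in this thread: locking in RAM turned off.
# The directory is a placeholder; only the file name comes from the thread.
options = build_load_options(
    "/path/to/models/Meta-Llama-3-120B-Instruct-Q2_K.gguf",
    keep_in_ram=False,
)
print(options)

# To actually load the model (requires `pip install llama-cpp-python`):
#   from llama_cpp import Llama
#   llm = Llama(**options)
```

With locking off, the OS is free to page parts of the memory-mapped file in and out, which may be why the load succeeds when little RAM is reported as unused.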

lmstudio-ai / lmstudio-bug-tracker

LM Studio Fails to Load Meta-Llama-3-120B-Instruct-Q2_K.gguf Model #59