nomic-ai / gpt4all

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
https://nomic.ai/gpt4all
MIT License

Llama 3.1 8B model not working with Metal GPU (Mac M2) #2840

Open aeonoea opened 1 month ago

aeonoea commented 1 month ago

Bug Report

Hi all,

I receive gibberish when using the default install and settings of GPT4all and the latest 3.1 8B model on my M2 Mac mini. Other models seem to have no issues and they are using the GPU cores fully (can confirm with the app 'Stats').

[screenshot: gibberish model output]

Sorry if this issue is already open elsewhere, but I found nothing similar recently.

Steps to Reproduce

  1. Install GPT4all
  2. Download the Llama 3.1 8B model and leave everything at default (Metal is the default device)
  3. Receive gibberish

Expected Behavior

The same as with other models.

Your Environment

manyoso commented 1 month ago

I cannot reproduce this. I'm using GPT4All v3.1.1 with an M2 Mac and Llama 3.1 8B, and it seems perfectly fine.

I'm on macOS Ventura 13.6 with Apple M2 Pro.

dhemasnurjaya commented 1 month ago

[screenshot: gibberish model output]

Confirmed on a Mac M1 using Metal.

aeonoea commented 1 month ago

@dhemasnurjaya Which macOS version?

I saw that the recent GPT4All release mentioned a Metal fix, so I was hoping it would work now, but unfortunately it's still the same.

I've also tried updating macOS to the latest 15.0 public beta. Maybe the developer beta or even the 15.1 betas would be compatible, but I'm not feeling that experimental right now :)

aleneum commented 1 month ago

I use GPT4All 3.2.1 on an Apple M1 Pro. Llama 3.1 works for me when I don't touch the context length in the settings (default 2048). It also works when I set the context length to 50,000 (and completely restart GPT4All), but it does not work when I set the context length to 128,000, 100,000, or 99,000 (again with a complete restart). I'd assume the main issue is my (limited) memory. I read somewhere that 16 GB of free memory should be enough, but the memory pressure graph tells another story.