guinmoon / LLMFarm

Run Llama and other large language models offline on iOS and macOS using the GGML library.
https://llmfarm.site
MIT License

Entire Phone crashes whenever use metal is enabled #21

Closed: rahulvk007 closed this issue 5 months ago

rahulvk007 commented 7 months ago

I have an iPhone 13. I tried a few models, including Llama 2 and Orca.

Both immediately crash the entire phone, requiring a force restart, if Use Metal is enabled. When Use Metal is disabled, they run, but extremely slowly.

ShawnFumo commented 7 months ago

Not sure if you have the 13 or the 13 Pro. It makes a big difference, since the 13 has 4 GB of RAM and the 13 Pro has 6 GB.

My 14 Pro has 6 GB, and what I've found works is a 7B model quantized to Q3 (K_M or K_S), like TheBloke's quantization of OpenHermes 2 Mistral on Hugging Face. Then try Metal, MLock, and MMap all on, and limit context to 1024 or 2048 to start, then maybe try 3072.

It'll take a while to load the first time you type and may crash once or twice (especially at 3072), but it should settle down. MLock in particular forces everything into memory, but it makes inference a lot faster.

But there's no way a 13B model will work, or even a 7B at full 16-bit or Q8. I'm waiting for the new version of LLM Farm to get into TestFlight, since it sounds like it supports the new 3B model from Stability. That should hopefully work with less quantization.
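For a rough sense of the sizes involved, the weights alone can be estimated as parameter count times bits per weight. The sketch below is only back-of-the-envelope arithmetic: the bits-per-weight figures are approximate averages assumed for illustration, the parameter counts are nominal, and the function name is hypothetical. Real GGUF files differ somewhat, and the runtime needs extra memory on top of the weights.

```python
# Approximate average bits per weight for a few GGUF quantization types.
# ASSUMPTION: rough illustrative values, not exact figures for any given file.
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,
    "Q4_K_M": 4.85,
    "Q3_K_M": 3.9,
}


def weights_gib(n_params: float, quant: str) -> float:
    """Estimate the size of the model weights in GiB (weights only)."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 2**30


if __name__ == "__main__":
    # Nominal parameter counts; real models are a little above or below these.
    for name, n_params in [("3B", 3e9), ("7B", 7e9), ("13B", 13e9)]:
        row = ", ".join(
            f"{quant}: {weights_gib(n_params, quant):5.1f} GiB"
            for quant in BITS_PER_WEIGHT
        )
        print(f"{name:>3}  {row}")
```

Under these assumptions a 7B model comes to roughly 13 GiB at F16 and about 6.9 GiB at Q8_0, both more than a 6 GB phone has in total, while Q3_K_M lands near 3.2 GiB.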

guinmoon commented 7 months ago

> ... I'm waiting for the new version of LLM Farm to get into TestFlight, since it sounds like it supports the new 3B model from Stability. That should hopefully work with less quantization.

I sent the new version to TestFlight 3 days ago. Apparently, because of the upcoming holidays, it is taking them longer to test it.

rahulvk007 commented 7 months ago

> Not sure if you have the 13 or the 13 Pro. It makes a big difference, since the 13 has 4 GB of RAM and the 13 Pro has 6 GB.
>
> My 14 Pro has 6 GB, and what I've found works is a 7B model quantized to Q3 (K_M or K_S), like TheBloke's quantization of OpenHermes 2 Mistral on Hugging Face. Then try Metal, MLock, and MMap all on, and limit context to 1024 or 2048 to start, then maybe try 3072.
>
> It'll take a while to load the first time you type and may crash once or twice (especially at 3072), but it should settle down. MLock in particular forces everything into memory, but it makes inference a lot faster.
>
> But there's no way a 13B model will work, or even a 7B at full 16-bit or Q8. I'm waiting for the new version of LLM Farm to get into TestFlight, since it sounds like it supports the new 3B model from Stability. That should hopefully work with less quantization.

Sadly, mine is not a 13 Pro. It is a 13 with 4 GB of RAM. I do believe that 4 GB of RAM is a huge disadvantage, but this particular issue doesn’t look related to RAM, as my entire phone crashes (I need to perform a hard restart), and only when Metal is enabled.

I will try limiting the context and enabling MLock and MMap along with Metal.
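As a rough illustration of why limiting the context helps: llama.cpp-style runtimes keep a key/value cache whose size grows linearly with the context length. The sketch below assumes Mistral 7B's published shape (32 layers, 8 KV heads, head dimension 128) and a 16-bit cache; the function name is hypothetical, and other buffers the app allocates are not counted.

```python
def kv_cache_mib(
    n_ctx: int,
    n_layers: int = 32,       # ASSUMPTION: Mistral 7B layer count
    n_kv_heads: int = 8,      # ASSUMPTION: Mistral 7B uses grouped-query attention
    head_dim: int = 128,      # ASSUMPTION: Mistral 7B head dimension
    bytes_per_elem: int = 2,  # ASSUMPTION: 16-bit cache entries
) -> float:
    """Estimate key/value cache size in MiB for a given context length."""
    # Keys and values (factor 2) are each stored per layer, per token.
    total_bytes = 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem
    return total_bytes / 2**20


if __name__ == "__main__":
    for n_ctx in (1024, 2048, 3072):
        print(f"context {n_ctx}: about {kv_cache_mib(n_ctx):.0f} MiB of KV cache")
```

With these numbers each extra 1024 tokens of context costs about 128 MiB. A model without grouped-query attention (for example, passing `n_kv_heads=32`) would need four times as much.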

guinmoon commented 7 months ago

> ... I'm waiting for the new version of LLM Farm to get into TestFlight, since it sounds like it supports the new 3B model from Stability. That should hopefully work with less quantization.

I've attached the ipa file to the release. If you know how to install it, you don't have to wait for the TestFlight version.

davidmokos commented 7 months ago

Happening to me as well on a 13 Pro with Mistral 7B Instruct. However, it crashes the phone regardless of whether the Metal option is on or off. It seems that MLock was causing the crash.

rahulvk007 commented 7 months ago

>> ... I'm waiting for the new version of LLM Farm to get into TestFlight, since it sounds like it supports the new 3B model from Stability. That should hopefully work with less quantization.
>
> I've attached the ipa file to the release. If you know how to install it, you don't have to wait for the TestFlight version.

Unfortunately, I don't have any experience with iOS development or ipa files. I think I will wait for the TestFlight version.

rahulvk007 commented 6 months ago

> Happening to me as well on a 13 Pro with Mistral 7B Instruct. However, it crashes the phone regardless of whether the Metal option is on or off. It seems that MLock was causing the crash.

Is it working with other models for you?

ShawnFumo commented 6 months ago

>> Happening to me as well on a 13 Pro with Mistral 7B Instruct. However, it crashes the phone regardless of whether the Metal option is on or off. It seems that MLock was causing the crash.
>
> Is it working with other models for you?

I think MLock tries to hold everything in RAM, so it can really slow down the phone, but it may improve inference speed. Make sure the model is heavily quantized, the context length is low, and all other apps are closed. Usually for me, even if the phone seemingly froze, it would recover after a few minutes once the RAM settled again.
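To make the MMap/MLock distinction more concrete, here is a minimal sketch of memory-mapping a file. It assumes the app's switches behave like the `use_mmap` and `use_mlock` options in llama.cpp, which is not confirmed in this thread. The file is a throwaway temporary file standing in for a model, and the helper name is hypothetical. The sketch does not call `mlock`; the pinning step is only described in a comment.

```python
import mmap
import os
import tempfile


def read_slice_via_mmap(path: str, offset: int, length: int) -> bytes:
    """Read a byte range from a file through a read-only memory mapping."""
    with open(path, "rb") as f:
        # Mapping the file does not load it all into RAM. The OS pages data in
        # when it is touched and may evict those pages under memory pressure.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # mlock (not called here) would additionally pin the mapped pages
            # in RAM so they cannot be evicted, at the cost of leaving less
            # memory for everything else on the device.
            return mm[offset:offset + length]


if __name__ == "__main__":
    # Stand-in "model file": padding, a marker, more padding.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\x00" * 4096 + b"weights" + b"\x00" * 4096)
        path = tmp.name
    try:
        print(read_slice_via_mmap(path, 4096, 7))
    finally:
        os.remove(path)
```

If that assumption holds, MMap alone lets the system drop and reload model pages (slow but survivable), while MLock on a model that barely fits can starve the rest of the phone, which would match the freezes described in this thread.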

ShawnFumo commented 6 months ago

>> ... I'm waiting for the new version of LLM Farm to get into TestFlight, since it sounds like it supports the new 3B model from Stability. That should hopefully work with less quantization.
>
> I've attached the ipa file to the release. If you know how to install it, you don't have to wait for the TestFlight version.

Appreciate you having the ipa up. I tried using one of those online services to host it with an install link but then ran into it having the same name as the TestFlight version. I'm sure I could have opened the file and edited it on my computer (or killed the TestFlight version), but just ended up waiting. I guess Apple just dropped the 0.8.0, but today I confirmed the 3b models like Rocket and Zephyr are working on 0.8.1.

I just realized, I probably could also have backed up the data folder, deleted the TestFlight version, installed ipa, and copied the data back in. Next time...

simon0117 commented 6 months ago

>>> Happening to me as well on a 13 Pro with Mistral 7B Instruct. However, it crashes the phone regardless of whether the Metal option is on or off. It seems that MLock was causing the crash.
>>
>> Is it working with other models for you?
>
> I think MLock tries to hold everything in RAM, so it can really slow down the phone, but it may improve inference speed. Make sure the model is heavily quantized, the context length is low, and all other apps are closed. Usually for me, even if the phone seemingly froze, it would recover after a few minutes once the RAM settled again.

I have an iPhone 11, which has 4 GB of RAM, and I see the same crashing issue. I turned off MLock and at least it responds now, although it took 221 seconds to respond to "hi". I don't know if that's just because it's the first time loading, or because I have too many apps in the background. I'll have to experiment, but it still takes a long time, minutes, to respond. Is there any way I can help troubleshoot this? I am a software developer by trade, but not an iOS developer, sorry.

rahulvk007 commented 6 months ago

>>>> Happening to me as well on a 13 Pro with Mistral 7B Instruct. However, it crashes the phone regardless of whether the Metal option is on or off. It seems that MLock was causing the crash.
>>>
>>> Is it working with other models for you?
>>
>> I think MLock tries to hold everything in RAM, so it can really slow down the phone, but it may improve inference speed. Make sure the model is heavily quantized, the context length is low, and all other apps are closed. Usually for me, even if the phone seemingly froze, it would recover after a few minutes once the RAM settled again.
>
> I have an iPhone 11, which has 4 GB of RAM, and I see the same crashing issue. I turned off MLock and at least it responds now, although it took 221 seconds to respond to "hi". I don't know if that's just because it's the first time loading, or because I have too many apps in the background. I'll have to experiment, but it still takes a long time, minutes, to respond. Is there any way I can help troubleshoot this? I am a software developer by trade, but not an iOS developer, sorry.

Are you using Metal?

For me, the models run without crashing, but very slowly, if I disable Metal. If I enable Metal, the phone crashes immediately. (iPhone 13, 4 GB RAM)

simon0117 commented 6 months ago

> Are you using Metal?
>
> For me, the models run without crashing, but very slowly, if I disable Metal. If I enable Metal, the phone crashes immediately. (iPhone 13, 4 GB RAM)

Yes, I leave (left) Metal on. I tried turning it off just now (with MLock still disabled) and it seems faster, just taking a really long time as before. Maybe we need a different model/quant for us poor 4 giggers? What's the recommendation?

ShawnFumo commented 6 months ago

Maybe try one of the 3B models based on Stability's? For example, here are quantized versions of Rocket-3B: https://huggingface.co/TheBloke/rocket-3B-GGUF

I'd start with one of the smaller ones and see how that goes (it seems like Q3_K_M may fit in 4 GB of RAM). For me (14 Pro), MLock didn't seem to make a difference for speed (unlike with a 7B model), but I'd experiment both ways on yours.
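The "start small and see what fits" advice can be sketched as a simple search over quantization levels. Everything numeric here is an assumption for illustration: the bits-per-weight averages are approximate, the 2.8 billion parameter count is a nominal figure for a 3B-class model, the fixed overhead is a guess, and the memory budget an app can actually use on a 4 GB phone is not known from this thread. The function name is hypothetical.

```python
from typing import Optional

# ASSUMPTION: approximate average bits per weight, for illustration only.
BITS_PER_WEIGHT = {"Q8_0": 8.5, "Q4_K_M": 4.85, "Q3_K_M": 3.9}


def largest_fitting_quant(
    n_params: float,
    budget_gib: float,
    overhead_gib: float = 0.3,  # ASSUMPTION: KV cache and buffers, a guess
) -> Optional[str]:
    """Return the least-compressed quantization whose estimate fits the budget."""
    # Try the highest-precision option first, then fall back to smaller ones.
    for quant, bpw in sorted(BITS_PER_WEIGHT.items(), key=lambda kv: -kv[1]):
        needed_gib = n_params * bpw / 8 / 2**30 + overhead_gib
        if needed_gib <= budget_gib:
            return quant
    return None  # nothing fits: try a smaller model


if __name__ == "__main__":
    # ASSUMPTION: the app can use about 2 GiB on a 4 GB device.
    print("3B-class:", largest_fitting_quant(2.8e9, budget_gib=2.0))
    print("7B-class:", largest_fitting_quant(7e9, budget_gib=2.0))
```

The point is the shape of the trade-off rather than the specific answer: a tighter budget pushes toward Q3, and for a 7B-class model no option in this table fits a 2 GiB budget.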

rahulvk007 commented 6 months ago

I don't know what fixed it, but now it is working perfectly fine with good performance. I haven't tried any 7B model, but all the 3B models I have tested (Q4_K_M versions) work perfectly with Metal enabled.

It was not working a few days ago, but I recently updated to iOS 17.2.1 and now it seems to be fine.

sytelus commented 3 months ago

I unchecked MLock and it seems to be working now (MMap is checked) with Metal checked.