Closed: davidjoffe closed this issue 4 months ago.
I completely agree that we should be able to use as much swap memory as we please. Same here, using an 8GB model. I haven't tried your solution of raising the 1.5 value in the allocator. I will play with it and post an update.
```
python txt2image.py "A photo of an astronaut riding a horse on Mars." --n_images 1 --n_rows 2
diffusion_pytorch_model.safetensors: 100%|█| 3.46G/3.46G [09:10<00:00, 6.29MB/s]
text_encoder/config.json: 100%|█████████████████| 613/613 [00:00<00:00, 906kB/s]
model.safetensors: 100%|███████████████████| 1.36G/1.36G [04:41<00:00, 4.83MB/s]
vae/config.json: 100%|██████████████████████████| 553/553 [00:00<00:00, 947kB/s]
diffusion_pytorch_model.safetensors: 100%|███| 335M/335M [00:57<00:00, 5.81MB/s]
tokenizer/vocab.json: 100%|████████████████| 1.06M/1.06M [00:00<00:00, 1.18MB/s]
tokenizer/merges.txt: 100%|███████████████████| 525k/525k [00:00<00:00, 882kB/s]
100%|███████████████████████████████████████████| 50/50 [05:19<00:00, 6.39s/it]
  0%|          | 0/1 [00:00<?, ?it/s]
libc++abi: terminating due to uncaught exception of type std::runtime_error: [malloc_or_wait] Unable to allocate 134217728 bytes.
Abort trap: 6
UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
```
```
libc++abi: terminating due to uncaught exception of type std::runtime_error: [malloc_or_wait] Unable to allocate 100237312 bytes.
```
I'm running into this error on my M3 Max with 36GB of RAM when trying to run lora.py from mlx-examples, fine-tuning the Mistral-7B model.
100237312 bytes doesn't seem like that much; I'm not sure why it's failing.
See https://github.com/ml-explore/mlx-examples/issues/70 for some ideas on how to reduce LoRA memory consumption until we have quantization.
Also got a VRAM error when loading Phi-3, even though ~2GB more than the 9.8GB shown in the terminal was available. Is it possible to set the VRAM limit to max_available_at_initiating or something like that, so that other applications only take up swap?
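As an aside, a runtime knob along those lines may already exist. The sketch below assumes the mx.metal.set_memory_limit and mx.metal.set_cache_limit functions described in the MLX docs (check that your installed version exposes them); it is illustrative rather than a confirmed fix for the Phi-3 case:

```python
import mlx.core as mx

gb = 1024**3

# Assumed API (see the MLX docs for your version): set a memory limit for
# graph evaluation. With relaxed=True the limit is a guideline and MLX may
# still allocate beyond it (potentially hitting swap); with relaxed=False
# an allocation past the limit raises an error instead.
previous_limit = mx.metal.set_memory_limit(10 * gb, relaxed=True)

# Optionally shrink the buffer cache so freed memory is returned sooner.
mx.metal.set_cache_limit(1 * gb)

print(f"Previous memory limit: {previous_limit / gb:.1f} GB")
```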
There is a maximum size you can allocate into a single buffer (which is a machine specific property). I think it is less than 9.8 GB for you.
But either way the fact that you are trying to put 9GB into a single buffer is not a good sign. What are you running to get that? Is it from training or generation?
It is a 16GB M1 Air; do you happen to know a ballpark for the limit? Or does it depend dynamically on other processes? I was running the Phi-3-128k-mlx model through the mlx_lm.utils load and generate functions with ~6k context (when I run it again it says 12.2GB is needed); is it limited to only 8GB of VRAM? With PyTorch I am able to run Python scripts that use ~14GB without much of a speed loss (with around 4-5GB of swap, off the top of my head).
> It is a 16GB M1 Air; do you happen to know a ballpark for the limit?
I don't know, but you could try running this until it breaks:
```python
import mlx.core as mx

# Disable the buffer cache so every allocation goes straight to the allocator.
mx.metal.set_cache_limit(0)

for i in range(100):
    print(f"{i} GB")
    # A (2**30, i) boolean array is roughly i GB in a single buffer.
    a = mx.zeros((2**30, i), mx.bool_)
    mx.eval(a)
    del a
```
I'm going to close this issue as I'm not sure why it's still open. Feel free to file a new issue if you are still having issues with memory allocation.
```
air@MacBook-Air-van-Air test-repo % /opt/homebrew/bin/python3.10 /Users/air/Repositories/test-repo/test4.py
0 GB
1 GB
2 GB
3 GB
4 GB
5 GB
6 GB
7 GB
8 GB
9 GB
libc++abi: terminating due to uncaught exception of type std::runtime_error: [malloc_or_wait] Unable to allocate 9663676416 bytes.
zsh: abort /opt/homebrew/bin/python3.10
```
Just an FYI, no need for me to open a new issue, thank you.
I kept encountering the below error while trying the stable diffusion sample in mlx-examples on an 8GB M2 Mac Mini. After some investigation (detailed here: https://github.com/ml-explore/mlx-examples/issues/21) I found that changing one line of code in MetalAllocator::MetalAllocator() in mlx/backend/metal/allocator.cpp to a much higher limit seems to fix the problem (the 1.5 here seems perhaps a bit conservative for low-RAM Macs):
```cpp
block_limit_(1.5 * device_->recommendedMaxWorkingSetSize()) {}
```
I made a fork with this change (https://github.com/davidjoffe/mlx/blob/main/mlx/backend/metal/allocator.cpp) and built from source to test. I'd like to submit a Pull Request. This change should help low-RAM Macs such as 8GB Macs, though effectively it just allows MLX to use swap instead of failing. That is arguably better than failing, but in the long run this behavior may need further improvement and refinement, and/or giving users more control over whether and how they want this, or perhaps a warning.
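For reference, a rough way to see what recommendedMaxWorkingSetSize amounts to on a given machine, without touching the C++, is sketched below. It assumes the mx.metal.device_info() call and key names from recent MLX documentation (older versions may not have it), so treat it as illustrative only:

```python
import mlx.core as mx

# Assumed API: recent MLX versions report device properties here.
# Key names are taken from the MLX docs and may differ across versions.
info = mx.metal.device_info()

gb = 1024**3
working_set = info.get("max_recommended_working_set_size", 0)
max_buffer = info.get("max_buffer_length", 0)

print(f"Recommended max working set: {working_set / gb:.1f} GB")
print(f"Max single buffer length:    {max_buffer / gb:.1f} GB")
# The block_limit_ shown above is 1.5x the recommended working set.
print(f"1.5x working set (block_limit_): {1.5 * working_set / gb:.1f} GB")
```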