Closed: davidjoffe closed this issue 4 months ago.
I completely agree that we should be able to use as much swap memory as we please. Same here, using an 8GB model. I haven't tried your solution of raising the 1.5 value in the allocator. I will play with it and post an update.
```
python txt2image.py "A photo of an astronaut riding a horse on Mars." --n_images 1 --n_rows 2
diffusion_pytorch_model.safetensors: 100%|█| 3.46G/3.46G [09:10<00:00, 6.29MB/s]
text_encoder/config.json: 100%|█████████████████| 613/613 [00:00<00:00, 906kB/s]
model.safetensors: 100%|███████████████████| 1.36G/1.36G [04:41<00:00, 4.83MB/s]
vae/config.json: 100%|██████████████████████████| 553/553 [00:00<00:00, 947kB/s]
diffusion_pytorch_model.safetensors: 100%|███| 335M/335M [00:57<00:00, 5.81MB/s]
tokenizer/vocab.json: 100%|████████████████| 1.06M/1.06M [00:00<00:00, 1.18MB/s]
tokenizer/merges.txt: 100%|███████████████████| 525k/525k [00:00<00:00, 882kB/s]
100%|███████████████████████████████████████████| 50/50 [05:19<00:00, 6.39s/it]
  0%|          | 0/1 [00:00<?, ?it/s]
libc++abi: terminating due to uncaught exception of type std::runtime_error: [malloc_or_wait] Unable to allocate 134217728 bytes.
Abort trap: 6
UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
```
```
libc++abi: terminating due to uncaught exception of type std::runtime_error: [malloc_or_wait] Unable to allocate 100237312 bytes.
```
I'm running into this error on my M3 Max with 36GB of RAM when trying to run lora.py from mlx-examples, fine-tuning the Mistral-7B model.
100237312 bytes doesn't seem like that much; I'm not sure why it's failing.
See https://github.com/ml-explore/mlx-examples/issues/70 for some ideas on how to reduce LoRA memory consumption until we have quantization.
Also got a VRAM error when loading Phi-3, even though ~2GB more than the 9.8GB shown in the terminal was available. Is it possible to set the VRAM limit to max_available_at_initiating or something like that, so that other applications only take up swap?
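As an aside, a runtime knob along those lines may already exist. The sketch below assumes the mx.metal.set_memory_limit and mx.metal.set_cache_limit functions described in the MLX docs (check that your installed version exposes them); it is illustrative rather than a confirmed fix for the Phi-3 case:

```python
import mlx.core as mx

gb = 1024**3

# Assumed API (see the MLX docs for your version): set a memory limit for
# graph evaluation. With relaxed=True the limit is a guideline and MLX may
# still allocate beyond it (potentially hitting swap); with relaxed=False
# an allocation past the limit raises an error instead.
previous_limit = mx.metal.set_memory_limit(10 * gb, relaxed=True)

# Optionally shrink the buffer cache so freed memory is returned sooner.
mx.metal.set_cache_limit(1 * gb)

print(f"Previous memory limit: {previous_limit / gb:.1f} GB")
```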
There is a maximum size you can allocate into a single buffer (which is a machine specific property). I think it is less than 9.8 GB for you.
But either way the fact that you are trying to put 9GB into a single buffer is not a good sign. What are you running to get that? Is it from training or generation?
It is a 16GB M1 Air; do you happen to know a ballpark for the limit? Or does it depend dynamically on other processes? I was running the Phi-3-128k-mlx model through the mlx_lm.utils load and generate functions with ~6k context (when I run it again it says 12.2GB is needed); is it limited to only 8GB of VRAM? With PyTorch I am able to run Python scripts that use ~14GB without much of a speed loss (with around 4-5GB of swap, off the top of my head).
> It is a 16GB M1 Air; do you happen to know a ballpark for the limit?
I don't know, but you could try running this until it breaks:
```python
import mlx.core as mx

# Disable the buffer cache so every allocation goes straight to the allocator.
mx.metal.set_cache_limit(0)

for i in range(100):
    print(f"{i} GB")
    # A (2**30, i) boolean array is roughly i GB in a single buffer.
    a = mx.zeros((2**30, i), mx.bool_)
    mx.eval(a)
    del a
```
I'm going to close this issue as I'm not sure why it's still open. Feel free to file a new issue if you are still having issues with memory allocation.
```
air@MacBook-Air-van-Air test-repo % /opt/homebrew/bin/python3.10 /Users/air/Repositories/test-repo/test4.py
0 GB
1 GB
2 GB
3 GB
4 GB
5 GB
6 GB
7 GB
8 GB
9 GB
libc++abi: terminating due to uncaught exception of type std::runtime_error: [malloc_or_wait] Unable to allocate 9663676416 bytes.
zsh: abort /opt/homebrew/bin/python3.10
```
Just an FYI, no need for me to open a new issue, thank you.
I kept encountering the below error while trying the stable diffusion sample in mlx-examples on an 8GB M2 Mac Mini. After some investigation (detailed here: https://github.com/ml-explore/mlx-examples/issues/21) I found that changing one line of code in MetalAllocator::MetalAllocator() in mlx/backend/metal/allocator.cpp to a much higher limit seems to fix the problem (the 1.5 here seems perhaps a bit conservative for low-RAM Macs):
```cpp
block_limit_(1.5 * device_->recommendedMaxWorkingSetSize()) {}
```
I made a fork with this change (https://github.com/davidjoffe/mlx/blob/main/mlx/backend/metal/allocator.cpp) and built from source to test. I'd like to submit a Pull Request. This change should help low-RAM Macs such as 8GB Macs, though effectively it just allows MLX to use swap instead of failing. That is arguably better than failing, but in the long run this behavior may need further improvement and refinement, and/or giving users more control over whether and how they want this, or perhaps a warning.
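For reference, a rough way to see what recommendedMaxWorkingSetSize amounts to on a given machine, without touching the C++, is sketched below. It assumes the mx.metal.device_info() call and key names from recent MLX documentation (older versions may not have it), so treat it as illustrative only:

```python
import mlx.core as mx

# Assumed API: recent MLX versions report device properties here.
# Key names are taken from the MLX docs and may differ across versions.
info = mx.metal.device_info()

gb = 1024**3
working_set = info.get("max_recommended_working_set_size", 0)
max_buffer = info.get("max_buffer_length", 0)

print(f"Recommended max working set: {working_set / gb:.1f} GB")
print(f"Max single buffer length:    {max_buffer / gb:.1f} GB")
# The block_limit_ shown above is 1.5x the recommended working set.
print(f"1.5x working set (block_limit_): {1.5 * working_set / gb:.1f} GB")
```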