monatis / clip.cpp

CLIP inference in plain C/C++ with no extra dependencies
MIT License

not enough space in the context's memory pool (on Apple M1 Max, 32GB RAM, clip-vit-b-32) #33

Open dukeeagle opened 1 year ago

dukeeagle commented 1 year ago

Hi there,

Thank you so much for making this library. Unfortunately, I'm running into the following error:

./main --model '/Users/lucasigel/Downloads/laion_clip-vit-b-32-laion2b-s34b-b79k.ggmlv0.q4_0.bin'  --text "test" --image '/00000002.jpg' -v 1

clip_model_load: loading model from '/Users/lucasigel/Downloads/laion_clip-vit-b-32-laion2b-s34b-b79k.ggmlv0.q4_0.bin' - please wait....................................................
clip_model_load: model size =    85.06 MB / num tensors = 397
clip_model_load: model loaded

ggml_new_tensor_impl: not enough space in the context's memory pool (needed 12051936, available 8388608)
Assertion failed: (false), function ggml_new_tensor_impl, file ggml.c, line 4449.

zsh: abort      ./main --model  --text "test" --image  -v 1
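One way to see why the run aborts even with 32 GB of RAM is to picture the context's memory pool as a fixed-size arena. The buffer is sized once when the context is created, and each new tensor is carved from whatever is left, so the machine's total RAM never comes into play. The sketch below is a simplified C++ model of that behaviour; `FixedPool` and its methods are hypothetical names, not ggml's API or implementation. Judging from the numbers alone, "available" in the log is the whole pool (8388608 = 8 × 1024 × 1024 bytes) and "needed" is the total the pool would have to hold once the failing tensor is added, so the model reports those same two quantities.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Simplified model of a fixed-size context memory pool (an arena / bump
// allocator). Illustration only: this is NOT ggml's implementation.
struct FixedPool {
    std::vector<unsigned char> buffer;  // sized once at init, never grows
    std::size_t offset = 0;             // bytes already handed out

    explicit FixedPool(std::size_t size) : buffer(size) {}

    std::size_t capacity() const { return buffer.size(); }
    std::size_t free_bytes() const { return buffer.size() - offset; }

    // Hands out the next `size` bytes. If the pool would have to hold more
    // than its capacity, print a message shaped like the one in the log
    // (total needed vs. total capacity) and return nullptr. In the log above,
    // ggml aborts via an assertion at this point instead of returning.
    void *allocate(std::size_t size) {
        const std::size_t needed = offset + size;
        if (needed > capacity()) {
            std::fprintf(stderr,
                         "not enough space in the pool (needed %zu, available %zu)\n",
                         needed, capacity());
            return nullptr;
        }
        void *p = buffer.data() + offset;
        offset = needed;
        assert(offset <= capacity());  // invariant: never run past the buffer
        return p;
    }
};
```

For example, `FixedPool(8 * 1024 * 1024).allocate(12051936)` fails exactly like the log, while the same request succeeds against a 12 MiB pool.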

I'm running on a Mac Studio with M1 Max and 32 GB of RAM. I tried every available model binary on Hugging Face and still got the same memory pool error. Is this due to a memory allocation bug? I see in #17 that this was solved for some cases, and I'm wondering whether there are lingering issues here.

dukeeagle commented 1 year ago

Barely missing the threshold with openai_clip-vit-base-patch16.ggmlv0.f16.bin! Can we significantly reduce the minimum memory pool size, or is a bug massively inflating the minimum? I'd like to run clip.cpp on far less powerful devices than my Mac Studio, if that's possible.

ggml_new_tensor_impl: not enough space in the context's memory pool (needed 17471536, available 16777216)
Assertion failed: (false), function ggml_new_tensor_impl, file ggml.c, line 4449.
zsh: abort      ./main --model  --text "test" --image  -v 1
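How close each run came can be worked out from the two logs. The constexpr helpers below (`shortfall` and `min_pool_mib` are illustrative names, not clip.cpp functions) compute how many bytes each pool was short and the smallest whole number of MiB that would have covered the logged request, with the logged values checked at compile time. Treat the MiB figure as a lower bound: the program aborts at the first tensor that doesn't fit, so tensors allocated later could push the real requirement higher.

```cpp
#include <cstddef>

// 8388608 and 16777216 in the logs are exactly 8 and 16 MiB.
constexpr std::size_t kMiB = 1024 * 1024;

// Bytes by which the pool fell short (0 if the request fit).
constexpr std::size_t shortfall(std::size_t needed, std::size_t available) {
    return needed > available ? needed - available : 0;
}

// Smallest whole number of MiB that holds `needed` bytes (rounds up).
constexpr std::size_t min_pool_mib(std::size_t needed) {
    return (needed + kMiB - 1) / kMiB;
}

// Compile-time checks against the numbers from the two logs.
static_assert(8 * kMiB == 8388608, "patch32 run: pool is 8 MiB");
static_assert(16 * kMiB == 16777216, "patch16 run: pool is 16 MiB");
static_assert(shortfall(12051936, 8388608) == 3663328, "about 3.49 MiB short");
static_assert(shortfall(17471536, 16777216) == 694320, "about 0.66 MiB short");
static_assert(min_pool_mib(12051936) == 12, "at least 12 MiB");
static_assert(min_pool_mib(17471536) == 17, "at least 17 MiB");
```

So the patch16 f16 run was short by only 694320 bytes (about 0.66 MiB), compared with 3663328 bytes (about 3.49 MiB) for the patch32 q4_0 run.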
monatis commented 1 year ago

It requests ~12 MB instead of the 8 MB that I set as a fixed value here: https://github.com/monatis/clip.cpp/blob/e2eee8e9b11afe4fc9fdb22d1f6d0ea53df9552a/clip.cpp#L24-L30

You can increase them slightly (8 is for patch32 and 16 is for patch16), so adjust them to values that work for you. Interestingly, it works for me with these values on Windows and Linux, but I haven't tried it on a MacBook yet. Also, quantized models may require slightly more memory. I'll try to reproduce it tomorrow.
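The suggested fix is to raise the hard-coded sizes in the linked lines of clip.cpp. One hypothetical way to express that choice is a helper keyed on patch size with an explicit headroom parameter, for example to leave extra room for quantized models. `compute_pool_bytes` and `extra_mib` are illustrative names, not part of clip.cpp, and the real code at the linked lines may be structured quite differently.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kMiB = 1024 * 1024;

// Hypothetical helper (not clip.cpp's code): pool size in bytes, using the
// defaults described in this thread (8 MiB for patch32, 16 MiB for patch16)
// plus `extra_mib` of user-chosen headroom.
std::size_t compute_pool_bytes(int patch_size, std::size_t extra_mib) {
    assert(patch_size == 32 || patch_size == 16);  // only sizes discussed here
    const std::size_t base_mib = (patch_size == 32) ? 8 : 16;
    return (base_mib + extra_mib) * kMiB;
}
```

With the numbers reported in this issue, `compute_pool_bytes(32, 4)` gives 12 MiB (12582912 bytes) and `compute_pool_bytes(16, 1)` gives 17 MiB (17825792 bytes), both above the logged requests. Keeping the headroom as a separate parameter makes it easy to add a little more without touching the per-patch defaults.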

> I'd like to run clip.cpp on far less powerful devices

What kind of devices are you targeting? I'm quite interested in new use cases and low-end devices, so we can work on it either way.

dukeeagle commented 1 year ago

That worked! Thank you so much for the quick reply.

I want to run on Intel-era MacBook Pros and Airs, like a 2019 13" MacBook Air. Not very low-end in the grand scheme of things, haha.

dukeeagle commented 1 year ago

As an aside, have you tried converting these models to CoreML like they do in whisper.cpp?

monatis commented 1 year ago

> That worked!

That's great! I'll try to find the root cause of this difference and patch it later on.

> Not very low-end in the grand scheme of things

Haha, yes. They should do a fairly good job.

> have you tried converting these models to CoreML

Not yet, but good point. I'd like to support additional deployment types as we find different use cases for clip.cpp.

dukeeagle commented 1 year ago

I really appreciate the quick replies here. Have you also considered building a version of this for BLIP or other more recent CLIP variants? I'm currently exploring the steps involved. Large-scale image retrieval has worked far better with BLIP and BLIP-2, but of course they take much more time and memory.