It looks like the M1 CPU doesn't support AVX/AVX2.
Can you please download the following precompiled libs:
and then use them as:
from gpt4allj import Model, load_library
lib = load_library('/path/to/libgptj.dylib', '/path/to/libggml.dylib')
model = Model('/path/to/ggml-gpt4all-j.bin', lib=lib)
Please let me know if this works.
Much appreciated! This worked.
Would you have a recommendation for the kind of server to run this on for high performance (if I deploy on AWS or similar)? Thanks
Thanks for checking. I will add these precompiled libs to the package in the next release.
I'm not sure which server would be best, but ideally the CPU should support AVX2 instructions and there should be enough RAM to load the entire model into memory (about 8 GB for the gpt4all-j model).
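As a rough sanity check (illustrative only, not part of gpt4allj), you can verify AVX2 support and available RAM on a Linux instance with something like this; it assumes /proc/cpuinfo and os.sysconf are available, i.e. a Linux host:

import os

def has_avx2():
    # Look for the 'avx2' flag in /proc/cpuinfo (Linux only).
    with open('/proc/cpuinfo') as f:
        return 'avx2' in f.read()

def total_ram_gb():
    # Total physical memory in GiB, via sysconf (Linux only).
    return os.sysconf('SC_PAGE_SIZE') * os.sysconf('SC_PHYS_PAGES') / (1024 ** 3)

print('AVX2 supported:', has_avx2())
print('Total RAM (GiB): %.1f' % total_ram_gb())
print('Enough for gpt4all-j (>= 8 GiB):', total_ram_gb() >= 8)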
I'll mark this as closed. Many thanks for your help.
BTW, do you know if a GPU version will be released so GPT4ALL-J can run on a GPU?
The precompiled libs are included in the latest release, version 0.2.2, so the following should work on Apple silicon without additional configuration:
from gpt4allj import Model
model = Model('/path/to/ggml-gpt4all-j.bin')
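For reference, generating a response then looks roughly like this (the generate() call follows the usage shown in the project README; the prompt is just an example, so adjust if the API differs):

from gpt4allj import Model

# Load the model; the precompiled libs are picked up automatically since 0.2.2.
model = Model('/path/to/ggml-gpt4all-j.bin')

# Generate a completion for a prompt.
print(model.generate('AI is going to'))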
For GPU, I haven't tried it myself, but the underlying C++ library has some support for Nvidia GPUs. It requires installing the cuBLAS library and building the ggml C++ library from source. AFAIK the GPU is only used for processing the input prompt, not for generating the response. Full GPU support might not happen soon. See https://github.com/ggerganov/llama.cpp/discussions/915.
I'm using a Mac with an M1 chip. Any other tips? I already tried