marella / gpt4all-j

Python bindings for the C++ port of GPT4All-J model.
MIT License

zsh: illegal hardware instruction #7

Closed. RonanKMcGovern closed this issue 1 year ago.

RonanKMcGovern commented 1 year ago

I'm using a Mac with an M1 chip. Any other tips? I already tried:

```python
from gpt4allj import Model

model = Model('./models/ggml-gpt4all-j.bin', instructions='avx')

print(model.generate('AI is going to'))
```
marella commented 1 year ago

It looks like the M1 CPU doesn't support AVX/AVX2.

Can you please download the following precompiled libs:

and then use them as:

```python
from gpt4allj import Model, load_library

lib = load_library('/path/to/libgptj.dylib', '/path/to/libggml.dylib')

model = Model('/path/to/ggml-gpt4all-j.bin', lib=lib)
```

Please let me know if this works.

RonanKMcGovern commented 1 year ago

Much appreciated! This worked.

Would you have a recommendation for the kind of server to run this on for high performance (if I deploy on AWS or similar)? Thanks

marella commented 1 year ago

Thanks for checking. I will add these precompiled libs to the package in the next release.

I'm not sure which server would be best, but ideally the CPU should support AVX2 instructions and the machine should have enough RAM to load the entire model into memory (about 8 GB for the gpt4all-j model).
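For reference, a minimal sketch of how one might verify both requirements on a candidate Linux server (e.g. an AWS EC2 instance) before loading the model; `has_avx2` and `total_ram_gb` are hypothetical helper names, and the 8 GB figure is the recommendation above:

```python
# Hypothetical pre-flight check for a Linux host: confirms the CPU
# advertises AVX2 and reports total physical RAM, so you can verify
# the instance can hold the whole model in memory before loading it.
import os

def has_avx2() -> bool:
    # /proc/cpuinfo lists the CPU's supported instruction-set flags on Linux.
    with open('/proc/cpuinfo') as f:
        return 'avx2' in f.read()

def total_ram_gb() -> float:
    # Total physical memory in gigabytes.
    return os.sysconf('SC_PAGE_SIZE') * os.sysconf('SC_PHYS_PAGES') / 1024 ** 3

if __name__ == '__main__':
    print('AVX2 supported:', has_avx2())
    print(f'Total RAM: {total_ram_gb():.1f} GB')  # aim for at least ~8 GB
```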

RonanKMcGovern commented 1 year ago

I'll mark this as closed. Many thanks for your help.

BTW, do you know if a GPU version will be released so GPT4All-J can be run on a GPU?

marella commented 1 year ago

The precompiled libs are released in the latest version 0.2.2, so the following should work on Apple silicon without additional configuration:

```python
from gpt4allj import Model

model = Model('/path/to/ggml-gpt4all-j.bin')
```
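Generation then works the same way as in the snippet from the issue description:

```python
print(model.generate('AI is going to'))
```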

For GPU, I haven't tried it myself, but the underlying C++ library has some support for Nvidia GPUs. It requires installing the cuBLAS library and building the ggml C++ library from source. AFAIK the GPU is only used for processing the input prompt, not for generating the response. Full GPU support might not happen soon. See https://github.com/ggerganov/llama.cpp/discussions/915.