nomic-ai / pygpt4all

Officially supported Python bindings for llama.cpp + gpt4all
https://nomic-ai.github.io/pygpt4all/
MIT License

Errors loading ggml models #107

Open · NasonZ opened 1 year ago

NasonZ commented 1 year ago

Hello, I'm just starting to explore the models made available by gpt4all, but I'm having trouble loading a few of them.

My environment details:

Code to reproduce the error (vicuna_test.py):

from pygpt4all.models.gpt4all import GPT4All
model = GPT4All('./models/ggml-vicuna-7b-1.1-q4_2.bin') # or any of the other models

Issue:

The issue is that I can't seem to load some of the models listed here: https://github.com/nomic-ai/gpt4all-chat#manual-download-of-models. The models I've failed to load are:

- ggml-gpt4all-j-v1.3-groovy.bin
- ggml-vicuna-7b-1.1-q4_2.bin
- ggml-gpt4all-j.bin
- ggml-stable-vicuna-13B.q4_2.bin

As shown below, the ggml-gpt4all-l13b-snoozy.bin model loads without issue. I also managed to load this converted version: https://huggingface.co/mrgaang/aira/blob/main/gpt4all-converted.bin.

# Working example - ggml-gpt4all-l13b-snoozy.bin

$ python vicuna_test.py 
llama_model_load: loading model from './models/ggml-gpt4all-l13b-snoozy.bin' - please wait ...
llama_model_load: n_vocab = 32000
...
llama_model_load: ggml ctx size = 101.25 KB
llama_model_load: mem required  = 9807.93 MB (+ 3216.00 MB per state)
llama_model_load: loading tensors from './models/ggml-gpt4all-l13b-snoozy.bin'
llama_model_load: model size =  7759.39 MB / num tensors = 363
llama_init_from_file: kv self size  =  800.00 MB
# loads without error

Errors loading the listed models:

# gpt4all-j-v1.3-groovy

$ python vicuna_test.py 
llama_model_load: loading model from './models/ggml-gpt4all-j-v1.3-groovy.bin' - please wait ...
llama_model_load: invalid model file './models/ggml-gpt4all-j-v1.3-groovy.bin' (too old, regenerate your model files or convert them with convert-unversioned-ggml-to-ggml.py!)
llama_init_from_file: failed to load model
Segmentation fault (core dumped)
# vicuna-7b

$ python vicuna_test.py 
llama_model_load: loading model from './models/ggml-vicuna-7b-1.1-q4_2.bin' - please wait ...
llama_model_load: n_vocab = 32000
...
llama_model_load: type    = 1
llama_model_load: invalid model file './models/ggml-vicuna-7b-1.1-q4_2.bin' (bad f16 value 5)
llama_init_from_file: failed to load model
Segmentation fault (core dumped)

# ggml-gpt4all-j.bin and ggml-stable-vicuna-13B.q4_2.bin produced the same error

Can anyone please advise on how to resolve these issues?

NasonZ commented 1 year ago

I'm pretty sure the issue is with GPT4All, as I can load all of the models mentioned with:

from llama_cpp import Llama
model_path = "models/ggml-vicuna-7b-1.1-q4_2.bin"
model = Llama(model_path=model_path)

Are there any major differences between loading the model through Llama and loading it through GPT4All?
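
For reference, generation also works on the same file through llama-cpp-python's plain completion call. A minimal sketch, assuming a 0.1.x-era llama-cpp-python (the prompt and max_tokens here are arbitrary choices):

from llama_cpp import Llama

# Load the same file that fails through pygpt4all's GPT4All class
model = Llama(model_path="models/ggml-vicuna-7b-1.1-q4_2.bin")

# Plain completion call; llama-cpp-python returns an OpenAI-style dict
output = model("Q: Name the planets in the solar system. A: ", max_tokens=48, stop=["Q:"])
print(output["choices"][0]["text"])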

385olt commented 1 year ago

I have the same issue.

My environment:

- Ubuntu 22.04
- Python 3.10.6
- pygpt4all 1.1.0
- pygptj 2.0.3
- pyllamacpp 2.1.3

Code:

model = GPT4All('./ggml-mpt-7b-chat.bin',
                prompt_context="The following is a conversation between Jim and Bob. Bob is trying to help Jim with his requests by answering the questions to the best of his abilities. If Bob cannot help Jim, then he says that he doesn't know.")

I can load ggml-gpt4all-l13b-snoozy.bin and https://huggingface.co/mrgaang/aira/blob/main/gpt4all-converted.bin and they work fine, but the following models fail to load:

- ggml-mpt-7b-chat.bin
- ggml-vicuna-7b-1.1-q4_2.bin
- ggml-wizardLM-7b.q4_2.bin

Loading ggml-wizardLM-7b.q4_2.bin and ggml-vicuna-7b-1.1-q4_2.bin gives

llama_model_load: invalid model file './ggml-wizardLM-7b.q4_2.bin' (bad f16 value 5)
llama_init_from_file: failed to load model

Loading ggml-mpt-7b-chat.bin gives

./ggml-mpt-7b-chat.bin: invalid model file (bad magic [got 0x67676d6d want 0x67676a74])
    you most likely need to regenerate your ggml files
    the benefit is you'll get 10-100x faster load times
    see https://github.com/ggerganov/llama.cpp/issues/91
    use convert-pth-to-ggml.py to regenerate from original pth
    use migrate-ggml-2023-03-30-pr613.py if you deleted originals
llama_init_from_file: failed to load model

I tried using llama.cpp/migrate-ggml-2023-03-30-pr613.py on ggml-mpt-7b-chat.bin, but got the error:

File "/.../llama.cpp/migrate-ggml-2023-03-30-pr613.py", line 133, in read_tokens
    word = fin.read(length)
ValueError: read length must be non-negative or -1

I tried using llama.cpp/convert-unversioned-ggml-to-ggml.py to fix this, but got another error:

File "/.../llama.cpp/convert-unversioned-ggml-to-ggml.py", line 29, in write_header
    raise Exception('Invalid file magic. Must be an old style ggml file.')
Exception: Invalid file magic. Must be an old style ggml file.

I tried using llama.cpp/migrate-ggml-2023-03-30-pr613.py on ggml-wizardLM-7b.q4_2.bin, but got the message:

./ggml-wizardLM-7b.q4_2.bin: input ggml has already been converted to 'ggjt' magic

I have no idea how to fix this or why it happens.
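
If it helps with debugging: the magic values in these errors are just the first four bytes of the file, read as a little-endian uint32. A quick sketch to inspect a file (the constants below are the ggml container magics llama.cpp checked at the time; 0x67676a74 is the 'ggjt' value from the error above):

import struct

# ggml container magics, matching the hex values printed in the errors above
MAGICS = {
    0x67676a74: "ggjt (versioned, mmap-friendly format)",
    0x67676d66: "ggmf (older versioned format)",
    0x67676d6c: "ggml (unversioned, oldest format)",
}

def show_magic(path):
    with open(path, "rb") as f:
        raw = f.read(4)
    (magic,) = struct.unpack("<I", raw)  # little-endian uint32, as the loader reads it
    print(f"{path}: bytes {raw!r} -> 0x{magic:08x} ({MAGICS.get(magic, 'unknown')})")

show_magic("./ggml-mpt-7b-chat.bin")

An unrecognized value usually means the file targets a different loader entirely; MPT is not a LLaMA-architecture model, so llama.cpp's conversion scripts won't accept ggml-mpt-7b-chat.bin no matter which one is used.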

emilaz commented 1 year ago

I'm having the same issue with the models gpt4all-lora-quantized.bin and ggml-gpt4all-j-v1.3-groovy.bin; both result in

llama_model_load: invalid model file './gpt4all-lora-quantized.bin' (too old, regenerate your model files or convert them with convert-unversioned-ggml-to-ggml.py!)
llama_init_from_file: failed to load model
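
For the ggml-gpt4all-j* files specifically, the rejection is expected: GPT4All-J is a GPT-J-architecture model, and the GPT4All class wraps a llama.cpp-based loader. pygpt4all ships a separate class for GPT-J checkpoints; a minimal sketch along the lines of the project README (the prompt is arbitrary):

from pygpt4all import GPT4All_J

# GPT-J-architecture checkpoints go through the pygptj backend, not pyllamacpp
model = GPT4All_J('./models/ggml-gpt4all-j-v1.3-groovy.bin')
for token in model.generate("Once upon a time, "):
    print(token, end='', flush=True)

This does not cover gpt4all-lora-quantized.bin, which is a LLaMA-architecture file and fails with the "too old" format error instead.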