nomic-ai / gpt4all

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
https://nomic.ai/gpt4all
MIT License
68.99k stars 7.57k forks

error loading model: missing tok_embeddings.weight #1098

Open sp-yang opened 1 year ago

sp-yang commented 1 year ago

System Info

gpt4all: 0.3.6
Python: 3.9.16

Reproduction

import gpt4all

falcon = gpt4all.GPT4All(model_name="ggml-model-gpt4all-falcon-q4_0.bin", model_path="./LLM/")
messages = [{"role": "user", "content": "Name 3 colors"}]
falcon.chat_completion(messages)

Expected behavior

The model should load successfully. Instead, I got the following message:

error loading model: missing tok_embeddings.weight
llama_init_from_file: failed to load model

This model loads successfully in the gpt4all UI client, but not in Python. Is this an issue with the model or with the Python code? Thanks.
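One way to narrow down "model issue or code issue" is to inspect the file itself before handing it to the bindings. The sketch below is only a diagnostic idea, not part of gpt4all: `describe_model_file` and `KNOWN_MAGICS` are hypothetical names, the magic values are the legacy llama.cpp container constants (other loaders may use different ones), and the MD5 is only useful if you have a published checksum to compare it against.

```python
import hashlib
import struct
from pathlib import Path

# Legacy llama.cpp container magics, read as a little-endian uint32.
# Assumption: loaders for other architectures may use different values,
# so "unknown" does not by itself mean the file is broken.
KNOWN_MAGICS = {
    0x67676D6C: "ggml",
    0x67676D66: "ggmf",
    0x67676A74: "ggjt",
}


def describe_model_file(path):
    """Return size, MD5 digest and leading magic number of a model file."""
    path = Path(path)
    md5 = hashlib.md5()
    with path.open("rb") as f:
        head = f.read(4)  # first four bytes hold the container magic
        md5.update(head)
        # Hash the rest in 1 MiB chunks so large models fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
    magic = struct.unpack("<I", head)[0] if len(head) == 4 else None
    return {
        "size_bytes": path.stat().st_size,
        "md5": md5.hexdigest(),
        "magic": None if magic is None else hex(magic),
        "container": KNOWN_MAGICS.get(magic, "unknown"),
    }
```

Calling `describe_model_file("./LLM/ggml-model-gpt4all-falcon-q4_0.bin")` on the file the UI client loads and on the one Python loads would show whether both point at the same, complete download. If they match, the file is unlikely to be the problem.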

lucianosilvi commented 1 year ago

I managed to load it with transformers, but generation this way is much slower than in the gpt4all UI, which is very fast.

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("nomic-ai/gpt4all-falcon", trust_remote_code=True)

sp-yang commented 1 year ago

@lucianosilvi Thanks for your reply. The model ggml-model-gpt4all-falcon-q4_0.bin, downloaded from https://gpt4all.io/, cannot be loaded with the Python bindings for gpt4all. Is there a way to load it in Python and have it run faster?