nomic-ai / pygpt4all

Officially supported Python bindings for llama.cpp + gpt4all
https://nomic-ai.github.io/pygpt4all/
MIT License

invalid model file (bad magic [got 0x67676d66 want 0x67676a74]) #58

Closed qaiwiz closed 1 year ago

qaiwiz commented 1 year ago

I am working on Linux Debian 11. After `pip install` and downloading the most recent model, gpt4all-lora-quantized-ggml.bin, I tried to run the example, but I get the following error:

```
./gpt4all-lora-quantized-ggml.bin: invalid model file (bad magic [got 0x67676d66 want 0x67676a74])
        you most likely need to regenerate your ggml files
        the benefit is you'll get 10-100x faster load times
        see https://github.com/ggerganov/llama.cpp/issues/91
        use convert-pth-to-ggml.py to regenerate from original pth
        use migrate-ggml-2023-03-30-pr613.py if you deleted originals
llama_init_from_file: failed to load model
```

I tried this: `pyllamacpp-convert-gpt4all ./gpt4all-lora-quantized-ggml.bin ./llama_tokenizer ./gpt4all-converted.bin`, but I am not sure where the tokenizer is stored!
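
For context (an editor's aside, not part of the original report): the two magic values in the error are just four ASCII characters each, identifying the file layout. Below is a minimal sketch that reads the first four bytes of a model file and reports which known llama.cpp layout it uses; the format names and the extra 0x67676d6c value are taken from llama.cpp, not from pygpt4all:

```python
import struct
import sys

# Known llama.cpp model-file magics. The two values from the error message
# above are included; 0x67676d6c is the even older unversioned layout,
# listed for completeness.
KNOWN_MAGICS = {
    0x67676d6c: "ggml (old, unversioned)",
    0x67676d66: "ggmf (old, versioned)     <- what the downloaded file is",
    0x67676a74: "ggjt (current, mmap-able) <- what the loader wants",
}

def report_magic(path: str) -> None:
    with open(path, "rb") as f:
        # llama.cpp reads the magic as a 32-bit little-endian integer
        (magic,) = struct.unpack("<I", f.read(4))
    name = KNOWN_MAGICS.get(magic, "unknown -- probably not a ggml model file at all")
    print(f"{path}: magic 0x{magic:08x} -> {name}")

if __name__ == "__main__":
    report_magic(sys.argv[1])
```

In other words, the downloaded file is a valid model in the older ggmf layout; the loader just wants it brought to the newer ggjt layout, which, per the error text, is what migrate-ggml-2023-03-30-pr613.py is for.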

abdeladim-s commented 1 year ago

@qaiwiz you should download the tokenizer as well (it's a small file), please see #5

qaiwiz commented 1 year ago

@abdeladim-s thanks, I just came back to post that one has to download the tokenizer, as you pointed out (https://github.com/nomic-ai/pyllamacpp/issues/5). I actually did, but then I get:

```
File "/root/env39/bin/pyllamacpp-convert-gpt4all", line 8, in <module>
    sys.exit(main())
File "/root/env39/lib/python3.9/site-packages/pyllamacpp/scripts/convert_gpt4all.py", line 19, in main
    convert_one_file(args.gpt4all_model, tokenizer)
File "/root/env39/lib/python3.9/site-packages/pyllamacpp/scripts/convert.py", line 92, in convert_one_file
    write_header(f_out, read_header(f_in))
File "/root/env39/lib/python3.9/site-packages/pyllamacpp/scripts/convert.py", line 34, in write_header
    raise Exception('Invalid file magic. Must be an old style ggml file.')
```

What does it mean by "old" file? I actually downloaded the most recent model .bin file from that link (gpt4all-lora-quantized-ggml.bin, 05-Apr-2023 13:07, 4G). Now I am wondering how I should fix this to get the model working.

qaiwiz commented 1 year ago

I couldn't get it to work, so I re-downloaded the already-converted model: https://huggingface.co/LLukas22/gpt4all-lora-quantized-ggjt. I am trying this on my server with 2 cores and 8 GB of RAM (I know it is at the limit), and I tried to bring down the temperature and ease up some of the parameters, yet it is stalling! Typically, how fast should I expect this to run on such a server?

```python
# Load the model
model = Model(ggml_model="ggjt-model.bin", n_ctx=2000)

# Generate
prompt = "User: How are you doing?\nBot:"
result = model.generate(prompt, n_predict=50, temp=0, top_k=3, top_p=0.950000,
                        repeat_last_n=64, repeat_penalty=1.100000)
```

Is there any hyperparameter I can tune to make it run faster?
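
As an aside, not a real fix: of the knobs in the snippet above, lowering `n_predict` cuts total generation time roughly proportionally, and a smaller `n_ctx` reduces the memory needed for the context, which matters on an 8 GB machine running a ~4 GB model; the sampling parameters (`temp`, `top_k`, `top_p`, ...) barely affect speed. A minimal sketch, assuming the same pyllamacpp `Model` class used above (the import path may differ between versions):

```python
from pyllamacpp.model import Model

# Same API as in the snippet above, just with a smaller context window and
# fewer predicted tokens; nothing else here changes the per-token cost.
model = Model(ggml_model="ggjt-model.bin", n_ctx=512)

prompt = "User: How are you doing?\nBot:"
result = model.generate(prompt, n_predict=20, temp=0, top_k=3, top_p=0.95,
                        repeat_last_n=64, repeat_penalty=1.1)
print(result)
```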

abdeladim-s commented 1 year ago

@qaiwiz the specs you are using are very low; you should have at least a quad-core CPU. Also, if the CPU you are using does not have AVX acceleration, it will be even worse. You won't get much speed even if you change the hyperparameters.
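
For anyone who wants to verify what their CPU supports before tweaking model settings: on a Linux host like the Debian server above, the kernel exposes this in /proc/cpuinfo. A small sketch (Linux-only, nothing specific to pygpt4all):

```python
import os

def cpu_summary() -> None:
    # Logical CPU count as seen by the OS
    print(f"logical CPUs: {os.cpu_count()}")

    # Instruction-set flags reported by the kernel
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                flags = set(line.split(":", 1)[1].split())
                break
        else:
            flags = set()
    for isa in ("avx", "avx2", "avx512f", "fma", "f16c"):
        print(f"{isa}: {'yes' if isa in flags else 'no'}")

if __name__ == "__main__":
    cpu_summary()
```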

qaiwiz commented 1 year ago

Here is the system config:

```
system_info: n_threads = 2 / 2 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: temp = 0.000000, top_k = 3, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 2000, n_batch = 8, n_predict = 50, n_keep = 0
```

qaiwiz commented 1 year ago

Here is the output:

```
llama_print_timings:        load time =   71340.45 ms
llama_print_timings:      sample time =     299.64 ms /  55 runs   (    5.45 ms per run)
llama_print_timings: prompt eval time =  292639.93 ms /  36 tokens ( 8128.89 ms per token)
llama_print_timings:        eval time = 2361021.55 ms /  52 runs   (45404.26 ms per run)
llama_print_timings:       total time = 2812682.00 ms
```

```
result
' User: How are you doing?\nBot:\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01'
```

andzejsp commented 1 year ago

guys, did you bork the llama again?

```
Checking discussions database...
llama_model_load: loading model from './models/gpt4all-lora-quantized-ggml.bin' - please wait ...
./models/gpt4all-lora-quantized-ggml.bin: invalid model file (bad magic [got 0x6e756f46 want 0x67676a74])
        you most likely need to regenerate your ggml files
        the benefit is you'll get 10-100x faster load times
        see https://github.com/ggerganov/llama.cpp/issues/91
        use convert-pth-to-ggml.py to regenerate from original pth
        use migrate-ggml-2023-03-30-pr613.py if you deleted originals
llama_init_from_file: failed to load model
Chatbot created successfully
 * Serving Flask app 'GPT4All-WebUI'
```

It was working until I did a git pull today. So, what's going on? How do you convert to the right magic? We (GPT4All-UI) just recently converted all the models and uploaded them to HF, but now they are dead...

Issue: https://github.com/nomic-ai/gpt4all-ui/issues/96
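
One observation on the log above (an editor's aside, not a confirmed diagnosis): unlike the ggmf-vs-ggjt mismatch at the top of this thread, the magic reported here does not match any ggml layout at all. Decoded back to bytes it is plain ASCII text, which would point at the downloaded file itself (e.g. an HTML/redirect page or a truncated download) rather than at the conversion scripts:

```python
import struct

# Magic value reported in the log above
magic = 0x6e756f46

# llama.cpp reads the magic as a little-endian 32-bit integer, so packing the
# value back the same way recovers the first four bytes of the file on disk.
print(struct.pack("<I", magic))  # b'Foun' -- ASCII text, not a ggml header
```

So it may be worth inspecting the first bytes of ./models/gpt4all-lora-quantized-ggml.bin before re-converting anything.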

mahmoodfathy commented 1 year ago

@andzejsp I am facing the same issue as well :/ I just tried it now with the latest model and it doesn't work.

andzejsp commented 1 year ago

> @andzejsp I am facing the same issue as well :/ I just tried it now with the latest model and it doesn't work.

In my case it's working with the ggml-vicuna-13b-4bit-rev1.bin model; not sure why the other model died...

mahmoodfathy commented 1 year ago

@andzejsp can you give me a download link to it, if you have one, so I can try it?

andzejsp commented 1 year ago

> @andzejsp can you give me a download link to it, if you have one, so I can try it?

https://github.com/nomic-ai/gpt4all-ui#supported-models

abdeladim-s commented 1 year ago

@andzejsp We didn't touch anything; we haven't pushed any updates for a week now. You can take a look at the commit history. Please make sure you are doing the right thing!!