@horvatm, the gpt4all binary uses a somewhat old version of llama.cpp, so you might get different results with pyllamacpp. Have you tried running gpt4all with the actual llama.cpp binary?
Can you PLEASE check on your side? The following code usually does not give me any results:
```python
from pyllamacpp.model import Model

def new_text_callback(text: str):
    print(text, end="", flush=True)

llama_config = {"n_ctx": 2048}
model = Model(ggml_model='gpt4all-lora-quantized-converted.bin', **llama_config)

# same parameters as in the binary chat
gpt_config = {"n_predict": 128, "n_threads": 8, "repeat_last_n": 64,
              "temp": 0.1, "top_k": 40, "top_p": 0.95, "repeat_penalty": 1.3}

question = "Can you tell me, how did the Dutch obtain Manhattan, and what did it cost?"
model.generate(question, new_text_callback=new_text_callback, **gpt_config)
```
The output of the Python binding is:
```
llama_model_load: loading model from 'gpt4all-lora-quantized-converted.bin' - please wait ...
llama_model_load: n_vocab = 32001
llama_model_load: n_ctx = 2048
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: type = 1
llama_model_load: ggml map size = 4017.70 MB
llama_model_load: ggml ctx size = 81.25 KB
llama_model_load: mem required = 5809.78 MB (+ 2052.00 MB per state)
llama_model_load: loading tensors from 'gpt4all-lora-quantized-converted.bin'
llama_model_load: model size = 4017.27 MB / num tensors = 291
llama_init_from_file: kv self size = 2048.00 MB
llama_generate: seed = 1681226838
system_info: n_threads = 8 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: temp = 0.100000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
generate: n_ctx = 2048, n_batch = 8, n_predict = 128, n_keep = 0
Can you tell me, how did the Dutch obtain Manhattan, and what did it cost? [end of text]
llama_print_timings: load time = 1957.01 ms
llama_print_timings: sample time = 0.60 ms / 1 runs ( 0.60 ms per run)
llama_print_timings: prompt eval time = 2502.97 ms / 20 tokens ( 125.15 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 3492.68 ms
```
The binary version of chat, or the nomic binding (although perhaps old), gives me this:
The Dutch obtained Manhattan from Native Americans in 1624 for beads worth $25 (approximately equivalent to about $30 today).
The latter answer is what I expect, but the result from the Python binding is not.
@horvatm
Can you try it with this:
```python
question = "Can you tell me, how did the Dutch obtain Manhattan, and what did it cost?\n"
```
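For context, a complete run with the newline-terminated prompt would look like the sketch below. It reuses the same model file and generation parameters as the report above; the only change is the trailing `"\n"`:

```python
from pyllamacpp.model import Model

def new_text_callback(text: str):
    print(text, end="", flush=True)

model = Model(ggml_model='gpt4all-lora-quantized-converted.bin', n_ctx=2048)

# The trailing "\n" is the point of this suggestion: without it the model may
# emit [end of text] immediately instead of answering.
question = "Can you tell me, how did the Dutch obtain Manhattan, and what did it cost?\n"
model.generate(question, new_text_callback=new_text_callback,
               n_predict=128, n_threads=8, repeat_last_n=64,
               temp=0.1, top_k=40, top_p=0.95, repeat_penalty=1.3)
```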
I am working on a new version that enables interactive mode by default; this should solve those issues. Please stay tuned.
Hi @horvatm,
Please try the Interactive Dialogue example from the README page. I think this will solve the issue.
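If it helps, an interactive loop along those lines could look like the sketch below. It only reuses the `Model`/`generate` calls already shown in this thread, so the actual Interactive Dialogue example in the README may differ:

```python
from pyllamacpp.model import Model

def new_text_callback(text: str):
    print(text, end="", flush=True)

model = Model(ggml_model='gpt4all-lora-quantized-converted.bin', n_ctx=2048)

# One newline-terminated prompt per turn; tokens are streamed via the callback.
while True:
    prompt = input("\nYou: ")
    if prompt.strip().lower() in ("exit", "quit"):
        break
    model.generate(prompt + "\n", new_text_callback=new_text_callback,
                   n_predict=128, temp=0.1, top_k=40, top_p=0.95,
                   repeat_penalty=1.3)
```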
Please feel free to reopen the issue if it is not solved.
Hi.
How similar should the responses of the Python binding and the compiled version of gpt4all be with the same seed and parameters? For example:
And the command
returns the result:
To what degree is this normal?
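For what it's worth, exact reproducibility would at least require a fixed sampling seed; the log above shows a fresh seed being drawn each run (`seed = 1681226838`). Assuming the binding forwards llama.cpp's seed setting the same way it forwards `temp`/`top_k`/`top_p` (a hypothetical `seed` key, not confirmed by this thread), a minimal sketch:

```python
from pyllamacpp.model import Model

def new_text_callback(text: str):
    print(text, end="", flush=True)

model = Model(ggml_model='gpt4all-lora-quantized-converted.bin', n_ctx=2048)

# "seed" is an assumption here: it presumes the binding exposes llama.cpp's
# seed parameter; without a fixed seed, each run samples differently.
gpt_config = {"n_predict": 128, "seed": 42, "temp": 0.1,
              "top_k": 40, "top_p": 0.95, "repeat_penalty": 1.3}
model.generate("Can you tell me, how did the Dutch obtain Manhattan, and what did it cost?\n",
               new_text_callback=new_text_callback, **gpt_config)
```

Even with identical seeds and parameters, a different llama.cpp revision (as noted above, gpt4all bundles an older one) can still change the sampled tokens, so some divergence between the two is to be expected.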