nomic-ai / pygpt4all

Official supported Python bindings for llama.cpp + gpt4all
https://nomic-ai.github.io/pygpt4all/
MIT License

can't mlock because it's not supported on this system #54

Closed Naugustogi closed 1 year ago

Naugustogi commented 1 year ago

I'm using Windows. Plain llama.cpp, on the other hand, works fine and keeps the model in RAM. Loading the model each time I want to use it is annoying.
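For reference, a minimal sketch of loading the model once and reusing it across prompts with mlock enabled. This assumes the pyllamacpp 1.0.x API (a `Model` class taking a `ggml_model` path, with context parameters such as `use_mlock` forwarded as keyword arguments); the model path and prompts are placeholders.

```python
from pyllamacpp.model import Model

# Load once; with use_mlock the weights should stay pinned in RAM.
# (Assumption: pyllamacpp 1.0.x forwards context params like use_mlock as kwargs.)
model = Model(ggml_model="./models/gpt4-x-alpaca-13b.bin",  # placeholder path
              n_ctx=512,
              use_mlock=True)

def print_token(text: str):
    print(text, end="", flush=True)

# Reuse the same loaded model for several prompts instead of reloading each time.
for prompt in ["Hello, ", "The capital of France is "]:
    model.generate(prompt, n_predict=32, new_text_callback=print_token)
    print()
```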

abdeladim-s commented 1 year ago

@Naugustogi, mlock should be supported. Do you get any errors?

Btw, loading the model does not take much time, it is almost instant now! Why is that annoying?

Naugustogi commented 1 year ago

> @Naugustogi, mlock should be supported. Do you get any errors?
>
> Btw, loading the model does not take much time, it is almost instant now! Why is that annoying?

I'm using pyllamacpp v1.0.6. It crashes when I use use_mlock=True, and I'm also passing f16_kv=1.

Also:

llama_print_timings:        load time = 69042.31 ms
llama_print_timings:      sample time =    14.22 ms /    33 runs   (    0.43 ms per run)
llama_print_timings: prompt eval time = 60306.69 ms /   108 tokens (  558.40 ms per token)
llama_print_timings:        eval time = 18046.62 ms /    32 runs   (  563.96 ms per run)
llama_print_timings:       total time = 87104.24 ms

Which is way too slow, I think. The model is gpt4-x-alpaca 13B. I'm using 16 GB RAM and an Intel Core i5-7400.

It definitely runs faster if I use the base llama.cpp; there I get about 4 tokens/s.

abdeladim-s commented 1 year ago

@Naugustogi I think that error is coming from the ggml library. Everything is working normally on my side. Could you please try to build it from source?

Naugustogi commented 1 year ago

> @Naugustogi I think that error is coming from the ggml library. Everything is working normally on my side. Could you please try to build it from source?

I am unable to rebuild and have to rely on other people's uploads. You can close this issue if you want. For now I have to wait for speed improvements. Model loading and keeping it in RAM works fine, it just takes a bit of time in my case. The initial problem wasn't mlock; I simply mistook the loading time for the generation time.
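To double-check which phase is actually slow, the load and the generation can be timed separately. A rough sketch, again assuming the pyllamacpp 1.0.x `Model` API and a placeholder model path:

```python
import time
from pyllamacpp.model import Model

# Time model loading (placeholder path; adjust params to your setup).
t0 = time.perf_counter()
model = Model(ggml_model="./models/gpt4-x-alpaca-13b.bin", n_ctx=512)
t1 = time.perf_counter()
print(f"load time: {t1 - t0:.1f} s")

# Time generation only, on the already-loaded model.
t2 = time.perf_counter()
model.generate("Hello, ", n_predict=32,
               new_text_callback=lambda t: print(t, end="", flush=True))
t3 = time.perf_counter()
print(f"\ngeneration time: {t3 - t2:.1f} s")
```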

abdeladim-s commented 1 year ago

@Naugustogi Why can't you rebuild? If you managed to build and run llama.cpp, then the process is straightforward: you only need cmake, then run pip install from the GitHub repo!
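For anyone following along, the build-from-source route looks roughly like this. This is a sketch, not official instructions: it assumes a working CMake plus a C/C++ toolchain (e.g. Visual Studio Build Tools on Windows), and the repository URL shown is an assumption, so substitute the repo or fork you actually use.

```
git clone --recursive https://github.com/abdeladim-s/pyllamacpp   # URL assumed; --recursive pulls in the bundled llama.cpp sources
cd pyllamacpp
pip install .
```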