keldenl / gpt-llama.cpp

A llama.cpp drop-in replacement for OpenAI's GPT endpoints, allowing GPT-powered apps to run off local llama.cpp models instead of OpenAI.
MIT License

Slow speed with Vicuna-7B, help please #45

Open C0deXG opened 1 year ago

C0deXG commented 1 year ago

When I ask a question it is extremely slow; it takes forever to write one sentence. How can I make it faster? I'm using Vicuna-7B to keep it lightweight, and I'm on macOS with an M2 chip, but that doesn't help. :( Can I host gpt-llama.cpp on Render? If so, when I run sh ./scripts/test-installation.sh, what should I enter for the port and the file locations, since I'm using Render to serve the model to make it faster?

C0deXG commented 1 year ago

Follow-up: if I use Render, for example, and I run sh ./scripts/test-installation.sh on my PC or somewhere else, it asks me for the port I'm running on. Since Render uses a base URL, how am I going to get this to work? Should I host the backend/model somewhere web-based, and where should I host it?
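For local testing, here is a minimal sketch of one way the flow could look. It assumes the server port is configurable via a PORT environment variable and that the test script prompts for the port and model path interactively; both are assumptions, so check the repo README for the exact setup. The port number and model path below are placeholders:

```sh
# Terminal 1: start gpt-llama.cpp locally on a known port
# (the PORT variable is an assumption; see the README for specifics)
cd gpt-llama.cpp
PORT=8000 npm start

# Terminal 2: run the test script and, when prompted, supply the same
# port and the absolute path to your ggml model file, e.g.:
sh ./scripts/test-installation.sh
#   port: 8000
#   model path: /path/to/llama.cpp/models/vicuna-7b/ggml-model-q4_0.bin
```

On a URL-based host like Render there is no local port to point the script at; requests would instead need to target the service's public URL.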

keldenl commented 1 year ago

Try using mlock; that has historically helped me when I've had memory issues.
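For reference, llama.cpp's main binary exposes an --mlock flag that locks the model weights in RAM so the OS doesn't page them out. A quick sketch of testing it against llama.cpp directly (model path and prompt are placeholders; how gpt-llama.cpp forwards extra flags may differ):

```sh
# Run llama.cpp directly with --mlock to keep the model resident in RAM,
# preventing macOS from swapping the weights out under memory pressure
./main -m ./models/vicuna-7b/ggml-model-q4_0.bin \
  --mlock \
  -p "Hello, how are you?"
```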

msj121 commented 1 year ago

Also, sometimes lowering the thread count helps, because too many threads oversaturate the CPU, or work ends up on a slower worker thread.
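For example, llama.cpp's main binary takes a -t/--threads flag. On an M2, which pairs 4 performance cores with 4 efficiency cores, capping threads near the performance-core count is a common starting point; the exact sweet spot is machine-dependent, so this is a sketch to experiment with, not a definitive setting:

```sh
# Match the thread count to the number of performance cores;
# oversubscribing can push work onto the slower efficiency cores
# and hurt tokens-per-second throughput
./main -m ./models/vicuna-7b/ggml-model-q4_0.bin \
  -t 4 \
  -p "Hello, how are you?"
```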