C0deXG opened this issue 1 year ago
When I ask a question it is really slow; it takes forever to write a single sentence. How can I make it faster? I'm already using Vicuna 7B to keep things lightweight, and I'm on macOS with an M2 chip, but that doesn't help. :(

So, can I host gpt-llama.cpp on Render? If so, when I run

sh ./scripts/test-installation.sh

what should I put for the port and for the location of the model file, given that I'm using Render to serve the model and make it faster?
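(For reference, a minimal sketch of the local flow the script seems to expect; the port and the model path below are placeholders, not values taken from the repo:)

# terminal 1: start the gpt-llama.cpp API server
# (check the README for how to set a custom port; 8000 here is only an example)
PORT=8000 npm start

# terminal 2: run the installation test and answer its prompts with the
# same port and the path to the Vicuna model file, e.g.
#   port:       8000
#   model path: ../llama.cpp/models/vicuna-7b/ggml-vicuna-7b-q4_0.bin  (example path)
sh ./scripts/test-installation.sh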
Follow-up: if I use Render, for example, and I run sh ./scripts/test-installation.sh on my PC (or somewhere else), the script asks me for the port I'm running on. Since Render exposes the service through a base URL rather than a port, how do I get this to work? Do I make it web-based, or host the backend/model somewhere, and if so, where?
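(A rough sketch of what talking to a hosted instance could look like, assuming gpt-llama.cpp's OpenAI-style /v1/chat/completions route and its convention of sending the model path where the API key would normally go; check the README for the exact convention. The Render URL and the model path are placeholders:)

# once the server is hosted, clients reach it over HTTPS at the base URL,
# so there is no separate port to enter; the model path below must be the
# path on the server, not on the local machine
curl https://your-app.onrender.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer /path/on/server/models/ggml-vicuna-7b-q4_0.bin" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'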
Try using mlock; that has historically helped me when I've had memory issues.
Also, sometimes lowering the thread count helps, because too many threads can oversaturate the CPU, or some work ends up scheduled on a slower worker thread.
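A rough example of the flags being suggested, if you invoke llama.cpp directly (how to pass extra flags through gpt-llama.cpp is worth checking in its README; the model path is a placeholder):

# --mlock pins the model in RAM so it doesn't get swapped out;
# -t caps the number of worker threads. On an M2, a value at or below the
# number of performance cores (e.g. 4) often behaves better than the default.
./main -m ./models/ggml-vicuna-7b-q4_0.bin --mlock -t 4 -p "Hello"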