@mydeveloperplanet Hello, sorry for the delay. Would you be willing to post the docker-compose and the models YAML here? (I have been hard at work updating the YAMLs on the site, so I am unsure which one you pulled.)
Note: an updated .env only takes effect if you wipe the Docker setup and bring it back up (`docker-compose down --rmi all`, then `docker-compose up --pull always`); see the snippet below.
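For copy-paste convenience, the full sequence from the note above (run from the directory containing your docker-compose file):

```sh
# Stop the stack and remove the images so stale settings are not reused
docker-compose down --rmi all

# Recreate the stack, pulling the latest images
docker-compose up --pull always
```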
A change I'll be making to the site is to move the CPU cores setting from the .env to the models YAML.
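Until then, the CPU cores are configured in the .env. A minimal sketch, assuming the variable is named `THREADS` as in LocalAI's example .env (check your own .env for the exact name):

```
# Number of CPU threads LocalAI may use (prefer the number of physical cores)
THREADS=8
```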
No problem, your reply is fast enough :-)
Here are the docker-compose and model YAMLs: localai.tar.gz
If there is anything I can do from my side, do not hesitate to ask; I am willing to help.
Okay, yeah, so you're running the older YAML. To fix it, remove the `f16` and GPU layers entries from the YAML and add `threads: X`, where X is the number of threads. That being said, you are running on CPU only, and that will always be 25x slower than GPU for this. You can use both at the same time by keeping `f16` (set it to true) and changing the GPU layers to whatever number your GPU supports. - @mydeveloperplanet
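A minimal sketch of the two model YAML variants described above, assuming the LocalAI model-config field names `threads`, `f16`, and `gpu_layers`; the model name and values are placeholders, so adjust them to your own YAML:

```yaml
# CPU-only: drop f16 and gpu_layers, set the number of threads
name: gpt-3.5-turbo
parameters:
  model: your-model.bin   # placeholder: keep whatever model your YAML already references
threads: 8                # number of CPU threads (prefer physical cores)
```

```yaml
# CPU + GPU: keep f16 enabled and offload as many layers as the GPU supports
name: gpt-3.5-turbo
parameters:
  model: your-model.bin
threads: 8
f16: true
gpu_layers: 35            # set to what your GPU's VRAM can hold
```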
Small reminder: each time you change a YAML you need to restart the containers.
- Windows: `docker-compose restart`
- Linux: `docker compose restart`
I forgot to mention: we did some tests on Thursday/Friday with a GPU and the results are indeed much better. I did not think it would make that kind of a difference. This issue can be closed; my conclusion is that you should always make use of a GPU, and CPU is more for testing purposes. Thanks for your support!
Can you suggest how I can replicate the same setup in a Kubernetes cluster?
We started experimenting with LocalAI and are very enthusiastic about it. However, we encounter slow responses when we want to chat based on documents. The example we use is based on this LangChain4j example. The slightly adapted source code we used is added below this issue.
If we run the example against OpenAI, we receive a response in 10 seconds. If we run the example against LocalAI, we receive a response in 138 seconds.
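For context, a minimal sketch of how a LangChain4j chat model can be pointed at either endpoint; the base URL, model name, and API key below are assumptions for illustration, not the exact values from the attached code:

```java
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.openai.OpenAiChatModel;

public class LocalAiVsOpenAi {

    public static void main(String[] args) {
        // Point the OpenAI-compatible client at LocalAI instead of api.openai.com.
        // baseUrl and modelName are placeholders; use the host/port and the model name
        // from your own docker-compose and models YAML.
        ChatLanguageModel localAi = OpenAiChatModel.builder()
                .baseUrl("http://localhost:8080/v1")   // assumed LocalAI endpoint
                .apiKey("not-needed")                   // LocalAI ignores the key by default
                .modelName("gpt-3.5-turbo")             // must match a name from the models YAML
                .build();

        // The exact chat method depends on the LangChain4j version in use.
        String answer = localAi.generate("Summarize the attached document in one sentence.");
        System.out.println(answer);
    }
}
```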
We checked the advice on https://localai.io/faq/. Using `docker stats`, we notice that the container has access to all CPUs and all memory. We cannot figure out what causes the slow response.
If more data or information is needed, please let us know.