Hi,
I had great success with this project when using Ollama.
I swapped to a model that runs on KoboldAI (OpenAI-compatible) and now the bot's responses take ages.
It processes the input quickly and has the answer ready in ~30 seconds at most, but while the bot is posting the answer to Discord, python3 pegs a single core at 100% and the bot takes ages to finish posting.
Any idea what I'm doing wrong?
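For context, this is roughly the relay pattern I'd expect between the OpenAI-compatible stream and Discord. It's a minimal sketch of my mental model, not this project's actual code; the base URL, model name, and edit interval are assumptions about my local setup:

```python
import asyncio

import discord
from openai import AsyncOpenAI

# Illustrative values only: KoboldCpp's default OpenAI-compatible endpoint
# on my machine, a placeholder model name, and an arbitrary throttle interval.
client = AsyncOpenAI(base_url="http://localhost:5001/v1", api_key="none")

EDIT_INTERVAL = 1.5  # seconds between Discord message edits


async def relay_stream(channel: discord.abc.Messageable, prompt: str) -> None:
    """Stream a completion and post it to Discord with throttled edits."""
    stream = await client.chat.completions.create(
        model="koboldcpp",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )

    message = await channel.send("…")
    buffer = ""
    last_edit = asyncio.get_running_loop().time()

    async for chunk in stream:
        buffer += chunk.choices[0].delta.content or ""
        now = asyncio.get_running_loop().time()
        # Editing on every streamed chunk hammers the Discord API and the
        # event loop; only edit every EDIT_INTERVAL seconds instead.
        if buffer and now - last_edit >= EDIT_INTERVAL:
            await message.edit(content=buffer[:2000])  # Discord's length cap
            last_edit = now

    if buffer:
        await message.edit(content=buffer[:2000])
```

I mention it because per-chunk message edits are the first thing I'd suspect for the pegged core, but I may be wrong about where the time actually goes.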