kevinthedang / discord-ollama

Discord Bot that utilizes Ollama to interact with any Large Language Models to talk with users and allow them to host/create their own models.
Creative Commons Attribution 4.0 International

Multi-User Chat Generation #53

Closed kevinthedang closed 2 months ago

kevinthedang commented 5 months ago

Issue

[image attachment]

Solution

Notes

kevinthedang commented 3 months ago

It looks like, as of May, the Ollama developers have made this feature possible (see references below).

Another approach, mentioned back on Jan. 31st, is still viable and I believe was noted above. The use of proxies can be on the table if needed.

References

kevinthedang commented 2 months ago

With Ollama v0.2.0, concurrency and parallel generation are possible for the bot.

https://github.com/ollama/ollama/releases/tag/v0.2.0
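A minimal sketch of what server-side parallelism buys the bot: once Ollama v0.2.0+ can serve requests in parallel, the bot can dispatch multiple users' generations together instead of queuing them. The `generate` stub below stands in for a real Ollama API call (e.g. `POST /api/generate`); all names here are illustrative, not the bot's actual code.

```typescript
// Stand-in for a per-user Ollama generation request. In the real bot this
// would await a fetch() to the Ollama server rather than a timer.
async function generate(user: string, prompt: string): Promise<string> {
    await new Promise(resolve => setTimeout(resolve, 50)) // simulate model latency
    return `[${user}] response to: ${prompt}`
}

// With a parallel-capable server, two users' chats can be awaited together.
// Promise.all resolves in input order once both generations finish.
async function handleSimultaneousChats(): Promise<string[]> {
    return Promise.all([
        generate('alice', 'what is ollama?'),
        generate('bob', 'write a haiku'),
    ])
}

handleSimultaneousChats().then(replies => console.log(replies.length)) // prints 2
```

The point is only that nothing in the dispatch path needs to serialize requests; the server handles interleaving.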

kevinthedang commented 2 months ago

https://github.com/ollama/ollama/blob/main/docs/faq.md#how-does-ollama-handle-concurrent-requests

kevinthedang commented 2 months ago

Likely no implementation is needed, but that will need to be tested. We can likely close this after #82 is resolved.

@JT2M0L3Y

kevinthedang commented 2 months ago

Looks like concurrency works as intended out of the box.

Discord: [image attachment]

Logging of the two conversations generating simultaneously: [image attachment]

This can be closed with #82 now.

kevinthedang commented 2 months ago

Something I did not read about initially when 0.2.0 was released: we might need some kind of implementation that allows users to select:

  1. OLLAMA_MAX_LOADED_MODELS - the maximum number of models that can be loaded concurrently.
  2. OLLAMA_NUM_PARALLEL - the maximum number of parallel requests each model will process at the same time.

This might be an issue we should create as a new feature. Possibly done through Slash Commands?
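As a hypothetical sketch of the slash-command idea (none of this exists in the bot yet): a `/concurrency` command handler would mainly need to validate user input and persist the two settings before they are applied to the Ollama server's environment. The handler name, settings store, and messages below are all invented for illustration.

```typescript
// Ollama's documented concurrency environment variables (v0.2.0+),
// held in a simple in-memory store for this sketch.
type ConcurrencySettings = {
    maxLoadedModels: number // OLLAMA_MAX_LOADED_MODELS
    numParallel: number     // OLLAMA_NUM_PARALLEL
}

const settings: ConcurrencySettings = { maxLoadedModels: 1, numParallel: 1 }

// Validation a hypothetical /concurrency slash command could run on its
// integer options before the bot reconfigures the Ollama server.
function updateConcurrency(maxLoadedModels: number, numParallel: number): string {
    if (!Number.isInteger(maxLoadedModels) || maxLoadedModels < 1)
        return 'OLLAMA_MAX_LOADED_MODELS must be a positive integer.'
    if (!Number.isInteger(numParallel) || numParallel < 1)
        return 'OLLAMA_NUM_PARALLEL must be a positive integer.'
    settings.maxLoadedModels = maxLoadedModels
    settings.numParallel = numParallel
    return `Set OLLAMA_MAX_LOADED_MODELS=${maxLoadedModels}, OLLAMA_NUM_PARALLEL=${numParallel}`
}

console.log(updateConcurrency(2, 4))
```

In the real bot this logic would sit behind a discord.js slash-command interaction, with the reply string sent back to the invoking user; applying the values would still require restarting or re-spawning the Ollama server process with the updated environment.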

Thread Reference: Parallel Requests

@JT2M0L3Y