kevinthedang closed this issue 2 months ago.
Looks like, as of May, the developers made it possible to build this feature (see below).
Another approach, mentioned back on Jan. 31st, is still possible and I believe was discussed above: the use of proxies can stay on the table if needed.
With Ollama v0.2.0, concurrency and parallel generation are possible for the bot.
Likely no implementation is needed on our end, but that will need to be tested (a quick test sketch is below). We can likely close this after #82 is resolved.
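For reference, a minimal test sketch (not the bot's actual code): it fires two generation requests at a local Ollama server at the same time, assuming the default port and Node 18+; the model name and prompts are placeholders.

```ts
// Quick concurrency check against a local Ollama v0.2.0+ server.
// Assumes the default port 11434; model name and prompts are placeholders.
async function generate(prompt: string): Promise<string> {
  const res = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'llama3', prompt, stream: false }),
  })
  const data = (await res.json()) as { response: string }
  return data.response
}

// Fire both requests at once; with parallel generation enabled they should
// be processed simultaneously instead of queuing one behind the other.
const [a, b] = await Promise.all([
  generate('Explain concurrency in one sentence.'),
  generate('Explain parallelism in one sentence.'),
])
console.log(a, b)
```

If both responses stream back at the same time (rather than the second starting only after the first finishes), parallel generation is working.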
@JT2M0L3Y
Looks like concurrency works as intended out of the box.
Discord: (screenshot omitted)
Logging of the two conversations generating simultaneously: (log output omitted)
This can be closed with #82 now
Something I did not read about initially when 0.2.0 was released: we might need some kind of implementation that allows users to select:

- `OLLAMA_MAX_LOADED_MODELS` - how many models are allowed to be loaded at a given time.
- `OLLAMA_NUM_PARALLEL` - how many concurrent requests each model can handle.

Might be an issue we should create as a new feature. Possibly done through Slash Commands? A rough sketch of that idea is below.
Thread Reference: Parallel Requests
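If we go the Slash Command route, a rough sketch of what the command definition might look like, using discord.js builders. The command and option names here are hypothetical, and note that both settings are environment variables on the Ollama server, so the bot would still need a way to apply them to the Ollama process:

```ts
import { SlashCommandBuilder } from 'discord.js'

// Hypothetical command for tuning Ollama's concurrency settings.
// The options mirror OLLAMA_MAX_LOADED_MODELS and OLLAMA_NUM_PARALLEL;
// since those are server-side environment variables, applying new values
// would require reconfiguring/restarting the Ollama server process.
export const concurrencyCommand = new SlashCommandBuilder()
  .setName('concurrency')
  .setDescription('Configure Ollama parallelism settings')
  .addIntegerOption(option =>
    option
      .setName('max-loaded-models')
      .setDescription('OLLAMA_MAX_LOADED_MODELS: models loaded at once')
      .setMinValue(1))
  .addIntegerOption(option =>
    option
      .setName('num-parallel')
      .setDescription('OLLAMA_NUM_PARALLEL: concurrent requests per model')
      .setMinValue(1))
```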
@JT2M0L3Y