ollama / ollama-python

Ollama Python library
https://ollama.com
MIT License

Batching #25

Closed. varunshenoy closed this issue 8 months ago.

varunshenoy commented 8 months ago

Is there any way to batch prompts with local models using this package? Thank you!

mxyng commented 8 months ago

Can you describe what you're trying to do? I'm not sure I understand.

varunshenoy commented 8 months ago

Suppose I have a list of prompts: ["A list of colors: red, blue", "Geckos eat", "The capital of France is"]

You can run these all through an LLM in a single forward pass with little to no latency hit, thanks to the inherent parallelism in transformers. Batching is mainly used for inference servers with multiple people hitting them at the same time, but I want to see if I can speed up some local workflows.
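To make that concrete, here's a minimal sketch of the sequential loop I'm doing today with this package's top-level `generate` call (the model name is just a placeholder):

```python
import ollama

prompts = ["A list of colors: red, blue", "Geckos eat", "The capital of France is"]

# Each prompt is a separate request, i.e. one generation at a time.
# With batching, all three could share a single forward pass per step.
completions = [
    ollama.generate(model="llama2", prompt=p)["response"]  # "llama2" is a placeholder model
    for p in prompts
]
print(completions)
```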

Here's some documentation from Hugging Face's transformers library that explains how they support batching:

[Screenshot of the Hugging Face transformers documentation on batched generation]
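Since the screenshot doesn't reproduce here, a rough sketch of the batched-generation pattern those docs describe (the model choice and padding setup are my assumptions):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM follows the same pattern
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Decoder-only models need a pad token and left padding so generation
# continues from the end of each prompt.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

prompts = ["A list of colors: red, blue", "Geckos eat", "The capital of France is"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True)

# All prompts go through the model together, one batched forward pass per step.
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```
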
mxyng commented 8 months ago

This issue would be more appropriate for ollama, since this repo is only the Python interface to the Ollama API.