Can you describe what you're trying to do? I'm not sure I understand.
Suppose I have a list of prompts: ["A list of colors: red, blue", "Geckos eat", "The capital of France is"]
You can run these all through an LLM in a single forward pass with little to no latency hit, due to the inherent parallelism in transformers. This is mainly used by inference servers handling multiple concurrent requests, but I want to see if I can speed up some local workflows.
Here's some documentation from Hugging Face's transformers library that explains how they support batching:
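For illustration, here's a minimal sketch of what batched generation looks like with transformers. The model name, padding settings, and generation parameters below are my assumptions, not something from the linked docs:

```python
# Minimal sketch of batched generation with Hugging Face transformers.
# Assumptions: gpt2 as a stand-in model; left padding so generated tokens
# stay contiguous with each prompt (important for decoder-only models);
# greedy decoding with a small max_new_tokens.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompts = ["A list of colors: red, blue", "Geckos eat", "The capital of France is"]

# Tokenizing all prompts together with padding produces one rectangular
# batch, so the model processes every prompt in a single forward pass
# per decoding step.
inputs = tokenizer(prompts, return_tensors="pt", padding=True)
outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    pad_token_id=tokenizer.eos_token_id,
)

for output in outputs:
    print(tokenizer.decode(output, skip_special_tokens=True))
```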
Is there any way to batch prompts with local models using this package? Thank you!