snexus / llm-search

Querying local documents, powered by LLM
MIT License

Scope for batched predictions #71

Open saswat0 opened 7 months ago

saswat0 commented 7 months ago

@snexus Kudos on this awesome project!

I was wondering if support for batched prompts is on your roadmap? There are existing solutions that enable this for several language models, so are you planning to include such optimisations in this project?

TIA

snexus commented 7 months ago

Hi,

Thanks for the suggestion. How do you think batched prompts can be useful in the context of RAG?

saswat0 commented 7 months ago

One use case I can think of: if deployed to production, the server could queue incoming requests (prompts) and run the RAG pipeline once per batch. Latency per request would be slightly higher, but GPU utilisation would increase several-fold.
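For illustration, a minimal sketch of that idea: a queue collects incoming prompts and a worker flushes them to the model in a single batched call. The `BatchingFrontend` class and `generate_batch` hook below are hypothetical and not part of llm-search; a real deployment would forward the batch to a batched backend.

```python
import queue
import threading
import time

# Hypothetical hook: llm-search does not currently expose a batched-generation API.
# A real implementation would call a backend that supports batching (e.g. vLLM).
def generate_batch(prompts):
    return [f"(answer for: {p})" for p in prompts]

class BatchingFrontend:
    """Queues incoming prompts and flushes them to the model in one batched call."""

    def __init__(self, max_batch_size=8, max_wait_s=0.05):
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self._requests = queue.Queue()

    def submit(self, prompt):
        # Each request gets an event that is set once its answer is ready.
        pending = {"done": threading.Event(), "result": None}
        self._requests.put((prompt, pending))
        return pending

    def serve_forever(self):
        while True:
            batch = [self._requests.get()]           # wait for the first request
            deadline = time.monotonic() + self.max_wait_s
            while len(batch) < self.max_batch_size:  # collect more until full or timed out
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self._requests.get(timeout=remaining))
                except queue.Empty:
                    break
            results = generate_batch([p for p, _ in batch])  # one pass over the whole batch
            for (_, pending), result in zip(batch, results):
                pending["result"] = result
                pending["done"].set()

# Usage sketch:
# frontend = BatchingFrontend()
# threading.Thread(target=frontend.serve_forever, daemon=True).start()
# pending = frontend.submit("What does document X say about Y?")
# pending["done"].wait(); print(pending["result"])
```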

snexus commented 7 months ago

I will add it as a potential improvement when implementing support for vLLM in the future. Thanks for the suggestion.
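For reference, once vLLM support lands, batched generation could look roughly like the sketch below. The model name and prompts are placeholders; vLLM's `LLM.generate` accepts a list of prompts and batches them internally (continuous batching), so one call can serve many queued RAG queries.

```python
from vllm import LLM, SamplingParams

# Model name is illustrative; substitute any model supported by vLLM.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
sampling_params = SamplingParams(temperature=0.0, max_tokens=256)

# Each prompt would be a fully assembled RAG prompt (question + retrieved context).
prompts = [
    "Context: ...\n\nQuestion: What is X?",
    "Context: ...\n\nQuestion: What is Y?",
]

# vLLM schedules these together, so GPU utilisation stays high across queued queries.
outputs = llm.generate(prompts, sampling_params)
for out in outputs:
    print(out.outputs[0].text)
```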