Closed: varun-singhh closed this issue 1 month ago
Hi @varun-singhh
Can you elaborate on this, please?
Implementing a batched inference method that takes multiple prompts and generates responses for each & demonstrates basic tuning of batch size for efficiency.
I think we already provide batched inference with the generate API.
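For context, batched generation of the kind described above usually looks roughly like the sketch below. This is a minimal illustration assuming a Hugging Face-style tokenizer and model (the checkpoint name, batch size, and generation settings are illustrative assumptions, not this repo's generate API):

```python
# Minimal sketch of batched inference: multiple prompts in, one response per prompt out.
# Model name and settings are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumption: any causal LM checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token   # Llama tokenizers ship without a pad token
tokenizer.padding_side = "left"             # left-pad so generation continues from the prompt end
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

def batched_generate(prompts, batch_size=4, max_new_tokens=64):
    """Generate a response for each prompt, processing batch_size prompts per call."""
    outputs = []
    for start in range(0, len(prompts), batch_size):
        batch = prompts[start:start + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", padding=True).to(model.device)
        with torch.no_grad():
            generated = model.generate(**inputs, max_new_tokens=max_new_tokens)
        outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return outputs

# Tuning batch size for efficiency: larger batches amortize per-call overhead but raise
# peak memory, so increase batch_size until throughput stops improving or memory runs out.
responses = batched_generate(["Prompt one", "Prompt two", "Prompt three"], batch_size=2)
```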
EDIT: Also can work on the recipe for fine-tuning Llama using Low-Rank Adaptation to enable fine-tuning while reducing memory requirements
We have a recipe on Low-Rank Adaptation here. Is this what you were thinking of building, or something else?
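For reference, the memory savings described in the EDIT come from training only small adapter matrices while the base weights stay frozen. A minimal sketch of that setup using the `peft` library is below; the base model, rank, and target modules are illustrative assumptions and not necessarily what the linked recipe uses:

```python
# Minimal LoRA setup sketch with the `peft` library.
# Base checkpoint, rank, and target modules are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # assumption: any Llama checkpoint
    torch_dtype=torch.float16,
    device_map="auto",
)

lora_config = LoraConfig(
    r=8,                                   # low-rank dimension of the adapters
    lora_alpha=16,                         # scaling applied to the adapter output
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections in Llama blocks
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights require gradients,
                                    # which is where the memory reduction comes from
# From here, `model` can be passed to a standard training loop or the HF Trainer.
```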
Yeah, I was planning to go with batched inference, but that's pretty much already there. I'll look for some other recipes to work on. Closing this issue.
From #43
Implementing a batched inference method that takes multiple prompts and generates responses for each & demonstrates basic tuning of batch size for efficiency.
EDIT: Also can work on the recipe for fine-tuning Llama using Low-Rank Adaptation to enable fine-tuning while reducing memory requirements