huggingface / huggingface-llama-recipes


Recipe for Llama Text Generation with Multiple Input Prompts #63

Closed by varun-singhh 1 month ago

varun-singhh commented 1 month ago

From #43

Implement a batched inference method that takes multiple prompts, generates a response for each, and demonstrates basic tuning of the batch size for efficiency.

EDIT: I can also work on a recipe for fine-tuning Llama with Low-Rank Adaptation (LoRA), which reduces memory requirements.

ariG23498 commented 1 month ago

Hi @varun-singhh

Can you elaborate on this, please?

Implement a batched inference method that takes multiple prompts, generates a response for each, and demonstrates basic tuning of the batch size for efficiency.

I think we already provide batched inference with the generate API.
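
For reference, batched generation boils down to something like the sketch below; the checkpoint id and generation settings are assumptions on my end, not pinned to a specific recipe in this repo.

```python
# Minimal sketch of batched generation with the transformers generate API.
# The checkpoint id and generation settings are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # hypothetical checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token      # Llama tokenizers ship without a pad token
tokenizer.padding_side = "left"                # left padding keeps prompts adjacent to new tokens

model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompts = [
    "Explain beam search in one sentence.",
    "Write a haiku about GPUs.",
]

# Tokenize all prompts together; padding makes the batch rectangular.
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)

for text in tokenizer.batch_decode(outputs, skip_special_tokens=True):
    print(text)
```

Batch size tuning then mostly comes down to how many prompts you pack into `prompts` at once versus available GPU memory.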

EDIT: I can also work on a recipe for fine-tuning Llama with Low-Rank Adaptation (LoRA), which reduces memory requirements.

We have a recipe on Low-Rank Adaptation here. Is this what you were thinking of building, or something else?
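
For context, the core of that recipe is wrapping the base model with peft adapters so that only the low-rank matrices are trained; a rough sketch (the target modules and hyperparameters here are assumed, not the recipe's actual values) looks like this:

```python
# Minimal sketch of wrapping a Llama model with LoRA adapters via peft.
# Target modules and hyperparameters are illustrative assumptions.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # hypothetical checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id)

lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,                         # scaling applied to the update
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights require gradients
```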

varun-singhh commented 1 month ago

Yeah, I was planning to go with batched inference, but that's pretty much covered already. I'll look for some other recipes to work on. Closing this issue.