How to do Retrieval-Augmented Generation?

Hi Wei,

Here is some guidance. For retrieval-augmented code generation, we follow the settings introduced in this paper (Retrieval Augmented Code Generation and Summarization) to evaluate our models. We adopt a straightforward approach: we use the CodeT5+ encoder to retrieve the top-1 code candidate, then concatenate it with the source text as the input to the model's encoder, and the decoder is trained to generate the target code. You can use this embedding model for the retrieval part. Before evaluation, you need to prepare a finetuning dataset of pairs in which the input is "text + retrieved top-1 code" and the output is the "target code".
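As a rough sketch of the data preparation step described above (an illustration only, not the repository's actual pipeline): embed the source text, pick the most similar snippet from a code corpus, and join the two into the encoder input. Everything named here is hypothetical, including `toy_embed`, `retrieve_top1`, `build_pairs`, and the newline separator. `toy_embed` is a bag-of-words stand-in so the example runs without downloading a model; in practice you would swap in embeddings from the CodeT5+ encoder.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Stand-in embedder (NOT CodeT5+): L2-normalised bag of identifier-like tokens."""
    counts = Counter(re.findall(r"[a-z_]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def cosine(a: Vector, b: Vector) -> float:
    """Dot product of two already-normalised sparse vectors."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def retrieve_top1(query: str, corpus: Sequence[str],
                  embed: Callable[[str], Vector]) -> str:
    """Return the single corpus snippet most similar to the query text."""
    q = embed(query)
    return max(corpus, key=lambda code: cosine(q, embed(code)))


def build_pairs(examples: Sequence[Tuple[str, str]], corpus: Sequence[str],
                embed: Callable[[str], Vector], sep: str = "\n") -> List[Dict[str, str]]:
    """Build finetuning pairs: source = text + retrieved top-1 code, target = target code.

    The separator is an assumption; use whatever your tokenizer setup expects.
    """
    pairs = []
    for text, target_code in examples:
        retrieved = retrieve_top1(text, corpus, embed)
        pairs.append({"source": text + sep + retrieved, "target": target_code})
    return pairs


# Tiny made-up retrieval corpus and (text, target code) examples.
corpus = [
    "def add(a, b): return a + b",
    "def read_file(path): return open(path).read()",
]
examples = [
    ("add two numbers a and b", "def sum_two(a, b):\n    return a + b"),
    ("read the contents of a file at path",
     "def load(path):\n    with open(path) as f:\n        return f.read()"),
]
pairs = build_pairs(examples, corpus, toy_embed)
for pair in pairs:
    print(repr(pair["source"]), "->", repr(pair["target"]))
```

The resulting `source`/`target` pairs can then be tokenized and used for standard seq2seq finetuning. The same retrieval step has to be applied to the test inputs at evaluation time, so that the model sees the same input format it was trained on.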

salesforce / CodeT5
