Open — luweigen opened this issue 1 year ago
Any hints for reproducing the example in Figure 7 of the paper CodeT5+: Open Code Large Language Models for Code Understanding and Generation? Thanks in advance!

Hi Wei,

Let me share some guidance here. For retrieval-augmented code generation, we follow the settings introduced in the paper Retrieval Augmented Code Generation and Summarization to evaluate our models. We adopt a straightforward approach: we use CodeT5+'s encoder to retrieve the top-1 code candidate, concatenate it with the source text as input to the model's encoder, and train the decoder to generate the target code. You can employ this embedding model for the retrieval part. Before evaluation, you need to prepare a fine-tuning dataset of ("text + retrieved top-1 code", "target code") pairs.
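For the retrieval-and-concatenation step described above, here is a minimal sketch. It assumes you have already computed dense embeddings (e.g. with a CodeT5+ embedding checkpoint) for the query text and the candidate code snippets; the function names, the cosine-similarity scoring, and the `" </s> "` separator are illustrative choices, not the exact setup used in the paper.

```python
import numpy as np

def retrieve_top1(query_emb, candidate_embs, candidates):
    """Return the top-1 candidate by cosine similarity.

    query_emb:      1-D array, embedding of the source text
    candidate_embs: 2-D array, one row per candidate code snippet
    candidates:     list of candidate code strings, aligned with the rows
    """
    q = query_emb / np.linalg.norm(query_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    scores = c @ q  # cosine similarity of each candidate to the query
    return candidates[int(np.argmax(scores))]

def build_input(source_text, retrieved_code, sep=" </s> "):
    # Concatenate the source text with the retrieved top-1 code; this
    # combined string is what the model's encoder sees during fine-tuning.
    # The separator token here is an assumption -- use whatever your
    # tokenizer/fine-tuning recipe expects.
    return source_text + sep + retrieved_code
```

Each ("text + retrieved top-1 code", "target code") training pair is then produced by running `retrieve_top1` over your retrieval corpus and feeding the result to `build_input`.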