salesforce / CodeT5

Home of CodeT5: Open Code LLMs for Code Understanding and Generation
https://arxiv.org/abs/2305.07922
BSD 3-Clause "New" or "Revised" License

Duplicate Instruction in Enc and Dec for HumanEval #115

[Open] Muennighoff opened this issue 1 year ago

Muennighoff commented 1 year ago

Hi, great work! Why are you feeding the instruction into both the encoder and the decoder? I would have expected only the encoder to receive the instruction; however, for HumanEval you do:

    prompt_batch = [INSTRUCTION.format(extract_text(prompt))]
    prompt_batch_decoder = [INSTRUCTION.format(extract_text(prompt)) + prompt]
Muennighoff commented 1 year ago

cc @yuewang-cuhk

yuewang-cuhk commented 1 year ago

Hi there, by doing this we also feed the text prompt to the decoder, to provide better prefix context for the model. We find this very helpful for CodeT5+ models >=2B: these models have a deep decoder initialized from frozen GPT-style LLMs, so prepending the prompt matches the default behaviour of GPT models. Note that CodeT5+ 220M and 770M do not need such an additional prefix prompt, as they are pretrained from scratch.
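
For anyone who wants to try this, below is a minimal sketch of how the two inputs map onto a Hugging Face generation call for the 2B+ checkpoints. The INSTRUCTION template and prompt here are illustrative placeholders (the evaluation script's extract_text helper is elided and the raw prompt stands in for the extracted docstring); the decoder prefix is passed via decoder_input_ids, following the pattern on the Salesforce/codet5p-2b model card.

    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    checkpoint = "Salesforce/codet5p-2b"  # any CodeT5+ checkpoint >= 2B
    device = "cuda"

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(
        checkpoint, torch_dtype=torch.float16, trust_remote_code=True
    ).to(device)

    # Hypothetical instruction template; the real one lives in the evaluation script.
    INSTRUCTION = "Please complete the following Python function:\n{}\n"
    prompt = 'def add(a, b):\n    """Return the sum of a and b."""\n'

    # Encoder input: the instruction-formatted text only.
    enc_input = INSTRUCTION.format(prompt)
    # Decoder input: the same instruction plus the raw code prompt as a forced
    # prefix, so the GPT-initialized decoder continues left-to-right from it.
    dec_input = INSTRUCTION.format(prompt) + prompt

    encoding = tokenizer(enc_input, return_tensors="pt").to(device)
    encoding["decoder_input_ids"] = tokenizer(
        dec_input, return_tensors="pt"
    ).input_ids.to(device)

    outputs = model.generate(**encoding, max_new_tokens=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The key point is that everything in decoder_input_ids acts as a forced prefix that the decoder continues from, which is why duplicating the instruction there helps the GPT-initialized decoder.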