Open Muennighoff opened 1 year ago
Hi, great work! Why are you feeding the instruction into both the encoder and the decoder? I would have expected only the encoder to receive the instruction; however, for HumanEval you do:

cc @yuewang-cuhk

Hi there, by feeding the text prompt to the decoder as well, we provide prefix context for the model. We find this very helpful for CodeT5+ models >=2B: these models have a deep decoder initialized from frozen GPT-style LLMs, so supplying the prefix is more compatible with the default behaviour of GPT models. Note that CodeT5+ 220M and 770M do not need such additional prefix prompts, as they are pretrained from scratch.
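For readers landing here, the pattern under discussion can be sketched roughly as follows. This is a minimal illustration, not the repo's actual evaluation code, and the token ids are made up; in practice the ids come from the CodeT5+ tokenizer and are passed to `model.generate(...)`.

```python
# Sketch of the decoder-prefix trick discussed above
# (hypothetical token ids, not the actual CodeT5+ code).
prompt_ids = [32100, 5021, 748, 19]  # made-up ids for the instruction prompt

# CodeT5+ 220M / 770M (pretrained from scratch): the instruction goes
# only to the encoder; the decoder starts without a prefix.
small_model_inputs = {
    "input_ids": prompt_ids,   # encoder sees the instruction
    "decoder_input_ids": [],   # decoder gets no prompt prefix
}

# CodeT5+ >= 2B (decoder initialized from a frozen GPT-style LLM): the
# instruction is also copied into the decoder, so the decoder continues
# the prompt the same way a plain GPT model would.
large_model_inputs = {
    "input_ids": prompt_ids,
    "decoder_input_ids": list(prompt_ids),  # decoder sees the prompt as prefix
}

print(large_model_inputs["decoder_input_ids"] == prompt_ids)  # → True
```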