salesforce / CodeT5

Home of CodeT5: Open Code LLMs for Code Understanding and Generation
https://arxiv.org/abs/2305.07922
BSD 3-Clause "New" or "Revised" License
2.71k stars, 396 forks

Is it possible to fine-tune CodeT5 on a programming language that is not part of the CodeSearchNet dataset? #22

Closed. likhith00 closed this issue 2 years ago

likhith00 commented 2 years ago

If it is not possible, is pre-training CodeT5 on a dataset for that new programming language the only option?

yuewang-cuhk commented 2 years ago

Hi, fine-tuning CodeT5 on a new programming language (PL) can achieve reasonable results, because different PLs share common patterns, which allows transfer learning to another PL. For example, we find that fine-tuning CodeT5 on Apex yields good results (as shown in the GIF animation), as Apex is very similar to Java.
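For anyone who wants to try this, here is a minimal sketch of what such a fine-tuning run could look like with the Hugging Face `transformers` library. It is not the training script from this repo. The `Salesforce/codet5-base` checkpoint and the `RobertaTokenizer` / `T5ForConditionalGeneration` classes are the ones published for CodeT5; the record fields (`code`, `summary`), the helper names `to_pairs` and `fine_tune`, the Apex snippet, and all hyperparameters are assumptions made up for illustration.

```python
from typing import Dict, List, Tuple


def to_pairs(records: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Turn raw records into (source, target) pairs, dropping incomplete ones.

    The "code" and "summary" field names are hypothetical; adapt them to
    however your new-language dataset is stored.
    """
    pairs = []
    for rec in records:
        src = rec.get("code", "").strip()
        tgt = rec.get("summary", "").strip()
        if src and tgt:
            pairs.append((src, tgt))
    return pairs


def fine_tune(pairs, checkpoint="Salesforce/codet5-base",
              epochs=3, lr=5e-5, batch_size=8):
    """Plain PyTorch fine-tuning loop (illustrative hyperparameters only)."""
    # Imported lazily so the data helper above works without these packages.
    import torch
    from transformers import RobertaTokenizer, T5ForConditionalGeneration

    tokenizer = RobertaTokenizer.from_pretrained(checkpoint)
    model = T5ForConditionalGeneration.from_pretrained(checkpoint)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    model.train()
    for _ in range(epochs):
        for start in range(0, len(pairs), batch_size):
            batch = pairs[start:start + batch_size]
            sources = [s for s, _ in batch]
            targets = [t for _, t in batch]
            enc = tokenizer(sources, max_length=256, padding=True,
                            truncation=True, return_tensors="pt")
            labels = tokenizer(targets, max_length=64, padding=True,
                               truncation=True, return_tensors="pt").input_ids
            # Padding positions are ignored by the loss when set to -100.
            labels[labels == tokenizer.pad_token_id] = -100
            loss = model(input_ids=enc.input_ids,
                         attention_mask=enc.attention_mask,
                         labels=labels).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return tokenizer, model


# Toy Apex-like records (made up); the second one is dropped as empty.
records = [
    {"code": "public Integer add(Integer a, Integer b) { return a + b; }",
     "summary": "Add two integers."},
    {"code": "   ", "summary": "Empty snippet."},
]
sample = to_pairs(records)
```

Calling `fine_tune(sample)` would download the checkpoint and run the loop. With a real dataset you would also want a held-out split to check whether transfer to the new language is actually working.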