Can we fine-tune for the Text-to-Code Retrieval task?

salesforce / CodeT5

Home of CodeT5: Open Code LLMs for Code Understanding and Generation

https://arxiv.org/abs/2305.07922

BSD 3-Clause "New" or "Revised" License

Can we fine-tune for the Text-to-Code Retrieval task? #146

Open pdhung3012 opened 10 months ago

pdhung3012 commented 10 months ago

Hello, I wonder whether we can fine-tune CodeT5+ for text-to-code retrieval, as UniXcoder does. I ran zero-shot code retrieval for JavaScript. The best accuracy I can get is 70.2%, which is lower than the 71.3% reported for fine-tuned CodeT5+ in the CodeT5+ paper. So I want to check whether fine-tuning can improve on the zero-shot result.

Thank you
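
For readers following along, here is a minimal sketch of how a retrieval score like the one above could be computed once text and code embeddings are available. It is not the evaluation script from the CodeT5+ or UniXcoder repositories. The function names (`cosine`, `rank_of_gold`, `evaluate`) and the toy vectors are hypothetical, and the sketch assumes query i is paired with code snippet i.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def rank_of_gold(query_vec, code_vecs, gold_index):
    """1-based rank of the gold snippet when candidates are sorted by similarity."""
    scores = [cosine(query_vec, c) for c in code_vecs]
    gold = scores[gold_index]
    return 1 + sum(1 for s in scores if s > gold)

def evaluate(query_vecs, code_vecs):
    """Return (top-1 accuracy, mean reciprocal rank).

    Assumption: query i is paired with code i.
    """
    ranks = [rank_of_gold(q, code_vecs, i) for i, q in enumerate(query_vecs)]
    top1 = sum(1 for r in ranks if r == 1) / len(ranks)
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    return top1, mrr

# Toy embeddings standing in for real model outputs (made up for illustration).
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
codes = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.0]]
top1, mrr = evaluate(queries, codes)
print(f"top-1 accuracy: {top1:.3f}, MRR: {mrr:.3f}")
```

In a real run the vectors would come from the encoder, and the candidate pool would be the full code corpus of the benchmark rather than three snippets.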

yuewang-cuhk commented 10 months ago

Yes, you can definitely finetune on labeled datasets using contrastive loss (or combined with the matching loss) to further boost the retrieval performance. We plan to release the finetuning scripts in the future if there are many asks for this.
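
Until official scripts exist, the contrastive loss mentioned above can be sketched as a symmetric in-batch loss, where each text is pulled toward its paired code and pushed away from the other codes in the batch. This is only an illustration in plain Python, not the CodeT5+ training code: the function names are hypothetical, the temperature of 0.07 is an assumed default, and the matching loss is not covered. An actual fine-tuning run would compute the same quantity with a tensor library that supports automatic differentiation.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def log_softmax_at(logits, index):
    """Numerically stable log-softmax value at one position."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[index] - lse

def contrastive_loss(text_vecs, code_vecs, temperature=0.07):
    """Symmetric in-batch contrastive loss.

    Assumptions: text i is paired with code i, and all other codes in the
    batch act as negatives. The temperature value is an assumed default.
    """
    texts = [normalize(v) for v in text_vecs]
    codes = [normalize(v) for v in code_vecs]
    # sim[i][j] = cosine(text i, code j) / temperature
    sim = [[sum(a * b for a, b in zip(t, c)) / temperature for c in codes]
           for t in texts]
    n = len(sim)
    # Text-to-code direction: each row should peak on its diagonal entry.
    t2c = -sum(log_softmax_at(sim[i], i) for i in range(n)) / n
    # Code-to-text direction: each column should peak on its diagonal entry.
    c2t = -sum(log_softmax_at([sim[j][i] for j in range(n)], i)
               for i in range(n)) / n
    return (t2c + c2t) / 2

# Toy batch: aligned pairs should give a lower loss than mismatched pairs.
batch = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_loss(batch, batch, temperature=1.0)
mismatched = contrastive_loss(batch, list(reversed(batch)), temperature=1.0)
print(aligned, mismatched)
```

Averaging the two directions is one common design choice; a one-directional text-to-code loss would also fit the description above.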

gzt4se commented 6 months ago

I would like to ask whether there are now open-source fine-tuning scripts for text-to-code retrieval with CodeT5+. Thanks a lot!