salesforce / CodeT5

Home of CodeT5: Open Code LLMs for Code Understanding and Generation
https://arxiv.org/abs/2305.07922
BSD 3-Clause "New" or "Revised" License
2.68k stars, 394 forks

Do you plan to release code for pretraining? #50

Closed (subercui closed this issue 2 years ago)

subercui commented 2 years ago

Hi, do you plan to release code for pretraining? Or maybe you have already released it, not sure which file to look at.

rdurelli commented 2 years ago

I would like to get involved in this too :D

yuewang-cuhk commented 2 years ago

Hi, we do not plan to release the pretraining code (this is a duplicate of issue 40). We'd be happy to answer questions about its implementation details.

redthing1 commented 2 years ago

Hi, do you plan to release code for pretraining? Or maybe you have already released it, not sure which file to look at.

It sucks that they are so adamant about not releasing the code; who knows what their reasons are. In any case, CodeT5 is not such a great model. I would personally recommend looking at CodeGen, a very recent model that is much more powerful than CodeT5 (in our testing, even the smallest CodeGen model drastically outperformed CodeT5 on tasks such as code completion and summarization). Also, CodeT5 models were only 220M parameters at most (I think), whereas CodeGen models come in 350M, 2.7B, 6B, and 16B sizes. CodeGen also has significantly more of its code released.

@subercui @rdurelli