salesforce / CodeT5

Home of CodeT5: Open Code LLMs for Code Understanding and Generation
https://arxiv.org/abs/2305.07922
BSD 3-Clause "New" or "Revised" License

Request for complete pre-training dataset #86

Closed. oathaha closed this issue 1 year ago

oathaha commented 1 year ago

Hi. Can you tell me where I can obtain the full pre-training dataset used to pre-train CodeT5?

yuewang-cuhk commented 1 year ago

Hi there, we do not plan to release the pretraining dataset for CodeT5. For your reference, one of the most popular code pretraining datasets is The Stack.