microsoft / CodeBERT

Questions about additional C/C++ training dataset #308

Open Hustcw opened 6 months ago

Hustcw commented 6 months ago

Hi, I'm interested in the datasets used to train unixcoder-base-nine. Do you plan to open-source the additional 1.5M NL-PL pairs for the C, C++, and C# programming languages? If I missed any links, please let me know. Looking forward to your reply. Many thanks!

asyed79gatech commented 4 months ago

@Hustcw +1