microsoft / CodeBERT

MIT License

Pretrain data #228

Open xieexiaotuzi opened 1 year ago

xieexiaotuzi commented 1 year ago

Hi, thanks for sharing such great work.

I would like to ask whether you removed the 'docstring' from the code in the CodeSearchNet dataset when you pretrained CodeBERT. Specifically, did you use 'code' for the code generator and 'docstring' for the NL generator, or did you use 'code' with the docstring removed for the code generator and 'docstring' for the NL generator?

Thanks for your help.

guoday commented 1 year ago

We used 'code' with the docstring removed for the code generator, and 'docstring' for the NL generator.
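
For readers who want to reproduce that kind of split, here is a minimal sketch of separating a function from its docstring. It is not the authors' preprocessing script: it handles Python only, the helper name `split_code_and_docstring` is hypothetical, and it relies on the standard-library `ast` module (`ast.unparse` needs Python 3.9+), which also normalizes formatting and drops comments.

```python
import ast


def split_code_and_docstring(source: str):
    """Return (code_without_docstring, docstring) for the first function in `source`.

    Hypothetical helper for illustration; not taken from the CodeBERT repository.
    """
    tree = ast.parse(source)
    # ast.walk is breadth-first, so this picks the outermost function first.
    func = next(
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    )
    docstring = ast.get_docstring(func)
    if docstring is not None:
        # A docstring is always the first statement of the function body.
        # Keep the function syntactically valid if nothing else remains.
        func.body = func.body[1:] or [ast.Pass()]
    return ast.unparse(tree), docstring or ""


example = '''def add(a, b):
    """Return the sum of a and b."""
    return a + b
'''

# code_input would feed the code side, nl_input the natural-language side.
code_input, nl_input = split_code_and_docstring(example)
print(code_input)
print(nl_input)
```

Other languages in CodeSearchNet would need their own parser or comment-stripping rule, since `ast` only understands Python.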