The dataset used in the paper CodeBERT

microsoft / CodeXGLUE

The dataset used in the paper CodeBERT #53

Closed: hhliu79 closed this issue 3 years ago

hhliu79 commented 3 years ago

Hello, could you please tell me whether you used the same dataset as the one at https://github.com/microsoft/CodeXGLUE/tree/main/Code-Text/code-to-text for the Natural Language Code Search and Code Documentation Generation tasks in the paper 'CodeBERT: A Pre-Trained Model for Programming and Natural Languages'?

Thank you very much.

guody5 commented 3 years ago

Yes. The same dataset.

hhliu79 commented 3 years ago

Thank you for your reply. In addition, could you please tell me whether the dataset partition (train, validation, test) and the calculation of the MRR score are the same as in the CodeBERT paper? Thank you so much.

guody5 commented 3 years ago

No. For documentation generation, we filtered out some dirty data; you can find the details at https://github.com/microsoft/CodeXGLUE/tree/main/Code-Text/code-to-text.

hhliu79 commented 3 years ago

Ok. Thank you.
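
Note for readers: the MRR (mean reciprocal rank) score asked about above is, in its generic form, the average over all queries of 1/rank, where rank is the 1-based position of the correct code snippet in the retrieved list. The sketch below shows only that textbook definition. It is not taken from the CodeBERT or CodeXGLUE evaluation scripts, which may differ in details such as candidate pool size; the function name `mean_reciprocal_rank` and the input format are hypothetical.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Generic MRR. ranks[i] is the 1-based rank of the correct
    code snippet for query i (hypothetical input format)."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based and must be >= 1")
    # Average the reciprocal rank over all queries.
    return sum(1.0 / r for r in ranks) / len(ranks)


# Three queries whose correct snippets were ranked 1st, 2nd and 4th.
example_ranks = [1, 2, 4]
print(round(mean_reciprocal_rank(example_ranks), 4))  # (1 + 0.5 + 0.25) / 3 -> 0.5833
```

Because the score depends on how many candidates each query is ranked against, MRR values are only comparable when the partition and candidate pool match.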