microsoft / CodeXGLUE

Evaluation of Code-to-Code translation #151

Open pkuzqh opened 1 year ago

pkuzqh commented 1 year ago

Hi, the evaluation of the code-to-code translation subtask looks a little strange. The evaluation script tokenizes the code directly with 'split', which affects the BLEU score calculation. It would be more appropriate to tokenize the code with a tool such as javalang or tree-sitter. I can provide the related code and the tokenized data if you need them.
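To illustrate the concern, here is a minimal sketch of how whitespace splitting can penalize output that differs from the reference only in spacing. It is not the repository's evaluation script: the regex tokenizer is a rough, hypothetical stand-in for a real lexer such as javalang or tree-sitter (it does not handle string literals, comments, or multi-character operators), and `unigram_precision` is a simplified proxy for the n-gram matching inside BLEU. The example strings are made up.

```python
import re
from collections import Counter

# Hypothetical stand-in for a real lexer (javalang, tree-sitter):
# an identifier, a run of digits, or any single non-space symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def whitespace_tokens(code):
    """Tokenize by whitespace only, as a plain 'split' would."""
    return code.split()


def lexer_like_tokens(code):
    """Tokenize so that punctuation becomes separate tokens."""
    return TOKEN_RE.findall(code)


def unigram_precision(hyp_tokens, ref_tokens):
    """Clipped unigram precision, the 1-gram ingredient of BLEU."""
    overlap = Counter(hyp_tokens) & Counter(ref_tokens)
    return sum(overlap.values()) / max(len(hyp_tokens), 1)


# Made-up example: same code, different spacing.
reference = "int x = foo ( a , b ) ;"
hypothesis = "int x = foo(a, b);"

split_score = unigram_precision(
    whitespace_tokens(hypothesis), whitespace_tokens(reference)
)
lexer_score = unigram_precision(
    lexer_like_tokens(hypothesis), lexer_like_tokens(reference)
)

print(whitespace_tokens(hypothesis))  # ['int', 'x', '=', 'foo(a,', 'b);']
print(split_score)                    # 0.6
print(lexer_score)                    # 1.0
```

With 'split', `foo(a,` and `b);` are each treated as a single token that never matches the reference, so the two equivalent snippets score 0.6 instead of 1.0 on this simplified measure.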