microsoft / CodeXGLUE

MIT License

Evaluation of Code-to-Code translation #151

Open pkuzqh opened 1 year ago

pkuzqh commented 1 year ago

Hi, the evaluation of the code-to-code translation subtask looks a little strange. The evaluation script tokenizes the code directly with 'split', which affects the BLEU score calculation: whitespace splitting leaves punctuation attached to identifiers, so identical code with different spacing produces different tokens. It would be more appropriate to tokenize with a language-aware tool (javalang, tree-sitter). I can provide related code and the tokenized data if you need them.
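
To illustrate the concern, here is a minimal sketch in Python. It is not the CodeXGLUE evaluation script: `TOKEN_RE` and the helper functions are hypothetical, and the regex is only a rough stand-in for a real lexer such as javalang or tree-sitter. The two snippets are the same Java statement with different spacing, and clipped unigram precision (one ingredient of BLEU) is used in place of the full metric.

```python
import re
from collections import Counter

# ASSUMPTION: a rough regex used as a stand-in for a real lexer
# (javalang, tree-sitter). It separates identifiers, numbers, and
# single punctuation characters; it does not handle strings or comments.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S")


def whitespace_tokens(code):
    """Tokenize the way a plain str.split() would."""
    return code.split()


def lexer_like_tokens(code):
    """Tokenize with the illustrative regex above."""
    return TOKEN_RE.findall(code)


def unigram_precision(hypothesis, reference):
    """Clipped unigram precision: matched hypothesis tokens / hypothesis length."""
    ref_counts = Counter(reference)
    hyp_counts = Counter(hypothesis)
    matched = sum(min(count, ref_counts[tok]) for tok, count in hyp_counts.items())
    return matched / len(hypothesis)


# Same statement, different spacing (made-up example, not dataset content).
reference = "int x = foo(a, b);"
hypothesis = "int x = foo( a , b );"

split_score = unigram_precision(
    whitespace_tokens(hypothesis), whitespace_tokens(reference)
)
lexer_score = unigram_precision(
    lexer_like_tokens(hypothesis), lexer_like_tokens(reference)
)

print("whitespace split:", split_score)
print("lexer-like:      ", lexer_score)
```

With whitespace splitting, the reference yields tokens such as `foo(a,` and `b);`, which never match the spaced-out hypothesis, so the score drops even though the code is equivalent. With lexer-style tokenization both snippets produce the same token sequence.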