Hi,
The evaluation of the code-to-code translation subtask looks a little strange. The evaluation script tokenizes the code with a plain `split`, which distorts the BLEU score. It would be more appropriate to tokenize the code with a proper tool (javalang, tree-sitter). I can provide the related code and the tokenized data if you need them.
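A minimal sketch of the problem, assuming the evaluation script is Python and the target language is Java (the example snippets and the use of `nltk` here are illustrative, not taken from the actual script): `str.split()` glues punctuation such as `(` and `;` onto identifiers, so two semantically identical programs that differ only in spacing produce different token streams and a lower BLEU score, while a real lexer like javalang normalizes both sides.

```python
import javalang
from nltk.translate.bleu_score import sentence_bleu

# Two renderings of the same Java method, differing only in whitespace.
reference = "int add ( int a , int b ) { return a + b ; }"
hypothesis = "int add(int a, int b) { return a + b; }"

def split_tokens(code):
    # Whitespace tokenization, as the current evaluation script does.
    return code.split()

def javalang_tokens(code):
    # Lexer-based tokenization: 'add(int' becomes ['add', '(', 'int'].
    return [tok.value for tok in javalang.tokenizer.tokenize(code)]

# With naive split, tokens like 'add(int' and 'add' never match,
# so BLEU is penalized even though the programs are identical.
print(sentence_bleu([split_tokens(reference)], split_tokens(hypothesis)))

# With a lexer, both sides reduce to the same token stream (BLEU = 1.0).
print(sentence_bleu([javalang_tokens(reference)], javalang_tokens(hypothesis)))
```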