Open ghost opened 2 years ago
Thank you for pointing this out. It was our mistake not to consider the side effect of leaving the code untokenized, which makes the BLEU scores unconvincing. We will add a new metric based on tokenized code in the near future.
Actually, I think this issue affects not only the code-to-code-trans scores but also the CodeBLEU analysis: https://arxiv.org/pdf/2009.10297v2.pdf
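To illustrate why untokenized code can distort BLEU, here is a minimal sketch comparing clipped unigram precision (the building block of BLEU) on whitespace-split strings versus code tokens. The regex tokenizer, the function names, and the two snippets are assumptions made up for illustration. This is not the CodeXGLUE evaluator or its tokenizer.

```python
import re
from collections import Counter

# Hypothetical tokenizer: an identifier or a number is one token, and every
# other non-space character is its own token. Not the CodeXGLUE tokenizer.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def code_tokens(code):
    """Split a code string into rough lexical tokens."""
    return TOKEN_RE.findall(code)


def ngram_precision(hyp, ref, n):
    """Clipped n-gram precision of hyp against ref (both are token lists)."""
    hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(hyp_ngrams.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_ngrams[gram]) for gram, count in hyp_ngrams.items())
    return matched / total


# Illustrative pair: the same code, differing only in spacing.
reference = "public void print(string str){write(str);}"
hypothesis = "public void print(string str) { write(str); }"

ws_precision = ngram_precision(hypothesis.split(), reference.split(), 1)
tok_precision = ngram_precision(code_tokens(hypothesis), code_tokens(reference), 1)

print(f"whitespace split: {ws_precision:.3f}")
print(f"code tokens:      {tok_precision:.3f}")
```

With whitespace splitting only 3 of the 7 hypothesis "words" match the reference, although the two snippets are the same code. After tokenization every token matches.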
Is the new metric still planned?
Hi,
Does anyone know how to preprocess a Java/C# function into a one-line string, the same as in the dataset: https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/code-to-code-trans/data
e.g. a multi-line C# function
was preprocessed to
public virtual void print(string str){write(str != null ? str : Sharpen.StringHelper.GetValueOf((object)null));}
Thanks
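One possible approach, as a sketch only: strip each line and concatenate the non-empty lines with no separator. This reproduces the one-line form quoted above for a simple function, but it is an assumption about how the dataset was built, not the confirmed preprocessing. The helper name `flatten_function` and the multi-line input are hypothetical. The input was reconstructed by hand from the one-line string.

```python
def flatten_function(source):
    """Strip every line and join the non-empty ones without a separator."""
    return "".join(line.strip() for line in source.splitlines() if line.strip())


# Hypothetical multi-line input, reconstructed by hand from the one-line form.
multiline = """
public virtual void print(string str)
{
    write(str != null ? str : Sharpen.StringHelper.GetValueOf((object)null));
}
"""

one_line = flatten_function(multiline)
print(one_line)
```

This is naive: a `//` line comment would swallow everything joined after it, and a statement broken across lines could have two tokens fused together. Removing comments first, or joining with a single space, would be safer for arbitrary code.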