microsoft / CodeXGLUE


Question for Java/C# function preprocessed to one line string #114

Open ghost opened 2 years ago

ghost commented 2 years ago

Hi,

Does anyone know how to preprocess a Java/C# function into a one-line string, as in the dataset: https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/code-to-code-trans/data

e.g. a multiline C# function

public virtual void print(  string str ) 
{
  write(str != null ? str : Sharpen.StringHelper.GetValueOf( (object) null ) );
}

was preprocessed to

public virtual void print(string str){write(str != null ? str : Sharpen.StringHelper.GetValueOf((object)null));}

Thanks
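The thread does not say how the dataset was actually flattened, so the following is only a minimal sketch that happens to reproduce the example above. It assumes two rules inferred from that single example: collapse every run of whitespace into one space, then drop spaces next to parentheses, braces, and semicolons. The function name `to_one_line` is hypothetical and not part of CodeXGLUE.

```python
import re


def to_one_line(code: str) -> str:
    """Flatten a multi-line function into one line (hypothetical helper).

    The rules are inferred from the single example in this thread and are
    not confirmed to be the preprocessing used for the dataset.
    """
    # Turn every run of whitespace, including newlines, into one space.
    flat = re.sub(r"\s+", " ", code).strip()
    # Remove spaces on either side of ( ) { } and ; characters.
    return re.sub(r"\s*([(){};])\s*", r"\1", flat)


sample = """public virtual void print(  string str )
{
  write(str != null ? str : Sharpen.StringHelper.GetValueOf( (object) null ) );
}"""

print(to_one_line(sample))
```

This regex approach is not syntax-aware: it also rewrites whitespace inside string literals, it removes the space in constructs such as `if (x)`, and a `//` line comment would swallow the rest of the function once the newlines are gone. A lexer-based pass would be needed to handle those cases.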

celbree commented 2 years ago

Thank you for pointing this out. It was our mistake not to consider the side effect of leaving the code untokenized, which makes the BLEU score unconvincing. We will add a new metric based on the tokenized code in the near future.

ghost commented 2 years ago

Actually, I think this issue affects not only the code-to-code-trans scores but also the CodeBLEU analysis: https://arxiv.org/pdf/2009.10297v2.pdf

Gompyn commented 1 year ago

> Thank you for pointing this out. It was our mistake not to consider the side effect of leaving the code untokenized, which makes the BLEU score unconvincing. We will add a new metric based on the tokenized code in the near future.

Is the new metric still planned?