microsoft / CodeXGLUE

How to get the reference.txt and the candidate.txt? #107

Closed YHX-X closed 2 years ago

YHX-X commented 2 years ago

Hi, I want to calculate CodeBLEU. Could you please tell me how to generate the reference.txt and the candidate.txt? Thanks a lot.

celbree commented 2 years ago

The reference file and the prediction file should be in the same format. As shown here, each one is a plain-text file with one example per line (in this case, one method).
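A minimal sketch of how such files could be produced, assuming the references and predictions are already available as lists of strings in the same order. The function names and the file names `reference.txt` and `candidate.txt` are only illustrative, and collapsing whitespace this way is an assumption that suits brace-delimited languages such as Java; it would change string literals that contain newlines and is not safe for indentation-sensitive code.

```python
from pathlib import Path


def to_single_line(code: str) -> str:
    """Collapse a multi-line snippet onto one line by joining on whitespace."""
    return " ".join(code.split())


def write_examples(path, snippets):
    """Write one example per line and return the lines that were written."""
    lines = [to_single_line(s) for s in snippets]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


if __name__ == "__main__":
    # Hypothetical data: line i of both files must describe the same example.
    references = ["int add(int a, int b) {\n    return a + b;\n}"]
    predictions = ["int add(int x, int y) {\n    return x + y;\n}"]
    write_examples("reference.txt", references)
    write_examples("candidate.txt", predictions)
```

Both files should end up with the same number of lines, so that each prediction is compared with its own reference.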

YHX-X commented 2 years ago

Thank you for your reply. I have another question: I calculated the CodeBLEU of two semantically similar pieces of code from BigCloneBench, but the CodeBLEU results were poor (e.g., 24.1% and 14.7%). The params I set were 0.1,0.1,0.4,0.4. The syntax_match and dataflow_match scores were both below 50%. Could you please tell me why I got such poor results?
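For context, CodeBLEU is a weighted sum of four component scores, and the four params are the weights. The sketch below assumes the params are given in the order n-gram match, weighted n-gram match, syntax match, dataflow match. The component values are hypothetical, chosen only to show how weights of 0.1,0.1,0.4,0.4 make the syntax and dataflow scores dominate the total; they are not the scores from the runs above.

```python
def combine_codebleu(ngram, weighted_ngram, syntax, dataflow,
                     params=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of the four CodeBLEU component scores (each in [0, 1])."""
    alpha, beta, gamma, delta = params
    return (alpha * ngram + beta * weighted_ngram
            + gamma * syntax + delta * dataflow)


if __name__ == "__main__":
    # Hypothetical component scores, for illustration only.
    score = combine_codebleu(
        ngram=0.05, weighted_ngram=0.06, syntax=0.30, dataflow=0.35,
        params=(0.1, 0.1, 0.4, 0.4),
    )
    print(f"CodeBLEU = {score:.3f}")
```

With these weights, 80% of the score comes from syntax_match and dataflow_match, so if both are below 50% the total cannot reach 60% even with perfect n-gram scores.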

celbree commented 2 years ago

I think the reason is probably this: CodeBLEU measures how similar model predictions are to the ground truths at the n-gram level, the syntax level, and the dataflow level. It is not meant to compare two different implementations of the same functionality. The latter is the clone detection task, which is what BigCloneBench targets. CodeBLEU cannot solve that problem, since understanding the semantics of code requires powerful neural networks.
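To illustrate the point, here is a rough sketch that computes a plain clipped n-gram precision (a simplification, not the full CodeBLEU or BLEU computation) between two snippets that do the same thing in different ways. The two snippets and the whitespace tokenization are made up for this example.

```python
from collections import Counter


def ngram_precision(candidate_tokens, reference_tokens, n):
    """Clipped n-gram precision of the candidate against a single reference."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate_tokens), ngrams(reference_tokens)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram counts at most as often as it occurs in the reference.
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


# Two hypothetical implementations of "sum of 1..n", tokenized on whitespace.
LOOP_VERSION = "int s = 0 ; for ( int i = 1 ; i <= n ; i ++ ) s += i ; return s ;".split()
FORMULA_VERSION = "return n * ( n + 1 ) / 2 ;".split()

if __name__ == "__main__":
    for n in (1, 2):
        p = ngram_precision(FORMULA_VERSION, LOOP_VERSION, n)
        print(f"{n}-gram precision: {p:.2f}")
```

The two snippets are semantic clones, yet they share almost no token sequences, which is the kind of pair that surface-level metrics score poorly.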