Closed by masies 3 years ago
Hi, CodeBLEU is calculated at the corpus level. When computing the data-flow match score, it counts the total number of reference data-flows across the whole corpus. In your case there is only one instance in the corpus, so if no data-flow is extracted from that sample, the total count is 0 and a ZeroDivisionError occurs. We have modified the script to handle this case and return a data-flow match score of 0. However, since no data-flow was extracted, I think this score can be ignored, and you can compute CodeBLEU from the first three features by setting the hyper-parameters to 1/3, 1/3, 1/3, 0.
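A minimal sketch of the fix described above, for illustration only. The function names `dataflow_match` and `combine` are hypothetical and are not taken from `calc_code_bleu.py`; the sketch only assumes that the final score is a weighted sum of the four component scores, which is what the four values passed to `--params` suggest.

```python
def dataflow_match(matched: int, total: int) -> float:
    """Fraction of reference data-flows that were matched.

    Returns 0.0 when no data-flow was extracted (total == 0),
    instead of raising ZeroDivisionError.
    """
    if total == 0:
        return 0.0
    return matched / total


def combine(ngram: float, weighted_ngram: float, syntax: float,
            dataflow: float, weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    """Weighted sum of the four component scores (hypothetical helper)."""
    a, b, c, d = weights
    return a * ngram + b * weighted_ngram + c * syntax + d * dataflow


# No data-flow extracted: the guarded score is 0.0, and weights of
# 1/3, 1/3, 1/3, 0 drop it from the final score entirely.
df = dataflow_match(0, 0)
score = combine(0.9, 0.9, 0.9, df, weights=(1 / 3, 1 / 3, 1 / 3, 0.0))
```

With a weight of 0 on the last component, the value returned for the empty case no longer affects the result.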
Thanks a lot for that. To be a little more precise about my use case: I need a metric that tells me how much a code prediction (generated by a NN) differs from a given target. Do you think it is appropriate to use CodeBLEU on each pair, given that it may not extract any data-flow? And is it still reliable enough when the data-flow is extracted from a single snippet of code?
Hi, I think CodeBLEU still has some reference value when no data-flow can be extracted, but in that case it degenerates into simple token-level matching. And when the data-flow is extracted from a single snippet of code, I don't think the score is reliable enough. Just as BLEU is more meaningful at the corpus level, CodeBLEU is designed for the corpus level too.
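To illustrate the corpus-level point, here is a rough sketch of pooling data-flow counts over all samples before dividing. The function `corpus_dataflow_match` and the `(matched, total)` pairs are made up for the example and do not come from the CodeBLEU scripts; this is only how such pooling could look, not the actual implementation.

```python
def corpus_dataflow_match(samples):
    """Pool counts over the whole corpus, then divide once.

    `samples` is a list of (matched, total) pairs, one per instance,
    where `total` is the number of reference data-flows extracted.
    """
    matched = sum(m for m, _ in samples)
    total = sum(t for _, t in samples)
    return matched / total if total else 0.0


# Hypothetical counts: the first sample has no extractable data-flow.
samples = [(0, 0), (3, 4), (2, 2)]
pooled = corpus_dataflow_match(samples)  # (0 + 3 + 2) / (0 + 4 + 2)

# With a single such sample there is nothing to pool, so the score
# carries no information about data-flow at all.
single = corpus_dataflow_match([(0, 0)])
```

In a larger corpus, a sample without data-flows simply adds nothing to either count, whereas a one-sample corpus leaves the denominator at 0.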
When I run the script for computing CodeBLEU with only one instance of code, I sometimes run into a ZeroDivisionError, for example with the following pair:
target code:
private Map<String, ArrayList<Order>> getBuyOrders() { return buyOrders; }
predicted code:
private HashMap<String, ArrayList<Order>> getBuyOrders() { return buyOrders; }
Trying to run:
python calc_code_bleu.py --refs target.txt --hyp prediction.txt --lang java --params 0.25,0.25,0.25,0.25
I get this:
Environment: