Hello! I am trying to reproduce the BLEU score reported in your CodeReviewer paper. The fine-tuning script you provided for the "review comment generation" downstream task sets the number of training steps to 60,000. When evaluating the fine-tuned model on your test dataset, I got a BLEU score of 5.16, while the paper reports 5.32.
I used the exact shell scripts uploaded in this CodeReviewer GitHub repository. My question is: what exact training-step count or checkpoint did you use to produce the score (5.32) reported in the paper? What might be the possible reasons for this mismatch?
Since the training process is not deterministic, a mismatch of 0.16 is acceptable. You can try evaluating other checkpoints; one of them may yield a closer BLEU score.
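If you want to sanity-check scores across several checkpoints yourself, here is a minimal smoothed corpus-BLEU sketch using only the Python standard library. This is an assumption-laden illustration, not the repository's own evaluation script: CodeReviewer's reported numbers come from its own tokenization and smoothing choices, so absolute values from this sketch may differ slightly, but relative comparisons between checkpoints should still be informative.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # Multiset of n-grams in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Smoothed corpus-level BLEU (0-100) over parallel lists of token lists.

    Uses add-one smoothing for n > 1 precisions (one common smoothed-BLEU
    variant; the paper's exact smoothing may differ).
    """
    precisions = []
    for n in range(1, max_n + 1):
        match, total = 0, 0
        for hyp, ref in zip(hypotheses, references):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            match += sum(min(c, r[g]) for g, c in h.items())
            total += max(len(hyp) - n + 1, 0)
        if total == 0:
            precisions.append(0.0)
        else:
            smooth = 1 if n > 1 else 0  # add-one smoothing above unigrams
            precisions.append((match + smooth) / (total + smooth))
    if min(precisions) == 0.0:
        return 0.0
    # Brevity penalty: penalize hypotheses shorter than the references.
    hyp_len = sum(len(h) for h in hypotheses)
    ref_len = sum(len(r) for r in references)
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / max(hyp_len, 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n) * 100

# Hypothetical usage: compare checkpoint outputs against gold comments.
# `preds` and `golds` would come from whitespace-tokenizing the generation
# and reference files produced for each checkpoint.
preds = [["fix", "the", "null", "check", "here"]]
golds = [["fix", "the", "null", "check", "please"]]
print(f"BLEU: {corpus_bleu(preds, golds):.2f}")
```

Running this for each saved checkpoint's predictions and picking the highest-scoring one is one way to follow the suggestion above; differences on the order of 0.1-0.2 BLEU between checkpoints are common.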