microsoft / CodeBERT

MIT License

CodeReviewer - possibility to run only Quality Estimation task #252

Closed (sergiogrz closed this issue 1 year ago)

sergiogrz commented 1 year ago

Hello, I would like to know whether it is possible to use CodeReviewer only for the Quality Estimation task and, if so, which steps I should take to run it successfully and try it on inputs of my own.

I mean, at this point, I have run the finetune-cls.sh script and got some checkpoint files. What would be the next steps? How could I use this checkpoint model to do some inference on my own examples of code changes? Is it necessary to run the other finetune scripts if I only want to try the Quality Estimation task?

celbree commented 1 year ago

If you only want to try the Quality Estimation task, you don't need to run the other finetune scripts. The next step is to follow the run_test_cls.py script, which loads your checkpoint and evaluates it on the test set. You need to prepare your own dataset in our data format and modify the eval_epoch_acc function to run inference on your own examples.
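
As a rough illustration of that last step, an accuracy-style evaluation loop can be changed to return one prediction per example instead of a single aggregate score. The sketch below is not the repository's code: `predict_labels`, `score_fn`, `dummy_score`, and the 0.5 threshold are hypothetical stand-ins, and the real eval_epoch_acc works on batches produced by the fine-tuned model.

```python
from typing import Callable, Dict, Iterable, List


def predict_labels(
    examples: Iterable[Dict[str, str]],
    score_fn: Callable[[Dict[str, str]], float],
    threshold: float = 0.5,  # assumption: cut-off chosen for illustration only
) -> List[Dict[str, object]]:
    """Return one prediction per example instead of an aggregate accuracy."""
    results = []
    for idx, example in enumerate(examples):
        # Hypothetical scorer: a score in [0, 1] for "this change needs review".
        score = score_fn(example)
        results.append(
            {"idx": idx, "score": score, "needs_review": score >= threshold}
        )
    return results


# Stand-in scorer for demonstration only; a real one would call the model.
def dummy_score(example: Dict[str, str]) -> float:
    return 0.9 if "TODO" in example["patch"] else 0.1


demo = [
    {"oldf": "", "patch": "+ # TODO fix"},
    {"oldf": "", "patch": "+ x = 1"},
]
predictions = predict_labels(demo, dummy_score)
```

Keeping the index next to each score makes it easy to map a prediction back to the line of the input file it came from.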

sergiogrz commented 1 year ago

Thanks @celbree! Is there a specific set of instructions for creating the evaluation dataset? After checking the json files that you used, I created my own jsonl file with the format:

{"oldf": "...", "patch": "..."}
{"oldf": "...", "patch": "..."}
...

where oldf and patch were extracted from a public repo using the GitHub API. Unfortunately, when I run run_test_cls.py, I get an error suggesting that my input is not recognised: ValueError: num_samples should be a positive integer value, but got num_samples=0.

Any ideas about what I'm missing here?
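
This `ValueError` message is the one PyTorch's `RandomSampler` raises when it is given an empty dataset, so a reasonable first check is whether any line of the file parses into a usable record. A minimal sketch of such a check follows. The required keys are taken from the format shown above and may not be the full set the loader expects, and `count_usable_records` is a hypothetical helper, not part of the repository.

```python
import json
from typing import Iterable, List, Tuple

# Assumption: only the two fields shown in this thread are checked.
REQUIRED_KEYS = ("oldf", "patch")


def count_usable_records(lines: Iterable[str]) -> Tuple[int, List[Tuple[int, str]]]:
    """Count lines that are JSON objects containing every required key."""
    usable = 0
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((lineno, "not a JSON object"))
            continue
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            problems.append((lineno, f"missing keys: {missing}"))
            continue
        usable += 1
    return usable, problems


sample = [
    '{"oldf": "a = 1", "patch": "@@ -1 +1 @@"}',
    "{'oldf': 'a = 1', 'patch': '...'}",  # single quotes: not valid JSON
    '{"oldf": "a = 1"}',  # missing "patch"
]
usable, problems = count_usable_records(sample)
```

An open file handle can be passed in place of `sample`, since iterating over a text file yields one line at a time. If `usable` comes back as 0, the dataset ends up empty and the sampler error above is the expected symptom.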

sergiogrz commented 1 year ago

I may have found the solution to this last problem: it was related to the format of my jsonl file. It seems to work now. Thanks!
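
The thread does not say what the formatting problem was. One way to avoid the usual JSON Lines pitfalls (pretty-printed multi-line objects, single-quoted strings, unescaped newlines in file contents) is to serialize every record with `json.dumps`, one object per line. The sketch below assumes only the two fields shown earlier; `write_jsonl` and the sample values are hypothetical.

```python
import io
import json
from typing import Dict, Iterable, TextIO


def write_jsonl(records: Iterable[Dict[str, str]], out: TextIO) -> int:
    """Write each record as a single JSON object on its own line."""
    count = 0
    for record in records:
        # json.dumps escapes embedded newlines and quotes,
        # so every record occupies exactly one physical line.
        out.write(json.dumps(record) + "\n")
        count += 1
    return count


# Hypothetical sample record: file contents and a diff hunk with real newlines.
records = [
    {
        "oldf": "def f():\n    return 1\n",
        "patch": "@@ -1,2 +1,2 @@\n def f():\n-    return 1\n+    return 2\n",
    },
]
buffer = io.StringIO()
written = write_jsonl(records, buffer)
lines = buffer.getvalue().splitlines()
```

To write to disk instead of an in-memory buffer, pass a file opened in text mode with UTF-8 encoding as `out`.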