microsoft / CodeBERT

MIT License

How to use this Codereviewer on HuggingFace #194

Closed: mpociot closed this issue 1 year ago

mpociot commented 1 year ago

Hi!

I want to see how I can use the codereviewer pre-trained models on Hugging Face. Do I still need to train the model somehow? If I try to paste the demo JSON into the Hugging Face inference API/UI it doesn't return any good results.

Am I missing something?

celbree commented 1 year ago

Hi, CodeReviewer can be used directly for the comment generation task (fine-tuning will improve the results). If you want to use CodeReviewer for other downstream tasks, such as diff quality estimation or code refinement, you need to fine-tune the model.

Our CodeReviewer model in model.py is based on T5ForConditionalGeneration but differs from it, so the Hugging Face interface cannot generate good results.

mpociot commented 1 year ago

@celbree thank you very much.

Could you please help me figure out how I can do this?

I cloned the repository and downloaded all of the dataset files. I then successfully ran the bash finetune-cls.sh script, which generated a bunch of checkpoint files.

Now how would I go about running finetune-msg.sh as the next step?

The finetune script is looking for these files: preds.txt and golds.txt (https://github.com/microsoft/CodeBERT/blob/master/CodeReviewer/code/run_finetune_msg.py#L82)

As the default bash script is only passing microsoft/codereviewer as the model, these files don't really exist. (https://github.com/microsoft/CodeBERT/blob/master/CodeReviewer/code/sh/finetune-msg.sh#L27)
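
A quick way to confirm this before launching the script is to check whether the model argument points at a local directory that actually contains the two files. The sketch below is only a diagnostic aid, not part of the repository: the helper name `missing_checkpoint_files` and the path `path/to/your/checkpoint` are hypothetical, and it assumes the script resolves preds.txt and golds.txt relative to the model path, as described above.

```python
from pathlib import Path

# File names taken from the discussion above. That the script looks for them
# under the model path is an assumption, not something verified here.
EXPECTED_FILES = ("preds.txt", "golds.txt")


def missing_checkpoint_files(model_arg, expected=EXPECTED_FILES):
    """Return the expected files that are absent under `model_arg`.

    A Hub ID such as "microsoft/codereviewer" is normally not a local
    directory, so every expected file is reported as missing for it.
    """
    model_dir = Path(model_arg)
    return [name for name in expected if not (model_dir / name).is_file()]


if __name__ == "__main__":
    # "path/to/your/checkpoint" is a placeholder for a directory produced
    # by an earlier fine-tuning run.
    for arg in ("microsoft/codereviewer", "path/to/your/checkpoint"):
        missing = missing_checkpoint_files(arg)
        print(arg, "->", missing if missing else "all expected files present")
```

If the list is non-empty for the value passed to the script, the files were never produced for that model path, which matches the behaviour described here.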

I'm sorry if these things are meant to be obvious - I'm trying to get started with AI / CodeReviewer and I just tried following the steps in the README. Any help is highly appreciated.

rajanpanchal commented 1 year ago

@mpociot, were you able to solve this?