salesforce / CodeT5

Home of CodeT5: Open Code LLMs for Code Understanding and Generation
https://arxiv.org/abs/2305.07922
BSD 3-Clause "New" or "Revised" License

Reproducing defect results #54

Closed: Kamel773 closed this issue 2 years ago

Kamel773 commented 2 years ago

Dear CodeT4 team,

Let me thank you for sharing your dataset and models with us. I reproduced the defect detection experiment on four GPUs with a batch size of 8 per GPU, so the total batch size is 32. In exp_with_args.sh, I set CUDA_VISIBLE_DEVICES=0,1,2,3.
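The batch-size arithmetic above assumes that each GPU receives its own batch of 8. Below is a minimal sketch with hypothetical helper names that counts the devices listed in CUDA_VISIBLE_DEVICES and multiplies that count by the per-GPU batch. The per-GPU reading of the script's batch-size argument is an assumption worth checking. If the script instead feeds one batch to torch.nn.DataParallel, which splits each input batch across the visible devices, the effective batch size is the argument itself, not 4× it.

```python
import os


def visible_gpu_count(env=None):
    """Count devices listed in CUDA_VISIBLE_DEVICES (e.g. "0,1,2,3" -> 4).

    Returns None when the variable is unset, because CUDA then exposes
    every device and the count cannot be read from the variable alone.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return len([d for d in value.split(",") if d.strip()])


def effective_batch_size(per_gpu_batch, n_gpu):
    """Samples per optimizer step, assuming each GPU gets its own batch."""
    return per_gpu_batch * n_gpu


n_gpu = visible_gpu_count({"CUDA_VISIBLE_DEVICES": "0,1,2,3"})
print(n_gpu, effective_batch_size(8, n_gpu))  # 4 32
```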

Your paper reports an accuracy of 65.78%, but when I ran the same experiment I got 64.09%. I am not sure what is wrong with my setup, and I would appreciate any help reproducing your results.

Training:
[0] Best acc changed into 0.6175
[1] Best acc changed into 0.6482
[2] Best acc changed into 0.6552
[3] Best acc changed into 0.6654
[6] Early stop as not_acc_inc_cnt=3
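This training log implies a patience-style stopping rule. Best accuracy improved at epochs 0 to 3, did not improve at epochs 4 to 6, and training stopped at epoch 6 with not_acc_inc_cnt=3. Below is a minimal sketch of such a counter. The function name is hypothetical. The patience of 2 and the strict "exceeds patience" comparison are assumptions chosen to match the log, not values read from the CodeT5 source.

```python
def early_stop_epoch(improved, patience):
    """Return the epoch at which training stops, or None if it never stops.

    improved[i] is True when epoch i set a new best dev accuracy.
    The counter resets on improvement and stops training once it
    exceeds `patience` (assumed rule, matching the log above).
    """
    not_acc_inc_cnt = 0
    for epoch, did_improve in enumerate(improved):
        if did_improve:
            not_acc_inc_cnt = 0
        else:
            not_acc_inc_cnt += 1
            if not_acc_inc_cnt > patience:
                return epoch
    return None


# Epochs 0-3 logged "Best acc changed"; epochs 4-6 did not.
improved = [True, True, True, True, False, False, False]
print(early_stop_epoch(improved, patience=2))  # 6
```

Under this rule, the run stopped at epoch 6 with the checkpoint selected at epoch 3.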

[best-acc] test-acc: 0.6409
[best-acc] test-acc: 0.6409

Testing:
accuracy_score 64.0922401171303
precision_score 67.08229426433915
recall_score 42.86852589641435
f1_score 52.309188138065146
[[1213  264]
 [ 717  538]]
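As a sanity check, the four scores can be recomputed from the confusion matrix. This assumes the scikit-learn layout for labels [0, 1], where rows are true labels and columns are predictions, giving [[TN, FP], [FN, TP]]. The helper name binary_metrics is hypothetical.

```python
def binary_metrics(cm):
    """Accuracy, precision, recall and F1 (in %) from a 2x2 confusion matrix.

    cm is [[TN, FP], [FN, TP]]: rows are true labels, columns are
    predicted labels, positive class = 1.
    """
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": 100 * accuracy,
        "precision": 100 * precision,
        "recall": 100 * recall,
        "f1": 100 * f1,
    }


metrics = binary_metrics([[1213, 264], [717, 538]])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
# accuracy: 64.0922, precision: 67.0823, recall: 42.8685, f1: 52.3092
```

The recomputed values agree with the logged scores, so the test-set evaluation itself is internally consistent. The gap to 65.78% comes from the trained model, not from the metric computation.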

yuewang-sf commented 2 years ago

Hi, using multiple GPUs might affect the final result. To reproduce our results, we suggest using the same setting (gpu_num=1) or directly using our released fine-tuned checkpoint, following the instructions here.
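Suppose gpu_num=1 means exposing a single device. Then the same variable edited in exp_with_args.sh can be set to one GPU, for example CUDA_VISIBLE_DEVICES=0. Below is a minimal Python equivalent under that assumption. The helper name restrict_to_single_gpu is hypothetical, and how the CodeT5 scripts consume gpu_num is not shown here.

```python
import os


def restrict_to_single_gpu(device_id="0"):
    """Expose only one GPU to CUDA; must run before torch initializes CUDA."""
    os.environ["CUDA_VISIBLE_DEVICES"] = device_id


restrict_to_single_gpu("0")

try:
    import torch  # imported after the variable is set, so CUDA sees at most one device
    print("visible GPUs:", torch.cuda.device_count())
except ImportError:
    print("torch not installed; CUDA_VISIBLE_DEVICES =", os.environ["CUDA_VISIBLE_DEVICES"])
```

The variable must be set before CUDA is initialized in the process. Changing it afterwards does not alter the devices that torch already sees.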

Btw, we are the CodeT5 team :)