
AssertionError: Sequence length of prediction and answer are not equal, 1416: 1417 #111

Closed GoneZ5 closed 2 years ago

GoneZ5 commented 2 years ago

Hi, thanks for this project! I am reproducing the token-level Code Completion task (javaCorpus) using the provided CodeGPT-adapted model. Fine-tuning and inference run smoothly, but when I use test.txt for evaluation, the evaluator fails because the lengths of the prediction and the answer differ.

```
Traceback (most recent call last):
  File "/home/zhoupingyi/gongzi/code-moe/codes/tasks/CodeCompletion_token/evaluator/evaluator.py", line 45, in <module>
    main()
  File "/home/zhoupingyi/gongzi/code-moe/codes/tasks/CodeCompletion_token/evaluator/evaluator.py", line 29, in main
    assert len(pred) == len(gt), f"Sequence length of prediction and answer are not equal, {len(pred)}: {len(gt)}"
AssertionError: Sequence length of prediction and answer are not equal, 1416: 1417
```

Many lines in the prediction and test files hit similar errors, and the length difference is usually 1.
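
For reference, the per-line check that fails looks roughly like this (reconstructed from the assertion visible in the traceback; the file handling and accuracy counting around it are my guess at evaluator.py, not the exact code):

```python
# Sketch of the token-level comparison; file names follow this thread.
with open("prediction.txt") as fp, open("test.txt") as fg:
    total, correct = 0, 0
    for pred_line, gt_line in zip(fp, fg):
        pred = pred_line.strip().split()
        gt = gt_line.strip().split()
        # This is the assertion that fires with lengths 1416 vs 1417.
        assert len(pred) == len(gt), \
            f"Sequence length of prediction and answer are not equal, {len(pred)}: {len(gt)}"
        correct += sum(p == g for p, g in zip(pred, gt))
        total += len(gt)
    print(f"token-level accuracy: {correct / total:.4f}")
```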

The same problem happened on the py150 dataset.

If I simply skip these samples in py150, the accuracy of CodeGPT-adapted-python is only 75.61 after 50,000 training steps.

How can I fix this problem? Thanks in advance!

celbree commented 2 years ago

Hi, did you run into any issues when generating the answer file by running the eval_acc function in run_lm.py? Does the error only occur when running evaluator.py?

GoneZ5 commented 2 years ago

Hi, thanks for your reply! I ran the eval_acc function in run_lm.py to generate prediction.txt, and I use test.txt directly as the answer file. The error only occurs in evaluator.py.


celbree commented 2 years ago

I think this error might be caused by some corner cases in the split() calls in evaluator.py. Since run_lm.py runs without problems and is expected to output an accuracy number, you can use that number as your result; there is no difference between it and the evaluator's output.

We will look into this issue in evaluator.py.
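
To illustrate the kind of corner case I mean (the tokens below are hypothetical, not from the actual dataset): str.split() cuts on every run of whitespace, so a ground-truth token that itself contains a space ends up one token longer on the prediction side:

```python
# Hypothetical example of a whitespace corner case in split().
pred_line = '<s> print ( "a b" ) </s>'  # generated line; the string literal contains a space
pred = pred_line.split()
# -> ['<s>', 'print', '(', '"a', 'b"', ')', '</s>']  (7 tokens)

gt = ['<s>', 'print', '(', '"a b"', ')', '</s>']  # answer side keeps the literal as one token (6)

assert len(pred) == len(gt)  # fails: 7 != 6, an off-by-one like the one reported
```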

GoneZ5 commented 2 years ago

Thanks! I have a few other questions.

  1. Are the CodeGPT-adapted models used in the Code Completion task pre-trained only on programming languages (PL), or on PL-NL pairs?
  2. What about the CodeGPT-adapted models in the text-to-code task?

Looking forward to your reply!

celbree commented 2 years ago
  1. It is pre-trained only on PL data, which contains some NL comments. CodeGPT-adapted is initialized from GPT-2, so it inherits GPT-2's NL knowledge.
  2. Please refer to the text-to-code task on the CONCODE dataset; our baseline model there is also CodeGPT.
GoneZ5 commented 2 years ago

Thanks for your prompt response!

  1. Do you mean that some NL comments were not filtered out of the code?
  2. Sorry, I didn't describe the problem clearly. Are the CodeGPT-adapted models in the text-to-code task also pre-trained on PL data containing some NL comments? Are the pre-trained models for the Code Completion and Text-to-Code tasks the same?
  3. I found that the number output by run_lm.py is far from what you reported for py150: I only get 76.0. My fine-tuning settings are GPU=2, learning_rate=8e-5, per_gpu_train_batch_size=2, gradient_accumulation_steps=4. Are these settings the same as yours?
celbree commented 2 years ago
  1. Do you mean that some NL comments were not filtered out of the code?

We use the CodeSearchNet dataset for pre-training, so some inline comments are not filtered out.

  2. Are the CodeGPT-adapted models in the text-to-code task also pre-trained on PL data containing some NL comments? Are the pre-trained models for the Code Completion and Text-to-Code tasks the same?

The pre-trained models are the same, but each is fine-tuned on the task-specific dataset.

  3. Are these settings the same as yours?

For PY150, we use a larger effective batch size (GPU * per_gpu_train_batch_size * gradient_accumulation_steps) of 64.
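
Concretely (the variable names below are only for illustration):

```python
# effective batch size = GPU * per_gpu_train_batch_size * gradient_accumulation_steps
yours = 2 * 2 * 4  # the settings quoted above -> 16
ours = 64          # the effective batch size behind the reported PY150 number

# Keeping GPU=2 and per_gpu_train_batch_size=2, matching 64 would need
# gradient_accumulation_steps = 64 // (2 * 2) = 16.
print(yours, ours)
```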

GoneZ5 commented 2 years ago

All my questions have been resolved! Thank you very much!