microsoft / CodeXGLUE


Some questions about the results on CodeCompletion-token. #115

Closed · GoneZ5 closed this issue 2 years ago

GoneZ5 commented 2 years ago

Hi, I have some questions about the differences between the results reported in your paper and those on GitHub for the code completion (token-level) task.

The results in your paper: [image]
The results on your GitHub: [image]

  1. What makes the GitHub results significantly better than those in the paper?
  2. How do you explain the small gap between CodeGPT and CodeGPT-adapted on PY150?
  3. If I want to compare against your models on CodeCompletion-token, which results should I use?

Thanks in advance!

celbree commented 2 years ago

Hi,

  1. We updated the dataset by applying literal normalization to avoid exposing sensitive information, so the results in the arXiv version of the paper are out of date. You can refer to our NeurIPS paper instead. (A minimal sketch of this kind of normalization is given after this list.)
  2. CodeGPT-adapted is initialized from GPT-2, while CodeGPT is pre-trained from scratch. The pre-training dataset is the same for both -- CodeSearchNet. So it is expected that CodeGPT-adapted performs better than CodeGPT, since it inherits knowledge from GPT-2.
  3. The results on our GitHub are always the newest.
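For illustration, here is a minimal sketch of the kind of literal normalization described in point 1. It is not the actual CodeXGLUE preprocessing script; the placeholder tokens `<STR_LIT>`/`<NUM_LIT>` and the regexes are simplified assumptions for the example.

```python
import re

# Illustrative placeholder tokens; the real pipeline may keep the most
# frequent literals verbatim and only mask the rest.
STR_LIT = "<STR_LIT>"
NUM_LIT = "<NUM_LIT>"

# Simple single- and double-quoted string literals (no triple quotes / f-strings).
STRING_RE = re.compile(r"'(?:\\.|[^'\\])*'" + r'|"(?:\\.|[^"\\])*"')
# Integers and simple floats.
NUMBER_RE = re.compile(r"\b\d+(?:\.\d+)?\b")

def normalize_literals(code: str) -> str:
    """Replace string and numeric literals with placeholder tokens."""
    code = STRING_RE.sub(STR_LIT, code)
    code = NUMBER_RE.sub(NUM_LIT, code)
    return code

print(normalize_literals('token = "abc123secret"; retries = 3'))
# -> token = <STR_LIT>; retries = <NUM_LIT>
```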
GoneZ5 commented 2 years ago

Thanks for your reply! About question 2: what I mean is that the improvement of CodeGPT-adapted over CodeGPT on PY150 is much smaller than on javaCorpus. How do you explain this phenomenon?

celbree commented 2 years ago

Compared with javaCorpus, PY150 is much larger; its number of tokens even exceeds that of the pre-training dataset. So the differences between Transformer-based models are smaller. Strong evidence for this is that the Transformer model without pre-training also achieves comparable performance on PY150.
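To make the from-scratch vs. GPT-2-initialized distinction concrete, here is a minimal sketch using Hugging Face `transformers`; the checkpoint identifiers are assumptions based on the published CodeGPT models, so check the CodeCompletion-token README for the exact names.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Transformer w/o pre-training: same GPT-2 small architecture, randomly initialized.
transformer_scratch = GPT2LMHeadModel(GPT2Config())

# CodeGPT: pre-trained from scratch on CodeSearchNet (checkpoint name assumed).
codegpt = GPT2LMHeadModel.from_pretrained("microsoft/CodeGPT-small-py")

# CodeGPT-adapted: starts from GPT-2 weights and is then adapted on CodeSearchNet,
# so it inherits GPT-2's natural-language knowledge (checkpoint name assumed).
codegpt_adapted = GPT2LMHeadModel.from_pretrained(
    "microsoft/CodeGPT-small-py-adaptedGPT2"
)
```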

GoneZ5 commented 2 years ago

Thank you for your answer, I understand now!