Closed GoneZ5 closed 2 years ago
Hi,
- We updated the dataset by applying literal normalization to avoid leaking sensitive information. As a result, the numbers in the arXiv version of the paper are out of date; you can refer to our NeurIPS paper instead.
- CodeGPT-adapted is initialized from GPT-2, while CodeGPT is pre-trained from scratch. The pre-training dataset is the same for both -- CodeSearchNet. So it is expected that CodeGPT-adapted performs better than CodeGPT, since it inherits knowledge from GPT-2.
- The results in our GitHub are always the latest.
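The literal normalization mentioned above can be sketched roughly as below. This is a minimal illustration, not the actual preprocessing script: the placeholder names (`<STR_LIT>`, `<NUM_LIT>`, `<EOL>`) follow the CodeXGLUE convention, but the exact rules applied to the released dataset may differ.

```python
import io
import tokenize

def normalize_literals(source: str) -> str:
    """Replace string/number literals with placeholders and mark line ends.

    A hedged sketch of literal normalization: sensitive values such as
    passwords or keys embedded in literals are masked out.
    """
    toks = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING:
            toks.append("<STR_LIT>")          # mask string literals
        elif tok.type == tokenize.NUMBER:
            toks.append("<NUM_LIT>")          # mask numeric literals
        elif tok.type in (tokenize.NEWLINE, tokenize.NL):
            toks.append("<EOL>")              # explicit end-of-line token
        elif tok.type in (tokenize.ENDMARKER, tokenize.INDENT, tokenize.DEDENT):
            continue                          # drop layout-only tokens
        else:
            toks.append(tok.string)
    return " ".join(toks)

print(normalize_literals('password = "secret123"\nport = 8080\n'))
# → password = <STR_LIT> <EOL> port = <NUM_LIT> <EOL>
```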
Thanks for your reply! About question 2, what I meant is that the improvement of CodeGPT-adapted on PY150 is much smaller than on javaCorpus. How can this phenomenon be explained?
Compared with javaCorpus, PY150 is much larger; its token count is even bigger than that of the pre-training dataset. With that much fine-tuning data, the differences between transformer-based models shrink. Strong evidence for this is that the transformer model without pre-training also achieves comparable performance on PY150.
Thank you for your answer, I understand now!
Hi, I have some questions about the different results between the paper and GitHub on the task of code completion (token level).
The results in your paper:
The results in your GitHub:
Thanks in advance!