Closed: vitchyr closed this issue 2 years ago
Since there's no direct training set for our WebQueryTest dataset, we suggest using two external training sets: (1) CodeSearchNet and (2) CoSQA. The data/preprocess.py script and the data in ./data are used to process CodeSearchNet. If you are not going to train on CodeSearchNet data, you can ignore them. We have added some preprocessing instructions to the README.
Thank you for explaining!
On Sun, Nov 14, 2021, 10:28 PM Jun-jie-Huang wrote:
Since there's no direct training set for our WebQueryTest dataset, we suggest using two external training sets: (1) CodeSearchNet and (2) CoSQA. The data/preprocess.py script (https://github.com/microsoft/CodeXGLUE/tree/main/Text-Code/NL-code-search-WebQuery/data) and the data in ./data (https://github.com/microsoft/CodeXGLUE/blob/main/Text-Code/NL-code-search-WebQuery/data) are used to process CodeSearchNet. If you are not going to train on CodeSearchNet data, you can ignore them. We have added some preprocessing instructions to the README.
Thank you for releasing this code. I'm confused about the data/preprocess.py script. The README for NL-code-search-WebQuery doesn't reference it at all. Does that mean that we don't need to use this preprocessing script? Similarly, can I ignore the data/train.txt and data/valid.txt?