microsoft / CodeXGLUE


How to use `preprocess.py` for NL-code-search-WebQuery #88

Closed: vitchyr closed this issue 2 years ago

vitchyr commented 2 years ago

Thank you for releasing this code. I'm confused about the data/preprocess.py script. The README for NL-code-search-WebQuery doesn't reference it at all. Does that mean that we don't need to use this preprocessing script? Similarly, can I ignore data/train.txt and data/valid.txt?

Jun-jie-Huang commented 2 years ago

Since there's no direct training set for our WebQueryTest dataset, we suggest using two external training sets: 1. CodeSearchNet; 2. CoSQA. The data/preprocess.py script and the files in ./data are used to process the CodeSearchNet data. If you are not going to train on CodeSearchNet, you can ignore them. We have added some preprocessing instructions to the README.
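For readers who want a feel for what preparing CodeSearchNet for training involves, here is a minimal sketch. It is not the repository's data/preprocess.py, whose exact behavior is not described in this thread. It assumes CodeSearchNet-style JSONL records with `docstring_tokens` and `code_tokens` fields. The function name `build_pairs` and the output keys `doc` and `code` are hypothetical, chosen only for illustration.

```python
import json
from typing import Dict, Iterable, List


def build_pairs(jsonl_lines: Iterable[str]) -> List[Dict[str, str]]:
    """Turn CodeSearchNet-style JSON lines into docstring/code pairs.

    Hypothetical helper: the real preprocess.py may filter, sample
    negatives, or write a different output format.
    """
    pairs = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Join the pre-tokenized fields back into whitespace-separated text.
        doc = " ".join(record.get("docstring_tokens", []))
        code = " ".join(record.get("code_tokens", []))
        if doc and code:  # drop records missing either side
            pairs.append({"doc": doc, "code": code})
    return pairs


# Tiny in-memory sample standing in for a CodeSearchNet .jsonl file.
sample = [
    json.dumps({
        "docstring_tokens": ["Add", "two", "numbers"],
        "code_tokens": ["def", "add", "(", "a", ",", "b", ")", ":",
                        "return", "a", "+", "b"],
    }),
    json.dumps({"docstring_tokens": [], "code_tokens": ["pass"]}),
]
pairs = build_pairs(sample)
```

With a real download, the same function could be fed the lines of an opened .jsonl file. For the actual steps, follow the preprocessing instructions in the README.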

vitchyr commented 2 years ago

Thank you for explaining!
