microsoft / CodeBERT

ASTs for code-search #187

Closed by poojitharamachandra 1 year ago

poojitharamachandra commented 1 year ago

The UniXcoder paper says that ASTs are used in addition to source code for downstream tasks such as code search. Could you please point to the code where the ASTs are generated?

poojitharamachandra commented 1 year ago

Hi,

Can you provide the code.zip mentioned in this earlier reply?

code.zip. Follow the Readme to download pre-training data. And then you can use preprocess function in preprocess.py to get flattened AST.

Originally posted by @guoday in https://github.com/microsoft/CodeBERT/issues/134#issuecomment-1127643876

guoday commented 1 year ago

You can download code.zip directly (https://github.com/microsoft/CodeBERT/files/8700184/code.zip). We use ASTs only in the pre-training phase; downstream tasks such as code search do not need them. To reduce the resulting gap between pre-training and fine-tuning inputs, as mentioned in the paper, we pre-train the model on source code both with and without ASTs.
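
For readers who want a rough feel for what a "flattened AST" looks like, here is a minimal sketch. It is not the preprocess.py from code.zip: it uses Python's standard `ast` module as a stand-in parser, and the function name `flatten_ast` and the `<Type,left>` / `<Type,right>` bracket tokens are assumptions chosen for illustration. The idea is a pre-order traversal that wraps each non-leaf node's children in a pair of marker tokens.

```python
import ast


def flatten_ast(node):
    """Flatten an AST into a bracketed token list (illustrative sketch only)."""
    # Treat identifiers and constants as leaves and emit their surface form.
    if isinstance(node, ast.Name):
        return [node.id]
    if isinstance(node, ast.Constant):
        return [repr(node.value)]

    name = type(node).__name__
    children = list(ast.iter_child_nodes(node))
    if not children:
        # Leaf without a surface form (e.g. an operator node): emit its type.
        return [name]

    # Non-leaf: wrap the flattened children in left/right marker tokens.
    # The marker format is a hypothetical choice, not taken from code.zip.
    tokens = [f"<{name},left>"]
    for child in children:
        tokens.extend(flatten_ast(child))
    tokens.append(f"<{name},right>")
    return tokens


if __name__ == "__main__":
    print(" ".join(flatten_ast(ast.parse("x = 1"))))
    # <Module,left> <Assign,left> x 1 <Assign,right> <Module,right>
```

The actual preprocessing in code.zip may use a different parser and token format, so treat the output above only as an illustration of the flattening idea.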