Closed poojitharamachandra closed 1 year ago
Hi,
can you provide the code.zip mentioned in one of the earlier queries?
code.zip. Follow the README to download the pre-training data, and then you can use the preprocess function in preprocess.py to get the flattened AST.
Originally posted by @guoday in https://github.com/microsoft/CodeBERT/issues/134#issuecomment-1127643876
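For readers who just want to see what "flattening an AST" means, here is a minimal sketch. Note the assumptions: the actual preprocess.py in code.zip uses a tree-sitter based pipeline, while this illustration uses Python's built-in ast module and a simple pre-order traversal, so the token vocabulary will differ from the real preprocessing.

```python
import ast

def flatten_ast(node):
    """Flatten an AST into a token sequence via pre-order traversal.

    This is only an illustration of the idea: visit the node, emit its
    type name, then recurse into its children in order.
    """
    tokens = [type(node).__name__]
    for child in ast.iter_child_nodes(node):
        tokens.extend(flatten_ast(child))
    return tokens

# Example: flatten a tiny function definition.
tree = ast.parse("def add(a, b):\n    return a + b")
print(flatten_ast(tree))
```

The real preprocessing additionally inserts structure markers so the tree can be reconstructed from the flat sequence; this sketch only shows the traversal itself.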
You can directly download the code.zip (https://github.com/microsoft/CodeBERT/files/8700184/code.zip). We only use ASTs in the pre-training phase. For downstream tasks, we don't need to use ASTs. To alleviate this gap, as mentioned in the paper, we exchange source code with/without ASTs to pre-train the model.
You say in the UniXcoder paper that ASTs are used in addition to source code for downstream tasks such as code search. Could you please point to the code snippet where you generate the ASTs?