Note: This repository is no longer maintained. If you are interested in deep learning for program analysis (e.g., code summarization, code retrieval, code completion, and type inference), please refer to our new project NaturalCC (https://github.com/CGCL-codes/naturalcc).
This repository was developed in the following environment:
/media/BACKUP/ghproj_d/code_summarization/github-python/ is the folder where all data for this project is saved; replace it with your own folder. On my machine, the data files are organized as follows:
|- /media/BACKUP/ghproj_d/code_summarization/github-python
|--original (used to save the raw data)
|----data_ps.declbodies data_ps.descriptions
|--processed (used to save the preprocessed data)
|----all.code all.comment
|--result (used to save the results)
|--train (used to save the data files generated before training)
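The layout above can be created under your own data folder with a few lines of Python. This is a minimal sketch, not part of the repository: the helper name `make_data_layout` is hypothetical, and only the four subfolder names are taken from the listing above.

```python
from pathlib import Path
import tempfile

# Subfolder names taken from the layout listed above.
SUBFOLDERS = ("original", "processed", "result", "train")


def make_data_layout(root):
    """Create the four subfolders under `root` and return their paths.

    Hypothetical helper for illustration; existing folders are left untouched.
    """
    root = Path(root)
    created = {}
    for name in SUBFOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        created[name] = folder
    return created


if __name__ == "__main__":
    # Demonstrate on a throwaway directory; pass your own data folder instead.
    with tempfile.TemporaryDirectory() as tmp:
        for name, path in make_data_layout(tmp).items():
            print(name, "->", path)
```

After creating the folders, copy data_ps.declbodies and data_ps.descriptions into the original subfolder.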
You need these files before you start training the model. The original folder is included in the dataset folder of this project; copy it to your own data folder.
cd script/github
python python_process.py -train_portion 0.6 -dev_portion 0.2 > log.python_process
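To illustrate what the -train_portion 0.6 and -dev_portion 0.2 options plausibly control, here is a minimal sketch of a proportional split. It is an assumption, not the code in python_process.py: the function name `split_pairs`, the shuffling, the seed, and the choice to put the remaining 20% into a test set are all hypothetical.

```python
import random


def split_pairs(pairs, train_portion=0.6, dev_portion=0.2, seed=0):
    """Split (code, comment) pairs into train/dev/test lists.

    Assumption: whatever is left after train and dev becomes the test set.
    The actual script may split differently.
    """
    if train_portion < 0 or dev_portion < 0 or train_portion + dev_portion > 1:
        raise ValueError("portions must be non-negative and sum to at most 1")
    items = list(pairs)
    random.Random(seed).shuffle(items)  # deterministic shuffle for repeatability
    n_train = int(len(items) * train_portion)
    n_dev = int(len(items) * dev_portion)
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]
    return train, dev, test


if __name__ == "__main__":
    # Toy data standing in for paired lines of code and comments.
    toy = [("code_%d" % i, "comment_%d" % i) for i in range(10)]
    train, dev, test = split_pairs(toy)
    print(len(train), len(dev), len(test))
```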
Return to the project folder:
cd ../..
python run.py preprocess
python run.py train_a2c 10 30 10 hybrid 1 0
python run.py test_a2c hybrid 1 0
This repository is based on https://github.com/khanhptnk/bandit-nmt
Please cite our paper if you use this repository.
Bibtex:
@Inproceedings{wan2018improving,
title={Improving automatic source code summarization via deep reinforcement learning},
author={Wan, Yao and Zhao, Zhou and Yang, Min and Xu, Guandong and Ying, Haochao and Wu, Jian and Yu, Philip S},
booktitle={Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering},
pages={397--407},
year={2018},
organization={ACM}
}