News
- 🎉 Our new paper MapCoder has been accepted at ACL 2024.
- The codebases for both MapCoder and REDCODER are open-sourced under the MIT and Modified MIT licenses.
- See you at ACL 2024 in Bangkok, Thailand.
REDCODER (Retrieval augmentED CODe gEneration and summaRization)
This is the repository for the SCODE-R retriever from the paper Retrieval Augmented Code Generation and Summarization.
If you find the paper or this code useful, please cite it:
@inproceedings{parvez2021retrieval,
title = {Retrieval Augmented Code Generation and Summarization},
author = {Parvez, Md Rizwan and Ahmad, Wasi Uddin and Chakraborty, Saikat and Ray, Baishakhi and Chang, Kai-Wei},
booktitle = {EMNLP-Findings},
year = {2021}
}
Our model has two parts. You can also use them separately.
- SCODE-R: Summary and Code Retriever. Please see the instructions in ./SCODE-R.
- SCODE-G: Summary and Code Generator. Please see the instructions in ./SCODE-G.
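As a rough illustration of how a retriever and a generator fit together in a retrieve-then-generate setup, here is a minimal toy sketch. It is not the code in this repository: the token-overlap scoring stands in for a trained retriever, and the function names (`retrieve`, `build_generator_input`) and the `[SEP]` separator are hypothetical choices made only for this example.

```python
import re
from typing import List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Split into lowercase alphabetic tokens (a stand-in for a learned encoder)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Retriever role: rank corpus entries by Jaccard overlap with the query."""
    q = tokenize(query)
    scored = []
    for doc in corpus:
        d = tokenize(doc)
        union = q | d
        score = len(q & d) / len(union) if union else 0.0
        scored.append((score, doc))
    # Highest score first; the sort is stable, so ties keep corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_generator_input(
    query: str, retrieved: List[Tuple[float, str]], sep: str = " [SEP] "
) -> str:
    """Generator-input role: append the retrieved candidates to the query.

    The separator is a hypothetical placeholder, not a token defined by this repo.
    """
    return sep.join([query] + [doc for _, doc in retrieved])


# Tiny hypothetical retrieval corpus of code snippets.
corpus = [
    "def add(a, b): return a + b",
    "def read_file(path): return open(path).read()",
    "def sort_list(items): return sorted(items)",
]
query = "read a file from path"
top = retrieve(query, corpus, k=1)
augmented = build_generator_input(query, top)
print(augmented)
```

Because the two stages only share a list of retrieved strings, either one can be swapped out or run on its own, which is the sense in which the two parts can be used separately.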
All REDCODER data/models/outputs together:
- Excludes the retrieval candidate embeddings (too large).
- Excludes the tokenized input to SCODE-G (tokenized with sentencepiece). We provide code and docs in the SCODE-G directory; please use them instead.
- Please go through the issues, especially this issue.
- Sample SCODE-R output: code-to-text validation split, top 30 k retrievals
- Finetuned SCODE-R checkpoints:
  - Code2Text Python: Link
  - Text2Code Python: Link
  - Code2Text Java: Link
  - Text2Code Java: Link
- All retrieval databases: (a) one combined summary retrieval corpus for Code2Text, covering both Python and Java; (b) Java and Python code retrieval corpora for the Text2Code tasks: LINK