ntunlp / xCodeEval

xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval
MIT License

More details on how to use the released code retrieval model #7

Closed lazyhope closed 1 year ago

lazyhope commented 1 year ago

Hi, thanks for your work! I am trying to use the pre-trained code retrieval model you have released, but I don't know how. Could you please provide a demo of loading the checkpoint and running the model on an arbitrary code-code retrieval task? Much appreciated!

sbmaruf commented 1 year ago

To load the model, follow the BertForPreTraining class (or any of its derivative classes), as specified in the config of the model: https://huggingface.co/bigcode/starencoder

The code retrieval model was trained with the DPR repository, as mentioned in the paper. You need to index your dataset using this script: https://github.com/facebookresearch/DPR/blob/main/generate_dense_embeddings.py

Finally, you can use this script to retrieve content: https://github.com/facebookresearch/DPR/blob/main/dense_retriever.py

Hope this helps. Please let us know if you have any further queries. We have emailed the author of the DPR repository to get clarification on the license of the DPR codebase. Hopefully we can release our codebase soon.
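For orientation, the two DPR steps mentioned above (embed the corpus, then retrieve) come down to ranking corpus embeddings by their inner product with a query embedding. The snippet below is only a minimal sketch of that ranking step, not the DPR implementation: the toy 3-dimensional vectors stand in for real encoder outputs, and `dot` and `retrieve` are hypothetical helper names introduced for illustration. The actual scripts read their inputs and settings from the DPR repository's own configuration.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def retrieve(
    query_vec: Sequence[float],
    corpus_vecs: Sequence[Sequence[float]],
    top_k: int = 2,
) -> List[Tuple[int, float]]:
    """Return (corpus index, score) pairs for the top_k highest inner products."""
    scores = [(i, dot(query_vec, vec)) for i, vec in enumerate(corpus_vecs)]
    # Highest score first; ties keep their original corpus order (stable sort).
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# ASSUMPTION: toy embeddings; in practice these would come from the encoder
# checkpoint, one vector per code snippet in the corpus.
corpus_vecs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
query_vec = [0.9, 0.1, 0.0]

results = retrieve(query_vec, corpus_vecs, top_k=2)
for index, score in results:
    print(f"corpus item {index}: score {score:.2f}")
```

For a code-code retrieval task, the query and the corpus entries would both be code snippets passed through the encoder. A brute-force loop like this is only practical for small corpora.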

lazyhope commented 1 year ago

Thank you so much!