Closed lazyhope closed 1 year ago
To load the model, use the BertForPreTraining class (or any of its derived classes), as specified in the model's config: https://huggingface.co/bigcode/starencoder
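A minimal sketch of the loading step, assuming the `transformers` and `torch` packages are installed and that you have access to the `bigcode/starencoder` repository on the Hugging Face Hub (it may require logging in and accepting the model's terms). The helper name `load_encoder` is hypothetical, and pairing the model with `AutoTokenizer` is an assumption rather than something stated in this thread:

```python
MODEL_ID = "bigcode/starencoder"  # repository linked in the reply above


def load_encoder(model_id: str = MODEL_ID):
    """Load the tokenizer and model (hypothetical helper, not part of any library)."""
    # Imported lazily so that this file can be imported without the heavy dependencies.
    from transformers import AutoTokenizer, BertForPreTraining

    # Assumption: the tokenizer is stored in the same Hub repository as the weights.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = BertForPreTraining.from_pretrained(model_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `tokenizer, model = load_encoder()` downloads the files on first use. How the hidden states are pooled into a single embedding per snippet is not specified in this thread, so check the paper and the DPR configuration before relying on any particular pooling.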
The code retrieval model was trained with the DPR repository, as mentioned in the paper.
You need to index your dataset using this script: https://github.com/facebookresearch/DPR/blob/main/generate_dense_embeddings.py
Finally, you can use this script to retrieve content:
https://github.com/facebookresearch/DPR/blob/main/dense_retriever.py
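Conceptually, the two scripts above embed every passage once and then rank passages by their similarity to the query embedding. The sketch below illustrates only that ranking step, using inner-product scoring on small hand-written vectors. It is not the DPR code: `retrieve` and the toy vectors are hypothetical, and real embeddings would come from the encoder.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(
    query_vec: Sequence[float],
    passage_vecs: Sequence[Sequence[float]],
    top_k: int = 2,
) -> List[Tuple[int, float]]:
    """Return (passage index, score) pairs for the top_k highest inner products."""
    scored = [(i, dot(query_vec, p)) for i, p in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # best score first
    return scored[:top_k]


# Toy "index": one embedding per code snippet (made-up values for illustration).
passages = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query = [1.0, 0.2]

results = retrieve(query, passages, top_k=2)
top_indices = [i for i, _ in results]
```

For a large corpus, an approximate or exact nearest-neighbor index replaces the linear scan, but the scoring idea is the same.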
Hope this helps.
Please let us know if you have any further questions. We have emailed the author of the DPR repository to get clarification on the license of the DPR codebase. Hopefully we can release our codebase soon.
Thank you so much!
Hi, thanks for your work! I am trying to use the pre-trained code retrieval model you have released, but I don't know how. Could you please provide a demo of loading the checkpoint and running the model on an arbitrary code-to-code retrieval task? Much appreciated!