salesforce / CodeT5

Home of CodeT5: Open Code LLMs for Code Understanding and Generation
https://arxiv.org/abs/2305.07922
BSD 3-Clause "New" or "Revised" License

How to get the embedding of code snippets with CodeT5+? #120

Open HMJW opened 1 year ago

pai4451 commented 1 year ago

Same question here. How can we get code embeddings and evaluate them on the tasks in Table 6 of the paper?

yuewang-cuhk commented 1 year ago

Hi both, we plan to release our CodeT5+ embedding models in the near future. Please stay tuned. :)

yuewang-cuhk commented 1 year ago

Hi both, we have released the embedding model of CodeT5+ 110M here (https://github.com/salesforce/CodeT5/tree/main/CodeT5%2B#codet5-embedding-model-), which achieves very competitive performance on multiple text-to-code retrieval tasks. The CodeT5+ 110M embedding model (https://huggingface.co/Salesforce/codet5p-110m-embedding) can extract 256-dimensional embeddings for both code and text.
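
A minimal sketch of loading the checkpoint and extracting an embedding with Hugging Face transformers; the trust_remote_code loading path and the model returning the embedding tensor directly are assumptions based on the checkpoint's model card, and the code snippet being embedded is just an illustration:

```python
# Minimal embedding extraction with the released 110M checkpoint.
# Assumes the custom model code on the Hub returns one 256-dim embedding per sequence.
from transformers import AutoModel, AutoTokenizer

checkpoint = "Salesforce/codet5p-110m-embedding"
device = "cpu"  # or "cuda" if a GPU is available

tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModel.from_pretrained(checkpoint, trust_remote_code=True).to(device)

code_snippet = "def print_hello_world():\n    print('Hello World!')"
input_ids = tokenizer.encode(code_snippet, return_tensors="pt").to(device)
embedding = model(input_ids)[0]  # shape: (256,)
print(f"embedding dimension: {embedding.size(0)}, norm: {embedding.norm().item():.3f}")
```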

YaserAlOsh commented 1 year ago

Impressive. Thank you for the update! Kind regards, Yaser.

pai4451 commented 1 year ago

> Hi both, we have released the embedding model of CodeT5+ 110M here, which achieves very competitive performance on multiple text-to-code retrieval tasks. The CodeT5+ 110M embedding model can extract 256-dimensional embeddings for both code and text.

Thanks for the update! Is it possible to use it with the SentenceTransformers package?
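
SentenceTransformers compatibility isn't confirmed in this thread. As a reference point, here is a hedged sketch of text-to-code retrieval using plain transformers with the checkpoint above; the `embed` helper, the toy query, and the two-snippet corpus are hypothetical illustrations:

```python
# Hypothetical text-to-code retrieval without SentenceTransformers: embed the
# query and each candidate snippet, then rank by cosine similarity.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

checkpoint = "Salesforce/codet5p-110m-embedding"
device = "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModel.from_pretrained(checkpoint, trust_remote_code=True).to(device).eval()

def embed(text: str) -> torch.Tensor:
    # Returns one 256-dimensional embedding for the input sequence.
    ids = tokenizer.encode(text, return_tensors="pt", truncation=True).to(device)
    with torch.no_grad():
        emb = model(ids)[0]
    # Normalize explicitly so the dot product below behaves as cosine similarity.
    return F.normalize(emb, dim=-1)

query = "reverse a list in place"
corpus = [
    "def reverse(xs):\n    xs.reverse()\n    return xs",
    "def add(a, b):\n    return a + b",
]

query_emb = embed(query)
scores = [torch.dot(query_emb, embed(code)).item() for code in corpus]
best = max(range(len(corpus)), key=lambda i: scores[i])
print(f"best match (score={scores[best]:.3f}):\n{corpus[best]}")
```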

7291a commented 6 months ago

> Hi both, we have released the embedding model of CodeT5+ 110M here, which achieves very competitive performance on multiple text-to-code retrieval tasks. The CodeT5+ 110M embedding model can extract 256-dimensional embeddings for both code and text.

Hi, if I want to use the codet5p-110m-embedding model released with CodeT5+ to obtain code embeddings for my own dataset, do I need to fine-tune it first?