salesforce / CodeT5

Home of CodeT5: Open Code LLMs for Code Understanding and Generation
https://arxiv.org/abs/2305.07922
BSD 3-Clause "New" or "Revised" License

How can I output an embedding for code? #48

Closed XiaoXiaoYi123 closed 2 years ago

yuewang-cuhk commented 2 years ago

Hi, we have not yet tested which approach works best for code embeddings. We suggest directly using the last decoder hidden state, or a max/avg pool over all decoder states, as the code embedding.
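To make the three options concrete, here is a minimal sketch of the pooling step only. It assumes you already have the per-token decoder hidden states for one snippet as a `seq_len x hidden_size` array (for example, from a Hugging Face `transformers` model called with `output_hidden_states=True`), plus a 0/1 padding mask. The function name `pool_states` and the toy data are hypothetical and not part of CodeT5. Plain Python lists are used so the sketch runs without dependencies; in practice the same reductions would be done with tensor operations.

```python
from typing import List, Sequence

Vector = List[float]


def pool_states(
    states: Sequence[Sequence[float]],
    mask: Sequence[int],
    mode: str = "avg",
) -> Vector:
    """Collapse per-token hidden states (seq_len x hidden) into one vector.

    `mask` holds 1 for real tokens and 0 for padding; padded positions
    are ignored. `mode` is one of "last", "avg", or "max".
    """
    # Keep only the positions that correspond to real (non-padding) tokens.
    kept = [list(s) for s, m in zip(states, mask) if m]
    if not kept:
        raise ValueError("mask selects no tokens")

    if mode == "last":
        # Hidden state of the final real token.
        return kept[-1]

    # Transpose so each tuple holds one hidden dimension across all tokens.
    columns = list(zip(*kept))
    if mode == "avg":
        return [sum(c) / len(c) for c in columns]
    if mode == "max":
        return [max(c) for c in columns]
    raise ValueError(f"unknown mode: {mode!r}")


# Toy stand-in for decoder hidden states (hypothetical values):
# 4 positions, hidden size 3, and the last position is padding.
toy_states = [
    [1.0, 0.0, 2.0],
    [3.0, -1.0, 0.0],
    [2.0, 4.0, 1.0],
    [9.0, 9.0, 9.0],
]
toy_mask = [1, 1, 1, 0]

last_vec = pool_states(toy_states, toy_mask, "last")
avg_vec = pool_states(toy_states, toy_mask, "avg")
max_vec = pool_states(toy_states, toy_mask, "max")
```

Which of the three works best is untested, as noted above, so it is worth comparing them on your own downstream task.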

trivikramak commented 2 years ago

Hi, is there any update on evaluating code embeddings? Can you suggest the best embedding for clustering a large number of code snippets to identify common defects among them?