microsoft / CodeBERT

Recommended way to aggregate semantic code embeddings #249

Closed: lazyhope closed this issue 1 year ago

lazyhope commented 1 year ago

What is the recommended way to aggregate semantic code embeddings from the same repository into a single representation of the repository's overall semantics? Currently I average the UniXcoder embeddings and use cosine similarity to evaluate the result, but I am not sure this is the right approach.

Thanks in advance!

guoday commented 1 year ago

If you lack supervised data to fine-tune UniXcoder, the averaging approach is appropriate.
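For reference, here is a minimal sketch of the averaging approach discussed above: average the per-snippet embeddings element-wise, then compare two repositories with cosine similarity. The function names `mean_embedding` and `cosine_similarity` and the toy 3-dimensional vectors are illustrative assumptions, not part of UniXcoder. In practice the vectors would be the embeddings UniXcoder produces, and you would likely use NumPy or PyTorch rather than plain lists.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_embedding(embeddings: Sequence[Vector]) -> List[float]:
    """Average a non-empty list of equal-length vectors element-wise."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    if any(len(vec) != dim for vec in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    count = len(embeddings)
    return [sum(vec[i] for vec in embeddings) / count for i in range(dim)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for per-file embeddings (hypothetical data).
repo_a = mean_embedding([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
repo_b = mean_embedding([[1.0, 1.0, 0.0]])
print(cosine_similarity(repo_a, repo_b))
```

Because cosine similarity ignores vector length, the result is the same whether you sum or average the embeddings. Averaging mainly keeps the magnitude comparable across repositories with different numbers of files.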

lazyhope commented 1 year ago

> If you lack supervised data to fine-tune UniXcoder, the averaging approach is appropriate.

Thank you!