microsoft / CodeBERT


Can I get embeddings for JavaScript and Python code snippets? #154

Closed smith-co closed 2 years ago

smith-co commented 2 years ago

I am looking at the following example for extracting code embeddings.

nl_tokens=tokenizer.tokenize("return maximum value")
code_tokens=tokenizer.tokenize("def max(a,b): if a>b: return a else return b")
tokens=[tokenizer.cls_token]+nl_tokens+[tokenizer.sep_token]+code_tokens+[tokenizer.sep_token]
tokens_ids=tokenizer.convert_tokens_to_ids(tokens)
context_embeddings=model(torch.tensor(tokens_ids)[None,:])[0]

I have the following three questions:

a) Can I use CodeBERT to extract embeddings for JavaScript and Python code?
b) Can I feed incomplete JavaScript and Python snippets to extract embeddings, or does the code snippet need to be complete?
c) Has anyone used CodeBERT to perform code-to-code search?

guoday commented 2 years ago

I suggest using UniXcoder: https://github.com/microsoft/CodeBERT/tree/master/UniXcoder#1-code-and-nl-embeddings. It can extract embeddings for JavaScript and Python code, even for incomplete snippets. For code search, you can follow this example: https://github.com/microsoft/CodeBERT/tree/master/UniXcoder#2-similarity-between-code-and-nl
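
For the code-to-code search question, the ranking step can be sketched independently of the model. The following is only a minimal illustration, assuming each snippet has already been turned into one fixed-size embedding vector (for example by UniXcoder): it normalizes the vectors and ranks candidates by cosine similarity to the query. The helper names (`l2_normalize`, `cosine_similarity`, `rank_by_similarity`) and the toy 3-dimensional vectors are hypothetical stand-ins, not part of the CodeBERT or UniXcoder code.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine_similarity(a, b):
    """Cosine similarity computed as the dot product of the normalized vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def rank_by_similarity(query_embedding, candidate_embeddings):
    """Return (candidate_index, score) pairs, most similar first."""
    scores = [
        (i, cosine_similarity(query_embedding, emb))
        for i, emb in enumerate(candidate_embeddings)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real snippet embeddings (assumption: in practice
# these would come from the model, one vector per code snippet).
query = [1.0, 0.0, 1.0]
candidates = [
    [0.0, 1.0, 0.0],  # unrelated direction
    [2.0, 0.0, 2.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # partially overlapping
]
ranking = rank_by_similarity(query, candidates)
```

Because the vectors are normalized first, the score depends only on direction, so snippets of different lengths are compared on the same scale.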