microsoft / CodeBERT

MIT License

Token embedding with CodeBERT, UniXcoder or LongCoder #293

Status: Open. ramsey-coding opened this issue 1 year ago.

ramsey-coding commented 1 year ago

I would like to get token embeddings for Python and Java code.

I am curious what the authors think about this.

Will the embeddings for Python code tokens be meaningful?
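For readers landing here, a minimal sketch of one way to pull per-token embeddings, assuming the Hugging Face `transformers` and `torch` packages and the public `microsoft/codebert-base` checkpoint. The helper names `average_subwords` and `code_token_embeddings` are hypothetical. Because the BPE tokenizer splits code into subword pieces, the sketch also averages the pieces of each pre-tokenized word into one vector; mean pooling is just one choice (taking the first subword is another), not something the authors prescribe. Whether other checkpoints such as UniXcoder behave the same way through `AutoModel` is an assumption to verify; UniXcoder's folder in this repository provides its own wrapper.

```python
from typing import List, Optional, Sequence


def average_subwords(
    vectors: Sequence[Sequence[float]], word_ids: Sequence[Optional[int]]
) -> List[List[float]]:
    """Average subword vectors that share the same word id.

    Positions whose word id is None (special tokens such as <s> and </s>)
    are skipped. Returns one vector per word, ordered by word id.
    """
    sums = {}
    counts = {}
    for vec, wid in zip(vectors, word_ids):
        if wid is None:
            continue
        if wid not in sums:
            sums[wid] = [0.0] * len(vec)
            counts[wid] = 0
        sums[wid] = [s + v for s, v in zip(sums[wid], vec)]
        counts[wid] += 1
    return [[s / counts[w] for s in sums[w]] for w in sorted(sums)]


def code_token_embeddings(code: str, model_name: str = "microsoft/codebert-base"):
    """Hypothetical helper: return (subword_tokens, subword_vectors, word_vectors)."""
    # Imported lazily so average_subwords works without torch/transformers.
    import torch
    from transformers import AutoModel, AutoTokenizer

    # AutoTokenizer returns a fast tokenizer by default, which provides word_ids().
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    encoding = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        # Last-layer hidden states for the single snippet: (seq_len, hidden_size).
        hidden = model(**encoding).last_hidden_state[0]

    tokens = tokenizer.convert_ids_to_tokens(encoding["input_ids"][0].tolist())
    subword_vectors = hidden.tolist()
    word_vectors = average_subwords(subword_vectors, encoding.word_ids(0))
    return tokens, subword_vectors, word_vectors
```

Usage would look like `tokens, sub_vecs, word_vecs = code_token_embeddings("def add(a, b): return a + b")`, which downloads the checkpoint on first call. The same call works for Java source, since the tokenizer is shared across languages.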

guoday commented 10 months ago

Because CodeBERT and UniXcoder are pretrained with the masked language modeling (MLM) objective, their token embeddings are meaningful to some extent. Whether they are useful in practice, however, still depends on the task you need to accomplish.
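To make the MLM point concrete, a minimal sketch, assuming `transformers` is installed and using the `microsoft/codebert-base-mlm` checkpoint released alongside CodeBERT: mask one code token and ask the model to recover it from context. A good recovery suggests the representations carry contextual information; it does not show they suit a particular downstream task. The helper names `mask_code_token` and `predict_masked` are hypothetical.

```python
def mask_code_token(code: str, target: str, mask_token: str = "<mask>") -> str:
    """Replace the first occurrence of `target` in `code` with the mask token.

    This is a plain substring match, so pick a target that does not also
    appear inside a longer identifier earlier in the snippet.
    """
    if target not in code:
        raise ValueError(f"{target!r} not found in code")
    return code.replace(target, mask_token, 1)


def predict_masked(code: str, target: str, top_k: int = 5):
    """Hypothetical helper: list (token, score) guesses for the masked target."""
    # Imported lazily so mask_code_token works without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="microsoft/codebert-base-mlm")
    masked = mask_code_token(code, target, fill_mask.tokenizer.mask_token)
    return [(p["token_str"], p["score"]) for p in fill_mask(masked, top_k=top_k)]
```

For example, `predict_masked("if (x is not None) and (x > 1)", "and")` asks the model to fill the operator between the two conditions; inspecting its top guesses is a quick sanity check before relying on the embeddings.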