microsoft / CodeBERT


Questions for different modes of UniXcoder #233

Open rongqipan opened 1 year ago

rongqipan commented 1 year ago

Hi,

Thanks for your good work.

For UniXcoder, the example you showed for measuring the similarity between NL-PL pairs uses encoder-only mode.

Is it possible that encoder-decoder mode can achieve better results?

My task is to extract function embeddings and measure the similarity between them. Which mode do you suggest?

Thanks a lot, Rongqi

guoday commented 1 year ago

In pre-training, we use encoder-only mode for contrastive learning, so encoder-decoder mode will perform worse.

I suggest using encoder-only mode to measure similarity, since the encoder-decoder and decoder-only modes are for generation.
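To make this concrete, here is a minimal sketch of the similarity computation on top of encoder-only outputs. Since loading the real checkpoint requires a download, random tensors stand in for the `(seq_len, hidden_size)` hidden states that UniXcoder would produce in encoder-only mode; the pooling and cosine-similarity steps are the part being illustrated, not the model call itself.

```python
import numpy as np

def embed(token_embeddings: np.ndarray) -> np.ndarray:
    """Mean-pool token embeddings into one vector and L2-normalize it."""
    v = token_embeddings.mean(axis=0)
    return v / np.linalg.norm(v)

def similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two L2-normalized embeddings (their dot product)."""
    return float(a @ b)

# Stand-ins for the (seq_len, hidden_size) hidden states a real run would
# obtain from UniXcoder in encoder-only mode.
rng = np.random.default_rng(0)
code_tokens = rng.normal(size=(16, 768))
text_tokens = rng.normal(size=(12, 768))

code_vec = embed(code_tokens)
text_vec = embed(text_tokens)
print(similarity(code_vec, text_vec))
```

Because the vectors are normalized, the dot product is the cosine similarity and always lies in [-1, 1]; identical inputs score 1.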

rongqipan commented 1 year ago

> In pre-training, we use encoder-only mode for contrastive learning, so encoder-decoder mode will perform worse.
>
> I suggest using encoder-only mode to measure similarity, since the encoder-decoder and decoder-only modes are for generation.

Thanks for your reply.

There are two tasks for learning the semantic embedding. Since contrastive learning applies only to encoder-only mode, what about the cross-modal generation task?

guoday commented 1 year ago

The cross-modal generation task uses encoder-decoder mode. However, for code-to-text generation, the code fragment is encoded in encoder-only mode and the text is generated in encoder-decoder mode.
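For readers unfamiliar with how the modes are selected: per the UniXcoder paper, the mode is chosen by a special prefix token in the input sequence, not by a different model. A rough sketch of the input layout (the exact special-token spellings are an assumption here and may differ in the released tokenizer):

```python
MODES = ("<encoder-only>", "<decoder-only>", "<encoder-decoder>")

def build_input(tokens: list[str], mode: str) -> list[str]:
    """Prefix a token sequence with UniXcoder's mode marker.

    The [CLS] <mode> [SEP] tokens... [SEP] layout follows the UniXcoder
    paper; "<s>"/"</s>" stand in for the tokenizer's CLS/SEP tokens.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    return ["<s>", mode, "</s>"] + list(tokens) + ["</s>"]

print(build_input(["def", "max", "(", "a", ",", "b", ")"], "<encoder-only>"))
```

The same weights thus serve all three modes; only the prefix (and the attention mask it implies) changes.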

rongqipan commented 1 year ago

> The cross-modal generation task uses encoder-decoder mode. However, for code-to-text generation, the code fragment is encoded in encoder-only mode and the text is generated in encoder-decoder mode.

Sorry, I am a bit confused: will cross-modal generation also help the encoder-only mode learn the semantic embedding?

guoday commented 1 year ago

Yes.