skye95git closed this issue 2 years ago
The paper is ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators (Clark et al., 2020): https://arxiv.org/pdf/2003.10555.pdf
Thanks for your reply. I see the tokenizer used for fine-tuning is --tokenizer_name=microsoft/codebert-base. Where does this tokenizer come from? Did you retrain it on the code domain?
The tokenizer comes from roberta-base. We don't re-train the tokenizer.
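For anyone reading along, here is a minimal sketch (not from the thread itself) of what this means in practice with Hugging Face transformers; the microsoft/codebert-base checkpoint ships the same BPE vocabulary as roberta-base:

```python
# Minimal sketch: load the CodeBERT tokenizer via Hugging Face
# transformers. Its vocabulary and BPE merges come from roberta-base;
# per the maintainer's reply, no code-specific retraining was done.
from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("microsoft/codebert-base")

# Code is therefore split with RoBERTa's natural-language BPE merges.
print(tokenizer.tokenize("def max(a, b): return a if a > b else b"))
```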
Hi, I watched a video in which Duyu Tang introduced CodeBERT. The 'replaced token detection' objective appears to have been inspired by a 2020 paper from Google and Stanford, but Duyu did not mention which paper it was. Could you share the title of that paper?
According to Duyu, 'replaced token detection' is meant to take advantage of code without comments, or comments without code. How does using a discriminator to identify which tokens were replaced allow you to use both kinds of data?
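For context, below is a minimal, hypothetical sketch (in PyTorch, with names invented for illustration) of how replaced token detection, as described in the ELECTRA paper linked above, derives its training labels from the corruption process itself: a small generator proposes replacements at masked positions, and the discriminator labels every token as original (0) or replaced (1). Because the labels require no human annotation, the objective applies equally to code-only and comment-only inputs:

```python
import torch

def rtd_labels(input_ids, generator_ids, mask_positions):
    """Corrupt the input at the masked positions with the generator's
    samples, then label each token 1 if it differs from the original
    (replaced) and 0 otherwise (original)."""
    corrupted = input_ids.clone()
    corrupted[mask_positions] = generator_ids[mask_positions]
    labels = (corrupted != input_ids).long()
    return corrupted, labels

# Toy example with made-up token ids.
input_ids = torch.tensor([10, 11, 12, 13, 14])
generator_ids = torch.tensor([10, 99, 12, 13, 77])  # generator samples
mask_positions = torch.tensor([1, 4])               # masked positions

corrupted, labels = rtd_labels(input_ids, generator_ids, mask_positions)
print(corrupted)  # tensor([10, 99, 12, 13, 77])
print(labels)     # tensor([0, 1, 0, 0, 1])
```

The discriminator is then trained with a per-token binary cross-entropy loss against these labels; if the generator happens to sample the original token, the label stays 0.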