microsoft / CodeBERT

CodeBERT
MIT License

eos_token at the end of sentence #188

Closed aravind-gk closed 1 year ago

aravind-gk commented 1 year ago

According to the paper (Section 3.2 - Input Output Representations), "eos_token" marks the end of the text and code contents, and "sep_token" separates the text tokens from the code tokens. I have corrected the code sample in the README to match. Note that "sep_token" and "eos_token" are both the "</s>" token, so the resulting token sequence is unchanged.
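A minimal sketch of the input layout described above, for illustration only. It assumes the RoBERTa-style special-token strings ("<s>" for the start token, "</s>" for both the separator and end token) as plain constants so that it runs without downloading a model. The helper name `build_input` and the whitespace tokenization are hypothetical stand-ins, not the README's code.

```python
from typing import List

# Assumed special-token strings (RoBERTa-style); hard-coded so the
# sketch is self-contained and needs no model download.
CLS_TOKEN = "<s>"
SEP_TOKEN = "</s>"   # separates text tokens from code tokens
EOS_TOKEN = "</s>"   # marks the end of the whole input


def build_input(nl_tokens: List[str], code_tokens: List[str]) -> List[str]:
    """Hypothetical helper: lay out tokens as [CLS] text [SEP] code [EOS]."""
    return [CLS_TOKEN] + nl_tokens + [SEP_TOKEN] + code_tokens + [EOS_TOKEN]


# Whitespace splitting stands in for the real subword tokenizer.
nl_tokens = "return maximum value".split()
code_tokens = "def max ( a , b ) : return a if a > b else b".split()

tokens = build_input(nl_tokens, code_tokens)
print(tokens)
```

With the actual tokenizer loaded through `transformers`, the constants would be replaced by `tokenizer.cls_token`, `tokenizer.sep_token`, and `tokenizer.eos_token`. Because the last two are the same string, swapping one for the other changes only which role the code expresses, not the tokens produced.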