richarddwang / electra_pytorch

Pretrain and finetune ELECTRA with fastai and huggingface. (Results of the paper replicated!)

Use different tokenizer (and specify special tokens) #33

Closed ghost closed 2 years ago

ghost commented 2 years ago

Thank you for this great repository. It really is a huge help. There is one thing, however, that I cannot figure out on my own: I would like to train ELECTRA for a different language and therefore use another tokenizer. Unfortunately, I cannot find where I can change the IDs of the special tokens. I trained a BPE tokenizer with `"<s>": 0, "<pad>": 1, "</s>": 2, "<unk>": 3, "<mask>": 4, ...`, but the model seems to assume that these special tokens have the IDs 100, 101, 102, and 103. Could you please tell me where I can override this assumption? Sorry if this is a basic question, but I could not find it myself. Thank you very much in advance.
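*Editor's note (not from the original thread, added for context):* the IDs 100–103 are the `[UNK]`, `[CLS]`, `[SEP]`, and `[MASK]` entries in the WordPiece vocabulary shipped with the official `google/electra-*` checkpoints; the pretraining code derives its special-token IDs from the huggingface tokenizer object it loads, which is what the follow-up comment below confirms. A minimal sketch of wrapping a custom BPE tokenizer so the IDs come from its own vocabulary instead; the file path and token strings here are assumptions based on the question, not values from the repo:

```python
from transformers import PreTrainedTokenizerFast

# Hypothetical path: assumes the trained BPE tokenizer was saved as tokenizer.json.
hf_tokenizer = PreTrainedTokenizerFast(
    tokenizer_file="my-bpe/tokenizer.json",
    pad_token="<pad>",
    unk_token="<unk>",
    mask_token="<mask>",
    cls_token="<s>",   # reuse <s>/</s> as the sequence delimiters
    sep_token="</s>",
)

# The special-token IDs now come from the custom vocab (0-4 in the question)
# instead of the ELECTRA/BERT WordPiece defaults (100-103).
print(hf_tokenizer.pad_token_id, hf_tokenizer.cls_token_id,
      hf_tokenizer.sep_token_id, hf_tokenizer.mask_token_id)
```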

ghost commented 2 years ago

Okay, I found the solution: it is possible to simply replace the tokenizer with the custom tokenizer, but I also had to delete the cache, which still contained the IDs produced by the old tokenizer.
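*Editor's note (not part of the original comment):* the cache matters because the pipeline stores the already-tokenized corpus on disk, so swapping the tokenizer alone leaves the old token IDs baked into the cached data. A minimal sketch of the cleanup step; the cache location is an assumption and depends on how your setup stores the processed dataset:

```python
import shutil
from pathlib import Path

# Hypothetical cache location; adjust to wherever your run caches the
# already-tokenized corpus (e.g. the huggingface `datasets` cache directory).
CACHE_DIR = Path("./datasets")

# Removing the cache forces the corpus to be re-tokenized with the new
# tokenizer (and hence the new special-token IDs) on the next run.
if CACHE_DIR.exists():
    shutil.rmtree(CACHE_DIR)
```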