xlang-ai / instructor-embedding

[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Apache License 2.0

special tokens in tokenizer #56

Closed: erlakshmi123 closed this issue 1 year ago

erlakshmi123 commented 1 year ago

Hi,

Is it possible to add special tokens to the tokenizer and retrain the model for a domain-specific task?

hongjin-su commented 1 year ago

Yes, it is possible. You can add special tokens via tokenizer.add_special_tokens(special_tokens) and then follow the repository's training instructions to train the model.
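For anyone landing here later, a minimal sketch of that step, assuming the tokenizer and model are standard Hugging Face objects. The helper name `add_domain_tokens` and the example tokens are hypothetical and not part of this repository. In Hugging Face Transformers, `add_special_tokens` expects a dict keyed by token role, and the embedding matrix usually has to be resized afterwards so the new ids have rows to train.

```python
from typing import List


def add_domain_tokens(tokenizer, model, new_tokens: List[str]) -> int:
    """Register new special tokens and grow the embedding matrix to match.

    Hypothetical helper: `tokenizer` and `model` are assumed to behave like a
    Hugging Face PreTrainedTokenizer and PreTrainedModel respectively.
    """
    # add_special_tokens takes a dict keyed by token role and returns how many
    # of the given tokens were actually new to the vocabulary.
    num_added = tokenizer.add_special_tokens(
        {"additional_special_tokens": new_tokens}
    )
    if num_added > 0:
        # The new ids need embedding rows. These rows are freshly initialised,
        # so they only become meaningful after further training.
        model.resize_token_embeddings(len(tokenizer))
    return num_added
```

With a model loaded from this repository, the call would look roughly like `add_domain_tokens(model.tokenizer, model[0].auto_model, ["[GENE]", "[DRUG]"])`. Those two attribute paths follow sentence-transformers conventions and are an assumption about the wrapper here, so check them against the loaded object before relying on them. Training then proceeds as described in the repository.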

hongjin-su commented 1 year ago

Please re-open the issue if you have any questions or comments!