kpe / bert-for-tf2

A Keras TensorFlow 2.0 implementation of BERT, ALBERT and adapter-BERT.
https://github.com/kpe/bert-for-tf2
MIT License

how to add special tokens #63

Closed. Rababalkhalifa closed this issue 4 years ago.

Rababalkhalifa commented 4 years ago

Hello, I want to add special tokens to my BERT model, for example [NUMBER] or [URL]. How can I do this?

https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/1

import tensorflow_hub as hub
import bert

FullTokenizer = bert.bert_tokenization.FullTokenizer
bert_layer = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/1", trainable=False)
vocab_file = bert_layer.resolved_object.vocab_file.asset_path.numpy()
do_lower_case = bert_layer.resolved_object.do_lower_case.numpy()
tokenizer = FullTokenizer(vocab_file, do_lower_case)

I can see many [unused] tokens inside the vocab file, but how can I utilize them?
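One common approach, sketched below and not spelled out in this thread, is to repurpose a few [unusedN] vocab entries: replace them in a local copy of vocab.txt with your custom tokens and look them up via convert_tokens_to_ids. The splitting around the placeholders has to be done by hand, because the wordpiece tokenizer would otherwise break a string like [NUMBER] into punctuation pieces. The text value and the choice of which [unusedN] lines to replace are assumptions.

import re

# assumes "[unused0]" and "[unused1]" in a local copy of vocab.txt were replaced by
# "[NUMBER]" and "[URL]" (one token per line), and the tokenizer above was built
# from that edited file
text = "see [URL] for the first [NUMBER] results"       # hypothetical input

tokens = ["[CLS]"]
for piece in re.split(r"(\[NUMBER\]|\[URL\])", text):    # keep the placeholders as separate pieces
    if piece in ("[NUMBER]", "[URL]"):
        tokens.append(piece)                             # pass custom tokens through unchanged
    elif piece.strip():
        tokens.extend(tokenizer.tokenize(piece))         # normal wordpiece tokenization
tokens.append("[SEP]")

token_ids = tokenizer.convert_tokens_to_ids(tokens)      # plain vocab lookup, so the edited entries resolve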

kpe commented 4 years ago

There is an extra_tokens_vocab_size parameter which allows for passing extra/custom tokens as negative token ids.
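A minimal sketch of how that parameter might be used with bert-for-tf2's BertModelLayer follows. The checkpoint directory, the number of extra tokens, and the exact mapping of negative ids to custom tokens are assumptions; check the BertEmbeddingsLayer source for the authoritative behaviour.

import numpy as np
import tensorflow as tf
import bert

model_dir = "uncased_L-12_H-768_A-12"                  # assumed local checkpoint directory
bert_params = bert.params_from_pretrained_ckpt(model_dir)
bert_params.extra_tokens_vocab_size = 2                # reserve 2 slots for custom tokens

l_bert = bert.BertModelLayer.from_params(bert_params, name="bert")

max_seq_len = 16
l_input_ids = tf.keras.layers.Input(shape=(max_seq_len,), dtype="int32")
output = l_bert(l_input_ids)                           # [batch, max_seq_len, hidden_size]
model = tf.keras.Model(inputs=l_input_ids, outputs=output)
model.build(input_shape=(None, max_seq_len))

# the pre-trained weights cover only the stock vocab; the extra token embeddings
# stay randomly initialized and are learned during fine-tuning
bert.load_stock_weights(l_bert, model_dir + "/bert_model.ckpt")

# custom tokens are addressed with negative ids, e.g. -1 for [NUMBER], -2 for [URL]
# (the id-to-token assignment here is an assumption for illustration)
token_ids = np.zeros((1, max_seq_len), dtype=np.int32)
token_ids[0, :4] = [101, -1, -2, 102]                  # [CLS] [NUMBER] [URL] [SEP]
print(model(token_ids).shape)                          # (1, max_seq_len, hidden_size)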

codefish1990 commented 3 years ago

How to add [CLS] and [SEP] at the beginning and end?
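For reference, the usual pattern with FullTokenizer is to add the markers at the token level before converting to ids; a minimal sketch, assuming the tokenizer built above and a hypothetical text variable:

text = "some input sentence"                            # hypothetical input
tokens = ["[CLS]"] + tokenizer.tokenize(text) + ["[SEP]"]
token_ids = tokenizer.convert_tokens_to_ids(tokens)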