Closed: Janinanu closed this issue 5 years ago
For token-level embeddings see https://github.com/hanxiao/bert-as-service#getting-elmo-like-contextual-word-embedding instead of attempting to use 1-word "sentences".
Okay, thanks. I now understand that it probably does not make much sense to extract single-word embeddings from the BERT model.
I have a list of individual tokens (like a vocabulary) for which I want to extract the BERT embeddings. I used this command to start the BERT service:
bert-serving-start -model_dir multi_cased_L-12_H-768_A-12/ -pooling_strategy=NONE -max_seq_len=4 -num_worker=3
The crucial parameter, I assume, is max_seq_len. If I set max_seq_len=1, the service reports:
"3 is an invalid int value must be >3 (account for maximum three special symbols in BERT model) or NONE"
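For reference, a minimal sketch of why a single word still needs more than three sequence slots (the helper name and the [PAD] marker are my own illustration, assuming the usual [CLS]/[SEP] wrapping; the third reserved slot would presumably be a second [SEP] in sentence-pair inputs):

```python
# Hypothetical illustration: even a one-word "sentence" is framed with
# BERT's special symbols, so max_seq_len must leave room for them.
def build_input(word, max_seq_len=4):
    # [CLS] opens and [SEP] closes every sequence; leftover slots are padding.
    tokens = ['[CLS]', word, '[SEP]']
    tokens += ['[PAD]'] * (max_seq_len - len(tokens))
    return tokens

print(build_input('apple'))  # ['[CLS]', 'apple', '[SEP]', '[PAD]']
```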
However, if I set max_seq_len=None, the service does not even start, and if I set max_seq_len=4, I get output of shape vocab_size x 4 x embedding_dimension, even though I simply want output of shape vocab_size x embedding_dimension.

I am wondering: what is meant by the "three special symbols in BERT model"? Does it refer to the distinction between token embeddings, segment embeddings, and position embeddings in the BERT model? In the output I get, the first 3 vectors for any token are non-zero and the 4th vector is all zero. Would the actual word embedding therefore simply be the sum over these three, as described in the original BERT paper?
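One way to reduce the token-level vocab_size x 4 x embedding_dimension array to vocab_size x embedding_dimension, sketched here with synthetic data. The layout assumption (position 0 is [CLS], position 1 the word itself, position 2 [SEP], position 3 zero padding) is mine and is not confirmed in this thread:

```python
import numpy as np

# Synthetic stand-in for the service's pooling_strategy=NONE output.
vocab_size, max_seq_len, dim = 5, 4, 768
rng = np.random.default_rng(0)
vecs = rng.normal(size=(vocab_size, max_seq_len, dim))
vecs[:, 3, :] = 0.0  # padding slot comes back as all zeros

# Option 1: slice out the assumed word-piece position.
word_vecs = vecs[:, 1, :]
assert word_vecs.shape == (vocab_size, dim)

# Option 2: average over the non-zero (non-padding) positions instead.
mask = np.any(vecs != 0, axis=-1, keepdims=True)      # (vocab, seq, 1)
mean_vecs = (vecs * mask).sum(axis=1) / mask.sum(axis=1)
assert mean_vecs.shape == (vocab_size, dim)
```

Slicing only works if the word survives as a single WordPiece token; words split into several pieces would need pooling over those pieces instead.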