Open seongminp opened 3 years ago
Hi, thanks for your question. Yes, there are many ways to go about this; my implementation is as follows:
```python
def get_word_embedding(model_dir=None):
    model = Ner.from_pretrained(model_dir)
    tokenizer = BertTokenizer.from_pretrained(model_dir, do_lower_case=args.do_lower_case)
    # Pull the (vocab_size x hidden_size) word-embedding matrix out of BERT.
    for name, parameters in model.named_parameters():
        if name == 'bert.embeddings.word_embeddings.weight':
            bert_embedding = parameters.detach().cpu().numpy()
    # Forward table: token id -> embedding vector.
    wordidx2ebd = {idx: bert_embedding[idx] for idx in range(bert_embedding.shape[0])}
    # Reverse table: embedding -> token id (vectors are cast to tuples so they can be dict keys).
    ebd2wordidx = {}
    for k, v in wordidx2ebd.items():
        ebd2wordidx[tuple(v)] = k
    return wordidx2ebd, ebd2wordidx
```
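For anyone adapting this without the repo's `Ner` class at hand, the table-building step can be sketched in isolation. Here the BERT embedding matrix is replaced with a small random NumPy array (a hypothetical stand-in, not the real weights), so only the dictionary construction is demonstrated:

```python
import numpy as np

# Hypothetical stand-in for BERT's word-embedding matrix
# (the real one would be vocab_size x 768 for bert-base).
vocab_size, hidden = 8, 4
rng = np.random.default_rng(0)
bert_embedding = rng.normal(size=(vocab_size, hidden))

# Forward table: token id -> embedding vector.
wordidx2ebd = {idx: bert_embedding[idx] for idx in range(vocab_size)}

# Reverse table: embedding -> token id. NumPy arrays are unhashable,
# so each vector is converted to a tuple before use as a dict key.
ebd2wordidx = {tuple(v): k for k, v in wordidx2ebd.items()}

# Round trip: looking up a vector recovers its token id.
assert ebd2wordidx[tuple(wordidx2ebd[3])] == 3
```

One caveat with the tuple-keyed reverse table: lookups only succeed on bit-exact vectors, so it works for retrieving ids of stored rows, not for nearest-neighbor queries on mixed embeddings.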
Thank you for your response!
One more question: did you by any chance experiment with contextual embeddings (word embedding + positional embedding + segment embedding) instead of pure word embeddings?
Thank you.
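For context on what that alternative would involve: BERT's input embedding at each position is the element-wise sum of the token's word embedding, the position embedding, and the segment embedding (followed by layer normalization, omitted here). A minimal sketch with toy matrices standing in for the three real embedding tables:

```python
import numpy as np

# Toy stand-ins for BERT's three embedding tables
# (real shapes for bert-base: 30522x768, 512x768, 2x768).
rng = np.random.default_rng(1)
word_emb = rng.normal(size=(10, 4))  # token id -> vector
pos_emb = rng.normal(size=(6, 4))    # position  -> vector
seg_emb = rng.normal(size=(2, 4))    # segment id (0 or 1) -> vector

token_ids = [3, 7, 1]    # hypothetical tokenized input
segment_ids = [0, 0, 0]  # single-segment input

# Input embedding = word + positional + segment embedding, per position.
input_emb = np.stack([
    word_emb[t] + pos_emb[i] + seg_emb[s]
    for i, (t, s) in enumerate(zip(token_ids, segment_ids))
])
```

Unlike the pure word-embedding table, these sums are position-dependent, so the same token gets a different vector at each position.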
Hello. Thank you for publishing your research. While reading your SeqMix paper, I had trouble understanding the method you used to build your {word, embedding} table (as discussed in Section 3.2 and Appendix C.3).
It is mentioned that you construct W, a map of each token (w) to its contextual embedding (e). How exactly are these contextual embeddings obtained? The paper mentions they were extracted using BERT. When constructing table W, did you:
1. Feed

   [cls] token [sep]

   to BERT for every token in the vocab list?

2. Feed the whole sentence

   [cls] token1 token2 ... token_n [sep]

   to BERT? (Should be the same as 1. if only word embeddings are acquired.)

I guess there are many ways to go about this. I am trying to implement your method to evaluate ASR results. I would appreciate any guidance.
Thank you.