rz-zhang / SeqMix

The repository for our EMNLP'20 paper SeqMix: Augmenting Active Sequence Labeling via Sequence Mixup.

A question about token table (W) construction #1

seongminp opened this issue 3 years ago

seongminp commented 3 years ago

Hello. Thank you for publishing your research. While reading your SeqMix paper, I had trouble understanding the method you used to build your {word, embedding} table (as discussed in Section 3.2 and Appendix C.3).

It is mentioned that you construct W, a map of each token (w) to its contextual embedding (e). How exactly are these contextual embeddings obtained? The paper mentions they were extracted using BERT. When constructing table W, did you:

  1. feed [cls] token [sep] to BERT for every token in the vocab list?
  2. or just feed [cls] token1 token2 ... token_n [sep] to BERT? (This should be the same as 1. if only the word embeddings are taken.)
  3. or just feed the training examples directly to BERT? (I guess this does not work because multiple embeddings will be produced for a single token, and not all tokens in the vocab list will be covered)

I guess there are many ways to go about this. I am trying to implement your method to evaluate ASR results. I would appreciate any guidance.
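
For concreteness, option 1 is roughly what I have in mind, sketched here with the HuggingFace transformers API (the model name and helper function below are illustrative only, not taken from your code):

import torch
from transformers import BertModel, BertTokenizer

# Sketch of option 1: run "[CLS] word [SEP]" through BERT once per vocabulary
# word and keep the hidden state(s) at the word's own position(s).
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
model = BertModel.from_pretrained('bert-base-cased')
model.eval()

def embed_single_word(word):
    # encode() wraps the word with [CLS] ... [SEP]
    input_ids = torch.tensor([tokenizer.encode(word, add_special_tokens=True)])
    with torch.no_grad():
        last_hidden = model(input_ids)[0]      # (1, seq_len, hidden_size)
    # average over the word's WordPiece pieces, skipping [CLS] and [SEP]
    return last_hidden[0, 1:-1].mean(dim=0)

W = {w: embed_single_word(w) for w in ['Germany', 'London', 'striker']}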

Thank you.

rz-zhang commented 3 years ago

Hi, thanks for your question. Yes, there should be many ways to go about this; my implementation is as follows:

def get_word_embedding(model_dir=None):
    # Ner, BertTokenizer and args come from the surrounding training code in this repo.
    model = Ner.from_pretrained(model_dir)
    tokenizer = BertTokenizer.from_pretrained(model_dir, do_lower_case=args.do_lower_case)

    # Pull the word-embedding matrix out of the (fine-tuned) BERT model.
    for name, parameters in model.named_parameters():
        if name == 'bert.embeddings.word_embeddings.weight':
            bert_embedding = parameters.detach().cpu().numpy()

    # Forward table: token id -> embedding vector.
    wordidx2ebd = {idx: bert_embedding[idx] for idx in range(bert_embedding.shape[0])}

    # Reverse table: embedding (as a hashable tuple) -> token id.
    ebd2wordidx = {tuple(v): k for k, v in wordidx2ebd.items()}

    return wordidx2ebd, ebd2wordidx
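
For context, here is a simplified sketch of how the two tables can then be used, i.e. mixing two token embeddings and mapping the mixed embedding back to the nearest vocabulary token (an illustration of the idea only, not the exact SeqMix pipeline):

import numpy as np

# Illustration only: mix two token embeddings and snap the result back to the
# nearest vocabulary embedding, then recover its token id via ebd2wordidx.
def mix_and_discretize(idx_a, idx_b, lam, wordidx2ebd, ebd2wordidx):
    mixed = lam * wordidx2ebd[idx_a] + (1 - lam) * wordidx2ebd[idx_b]
    vocab = np.stack(list(wordidx2ebd.values()))              # (V, hidden_size)
    nearest = vocab[np.linalg.norm(vocab - mixed, axis=1).argmin()]
    return ebd2wordidx[tuple(nearest)]

# Example usage (token ids and lam are arbitrary here):
# wordidx2ebd, ebd2wordidx = get_word_embedding(model_dir=model_dir)
# lam = np.random.beta(alpha, alpha)   # mixup-style mixing coefficient
# new_token_id = mix_and_discretize(idx_a, idx_b, lam, wordidx2ebd, ebd2wordidx)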

seongminp commented 3 years ago

Thank you for your response!

One more question: did you by any chance experiment with contextual embeddings (word embedding + positional embedding + segment embedding) instead of pure word embeddings?
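
(To be precise, by that combination I mean the output of BERT's input embedding layer, which sums the word, position and segment/token-type embeddings; an illustrative way to read it off with the transformers API would be:)

import torch
from transformers import BertModel, BertTokenizer

# Illustrative: BertEmbeddings already sums the word, position and token-type
# (segment) embeddings and applies LayerNorm.
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
model = BertModel.from_pretrained('bert-base-cased')
model.eval()

input_ids = torch.tensor([tokenizer.encode('Berlin', add_special_tokens=True)])
with torch.no_grad():
    full_input_ebd = model.embeddings(input_ids)   # (1, seq_len, hidden_size)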

Thank you.