neulab / awesome-align

A neural word aligner based on multilingual BERT
https://arxiv.org/abs/2101.08231
BSD 3-Clause "New" or "Revised" License

Setting seq-to-seq model as our pretrained model #49

Open b3ade opened 2 years ago

b3ade commented 2 years ago

Is it possible to load a seq-to-seq model to produce word alignments with this work? I'm stuck on getting the proper out_src and out_tgt layers to work with for the next step. I know this implementation is for mBERT only, but I'm trying to see whether it can work the same way with seq-to-seq models. If you have any hints on which direction to go, or code to share, please do. This is not for a paper; I'm just curious.

zdou0830 commented 2 years ago

Hi, I haven't tried this before, but I think our method can be directly applied to the encoder of a seq2seq model. Also, for MT models, we can use both the encoder and the decoder to obtain word embeddings on the source and target sides and extract word alignments using our method. We can also train both src2tgt and tgt2src models, so that we obtain alignments in both directions and combine them in some way (e.g., taking the intersection).
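
For reference, the extraction step itself works the same as for mBERT: compute the similarity matrix between the two sets of embeddings, softmax it in both directions, and keep the pairs that pass a probability threshold in both directions (their intersection). This is a minimal sketch following the probability-thresholding snippet in the awesome-align README; the threshold value here is just a tunable example:

import torch

def get_alignments(out_src, out_tgt, threshold=1e-3):
    # out_src: (m, d) source token embeddings; out_tgt: (n, d) target token embeddings
    sim = torch.matmul(out_src, out_tgt.transpose(-1, -2))     # (m, n) similarity matrix
    softmax_srctgt = torch.nn.functional.softmax(sim, dim=-1)  # src -> tgt direction
    softmax_tgtsrc = torch.nn.functional.softmax(sim, dim=-2)  # tgt -> src direction
    # keep only pairs that are confident in both directions (the intersection)
    align_matrix = (softmax_srctgt > threshold) * (softmax_tgtsrc > threshold)
    return set(map(tuple, torch.nonzero(align_matrix, as_tuple=False).tolist()))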

b3ade commented 2 years ago

I see, thanks for the reply. If I understood correctly, the problem is how to apply it to the encoder of the seq2seq model. More precisely, I'm trying to load the nllb-200-distilled-600M model, but I can't get out the right layers needed for the further calculations.

with torch.no_grad():
    print(ids_src.unsqueeze(0))
    # sanity check: translate the source sentence
    test_src = model.generate(ids_src.unsqueeze(0), forced_bos_token_id=tokenizer.lang_code_to_id["eng_Latn"], max_length=30)
    print(test_src)
    # grab the hidden states of the alignment layer (this line fails, see below)
    out_src = model(ids_src.unsqueeze(0), output_hidden_states=True)[2][align_layer][0, 1:-1]

Output:

tensor([[ 81955, 248105,    739,   7819,    248,  81955,    835,      2, 256047]])
tensor([[     2, 256047, 119167, 248105,    739,   7819,    248,  81955,    835,
              2]])
Traceback (most recent call last):
  File "insertWordAlignServer.py", line 39, in <module>
    out_src = model(ids_src.unsqueeze(0), output_hidden_states=True)[2][align_layer][0, 1:-1]
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/m2m_100/modeling_m2m_100.py", line 1315, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/m2m_100/modeling_m2m_100.py", line 1206, in forward
    decoder_outputs = self.decoder(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/m2m_100/modeling_m2m_100.py", line 985, in forward
    raise ValueError("You have to specify either decoder_input_ids or decoder_inputs_embeds")
ValueError: You have to specify either decoder_input_ids or decoder_inputs_embeds

This is probably not the correct way to call the model, but I'm not sure how to fix it.

zdou0830 commented 2 years ago

Hi, if you are using an MT model like nllb, given a sentence pair (x, y), you can obtain contextualized word embeddings for x and y by:

  1. feeding <x, y> to nllb and using its encoder to get embeddings for x and its decoder to get embeddings for y, or
  2. feeding <y, x> to nllb and using its encoder to get embeddings for y and its decoder to get embeddings for x, or
  3. feeding <x, whatever> and <y, whatever> to nllb and using its encoder to get embeddings for x and y.

Then, you can extract alignments from the embeddings using our methods.

I haven't used nllb before, but the error seems to indicate that you didn't feed any inputs to the decoder.
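
For concreteness, here is an untested sketch of option 1 with huggingface transformers. The align_layer value is an arbitrary assumption (pick it on a dev set), and the decoder_input_ids handling should be verified against the actual nllb decoder-side token layout (for strict teacher forcing you would shift the target tokens right, as in training):

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="deu_Latn", tgt_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
model.eval()

align_layer = 8  # hypothetical choice; tune on a dev set

# tokenize the pair <x, y>: source goes to the encoder, target to the decoder
enc = tokenizer("Das ist ein Test.", text_target="This is a test.", return_tensors="pt")

with torch.no_grad():
    out = model(
        input_ids=enc["input_ids"],
        attention_mask=enc["attention_mask"],
        decoder_input_ids=enc["labels"],  # feed the target tokens to the decoder
        output_hidden_states=True,
    )

# the hidden-state tuples have num_layers + 1 entries (embedding layer first)
out_src = out.encoder_hidden_states[align_layer][0]  # source-side embeddings for x
out_tgt = out.decoder_hidden_states[align_layer][0]  # target-side embeddings for y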

jcuenod commented 1 year ago

I have also been trying to do word alignment with seq-2-seq models. @zdou0830 How do you pick the appropriate alignment layer when changing out models?

zdou0830 commented 1 year ago

> I have also been trying to do word alignment with seq-2-seq models. @zdou0830 How do you pick the appropriate alignment layer when changing out models?

I think you can do zero-shot evaluation on a dev set (e.g., the examples in https://github.com/neulab/awesome-align/tree/master/examples) and see which layer performs best.
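
A rough sketch of such a sweep is below; aer is the standard alignment error rate, while predict_alignments and the dev-data format are hypothetical stand-ins for your own extraction pipeline:

def aer(pred, sure, possible):
    """Alignment Error Rate: 1 - (|A n S| + |A n P|) / (|A| + |S|)."""
    a, s = set(pred), set(sure)
    p = s | set(possible)  # sure alignments are also possible
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))

def best_layer(dev_data, num_layers, predict_alignments):
    """Pick the layer with the lowest mean AER on a dev set.

    dev_data: list of (sentence_pair, sure_set, possible_set) tuples;
    predict_alignments(sentence_pair, layer) -> set of (i, j) pairs.
    Both are hypothetical stand-ins for your own pipeline.
    """
    def mean_aer(layer):
        scores = [aer(predict_alignments(pair, layer), s, p)
                  for pair, s, p in dev_data]
        return sum(scores) / len(scores)

    return min(range(1, num_layers + 1), key=mean_aer)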

jcuenod commented 1 year ago

Interesting, thanks.