neulab / awesome-align

A neural word aligner based on multilingual BERT
https://arxiv.org/abs/2101.08231
BSD 3-Clause "New" or "Revised" License

Question on Figure 2 in your paper #51

Closed gpengzhi closed 1 year ago

gpengzhi commented 1 year ago

[attached screenshot: Figure 2 from the paper]

In the intersection matrix, why is the indicated entry one?

Am I missing some technical detail?

Thanks a lot!

gpengzhi commented 1 year ago

Another question:

When you apply the self-training objective, the alignment matrix A is not the BPE-level intersection 0-1 matrix, right? The alignment matrix A you actually use is the word-level 0-1 alignment matrix converted from the BPE-level intersection matrix. Is that correct?

zdou0830 commented 1 year ago

Thanks for the questions!

why is the indicated entry one?

This is because some entries are rounded down to 0 in the visualization, but they are actually greater than the threshold (i.e., they are in the range [0.001, 0.005]).
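
For reference, here is a minimal sketch of the extraction step being discussed, assuming the bidirectional softmax-and-threshold procedure described in the paper; the function name, variable names, and the default threshold of 1e-3 are illustrative and may differ from the repository's exact code.

```python
# Minimal sketch of the bidirectional softmax-and-threshold extraction
# described in the paper; names and the default threshold are illustrative.
import torch

def intersection_matrix(sim: torch.Tensor, threshold: float = 1e-3) -> torch.Tensor:
    """sim: (src_len, tgt_len) similarity scores between subword embeddings."""
    p_s2t = torch.softmax(sim, dim=-1)  # source-to-target: normalize over target tokens
    p_t2s = torch.softmax(sim, dim=0)   # target-to-source: normalize over source tokens
    # Keep a link only if its probability clears the threshold in both directions.
    return ((p_s2t > threshold) & (p_t2s > threshold)).float()
```

An entry whose probabilities are, say, 0.002 in both directions displays as 0.00 when rounded to two decimals, yet still clears the threshold, so the corresponding cell in the intersection matrix is 1.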

the alignment matrix A

Correct. I remember finding that using word-level alignments works better for most language pairs.
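
For completeness, a rough sketch of that BPE-to-word conversion, assuming a subword-to-word index map is available; the names here are made up for illustration and are not the repository's code. Two words are marked as aligned if any of their subword pairs are aligned in the intersection matrix.

```python
# Rough sketch of converting the BPE-level 0-1 intersection matrix into the
# word-level 0-1 matrix A used in self-training; names are illustrative.
import torch

def bpe_to_word_alignment(bpe_align: torch.Tensor,
                          bpe2word_src: list,
                          bpe2word_tgt: list) -> torch.Tensor:
    """bpe_align: (src_bpe_len, tgt_bpe_len) 0-1 matrix; bpe2word_* map each
    subword position to the index of the word it belongs to."""
    word_align = torch.zeros(max(bpe2word_src) + 1, max(bpe2word_tgt) + 1)
    for i, j in torch.nonzero(bpe_align, as_tuple=False).tolist():
        # A word pair is aligned if any of its subword pairs are aligned.
        word_align[bpe2word_src[i], bpe2word_tgt[j]] = 1.0
    return word_align
```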