Mask problem when matching characters in a sentence to words

liuwei1206 / LEBERT

Code for the ACL2021 paper "Lexicon Enhanced Chinese Sequence Labelling Using BERT Adapter"


Mask problem when matching characters in a sentence to words #20

Closed seyoulala closed 3 years ago

seyoulala commented 3 years ago

    for idy in range(sent_length):
        now_words = matched_words[idy]
        now_word_ids = self.word_vocab.convert_items_to_ids(now_words)
        matched_word_ids[idy][:len(now_word_ids)] = now_word_ids
        matched_word_mask[idy][:len(matched_word_ids)] = 1

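Here matched_word_ids has length max_seq_len, while matched_word_mask[idy] has length max_num_word, and max_seq_len > max_num_word. So whether or not the current character matched any word, won't the pad positions always take part in the attention computation?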

liuwei1206 commented 3 years ago

Hi,

The length of matched_word_ids is not always max_seq_len; it is dynamic. It is the number of words matched from the lexicon, but never larger than max_seq_length. So wherever the mask is 0, the value is left out of the attention computation.
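To illustrate what "mask = 0 means the value is not used by attention" amounts to, here is a minimal sketch of a masked softmax in plain Python. It is a generic illustration, not LEBERT's attention code, which is not shown in this thread. The function name `masked_softmax` and the example scores are hypothetical.

```python
import math

def masked_softmax(scores, mask):
    """Softmax over scores that gives zero weight wherever mask is 0."""
    if not any(mask):
        # No matched word at all: return all-zero weights instead of dividing by zero.
        return [0.0] * len(scores)
    # Replace masked-out scores with -inf so that exp() maps them to 0.
    masked = [s if m else float("-inf") for s, m in zip(scores, mask)]
    peak = max(masked)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical attention scores of one character over 4 word slots;
# only the first 2 slots hold real matched words, the rest are padding.
scores = [2.0, 1.0, 0.5, 0.5]
weights = masked_softmax(scores, [1, 1, 0, 0])
leaky_weights = masked_softmax(scores, [1, 1, 1, 1])
```

With the correct mask, the two pad slots receive exactly zero weight. With an all-ones mask (`leaky_weights`), the pad slots receive positive weight, which is the situation this issue is worried about.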

seyoulala commented 3 years ago

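I read the code: max_seq_length is a fixed maximum length. I also ran the code and found that the mask is 1 even for characters that matched no word. I think replacing matched_word_mask[idy][:len(matched_word_ids)] = 1 with matched_word_mask[idy][:len(now_word_ids)] = 1 would be the correct version.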

liuwei1206 commented 3 years ago

Hi,

Yes, you are right. The code should be written as matched_word_mask[idy][:len(now_word_ids)] = 1. This is not the original code, so some errors may have been introduced during the rewrite (for simplification).

Thanks a lot for your correction!
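For readers who want to see the difference between the two slice bounds discussed in this thread, the following is a small sketch in plain Python. It is not the repository's code: `build_word_mask`, its `buggy` flag, and the placeholder words are hypothetical, nested lists stand in for the arrays, and `min()` emulates the way array slicing clamps to the row length. It also assumes each character matches at most `max_word_num` words.

```python
def build_word_mask(matched_words, max_seq_length, max_word_num, buggy=False):
    """Return a max_seq_length x max_word_num 0/1 mask over word slots.

    matched_words[i] is the list of lexicon words matched by character i.
    """
    mask = [[0] * max_word_num for _ in range(max_seq_length)]
    for idy, now_words in enumerate(matched_words):
        if buggy:
            # Corresponds to len(matched_word_ids): the number of rows.
            upper = max_seq_length
        else:
            # Corresponds to len(now_word_ids): words matched by this character.
            upper = len(now_words)
        # Array slicing clamps to the row length; min() emulates that here.
        for k in range(min(upper, max_word_num)):
            mask[idy][k] = 1
    return mask

# Hypothetical 3-character sentence; the second character matches no word.
matched_words = [["w1", "w2"], [], ["w3"]]
buggy_mask = build_word_mask(matched_words, max_seq_length=8, max_word_num=3, buggy=True)
fixed_mask = build_word_mask(matched_words, max_seq_length=8, max_word_num=3)
```

In this sketch `buggy_mask[1]` is `[1, 1, 1]` even though the second character matched nothing, while `fixed_mask[1]` is `[0, 0, 0]` and `fixed_mask[0]` is `[1, 1, 0]`.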