nitishgupta / nmn-drop

Neural Module Network for Reasoning over Text, ICLR 2020

Question about extracting Q and P representations #5

Closed Borororo closed 4 years ago

Borororo commented 4 years ago

Hi, thanks for sharing the code! While reading through it, I noticed the following:

    # Skip [CLS]; then the next max_ques_len tokens are question tokens
    encoded_question = bert_out[:, 1 : self.max_ques_len + 1, :]
    question_mask = (pad_mask[:, 1 : self.max_ques_len + 1]).float()

    # Skip [CLS] Q_tokens [SEP]
    encoded_passage = bert_out[:, 1 + self.max_ques_len + 1 :, :]
    passage_mask = (pad_mask[:, 1 + self.max_ques_len + 1 :]).float()
    passage_token_idxs = question_passage_tokens[:, 1 + self.max_ques_len + 1 :]

I understand that separating the Q and P representations for all instances is much more convenient if every question is padded to a fixed length (max_ques_len). However, could we separate Q and P dynamically, based on the actual question length of each instance? Is there any limitation here?

nitishgupta commented 4 years ago

Ideally, yes.

While creating instances you could keep just [CLS] question [SEP] paragraph [SEP] and track the question length of each instance. In the model's forward function you could then extract the question representations based on those lengths. This would still require padding questions to the length of the longest question in the batch.
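Roughly, something like the following (untested sketch, not code from this repo; the function name, argument names, and shapes are just illustrative):

    import torch

    def separate_q_p(bert_out, pad_mask, question_lengths):
        """Split BERT output for [CLS] Q [SEP] P [SEP] into question and passage
        representations using per-instance question lengths.

        bert_out:         (batch, seq_len, dim)
        pad_mask:         (batch, seq_len), 1 for real tokens, 0 for padding
        question_lengths: (batch,), number of question tokens per instance
        """
        batch_size, seq_len, dim = bert_out.size()
        max_q = int(question_lengths.max().item())

        # Question tokens occupy positions 1 .. question_length (position 0 is [CLS]).
        q_range = torch.arange(max_q, device=bert_out.device).unsqueeze(0)      # (1, max_q)
        question_mask = (q_range < question_lengths.unsqueeze(1)).float()       # (batch, max_q)
        encoded_question = bert_out[:, 1 : max_q + 1, :] * question_mask.unsqueeze(-1)

        # Passage starts after [CLS] Q [SEP]; the offset differs per instance,
        # so gather passage positions separately for each example.
        passage_start = question_lengths + 2                                    # skip [CLS] and [SEP]
        max_p = seq_len - int(passage_start.min().item())
        p_range = torch.arange(max_p, device=bert_out.device).unsqueeze(0)      # (1, max_p)
        raw_positions = passage_start.unsqueeze(1) + p_range                    # (batch, max_p)
        passage_positions = raw_positions.clamp(max=seq_len - 1)

        encoded_passage = torch.gather(
            bert_out, 1, passage_positions.unsqueeze(-1).expand(-1, -1, dim)
        )
        passage_mask = torch.gather(pad_mask, 1, passage_positions).float()
        # Zero out positions that run past the end of shorter sequences.
        passage_mask = passage_mask * (raw_positions < seq_len).float()
        encoded_passage = encoded_passage * passage_mask.unsqueeze(-1)

        return encoded_question, question_mask, encoded_passage, passage_mask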

In my opinion, this increases code complexity without a significant improvement in runtime, and it opens up the possibility of bugs. Hence I didn't try doing this.