microsoft / IRNet

An algorithm for cross-domain NL2SQL
MIT License

why do you initialize decoder init state like '[tanh(linear(enc_last_cell_state)), zeros]' #30

Open AoZhang opened 4 years ago

AoZhang commented 4 years ago

I notice that in src/models/model.py you initialize the decoder init state like the following:

    def init_decoder_state(self, enc_last_cell):
        # project the encoder's last cell state and squash it to form
        # the decoder's initial hidden state
        h_0 = self.decoder_cell_init(enc_last_cell)
        h_0 = F.tanh(h_0)

        # the initial cell state is simply all zeros
        return h_0, Variable(self.new_tensor(h_0.size()).zero_())

It seems that the last cell state of the question encoder, after Linear() and tanh(), is used as the decoder's initial hidden state, while a zero tensor is used as the decoder's initial cell state.

May I know the reason you didn't use the encoder's last (hidden state, cell state) to initialize the decoder's (hidden state, cell state), respectively?
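For comparison, here is a minimal sketch of the conventional initialization I had in mind (names and shapes are illustrative, not from the repo; one common variant projects both encoder states into the decoder's dimensionality):

    import torch
    import torch.nn as nn

    class ConventionalInit(nn.Module):
        """Sketch: initialize BOTH decoder states from the encoder."""
        def __init__(self, enc_size, dec_size):
            super().__init__()
            self.h_proj = nn.Linear(enc_size, dec_size)
            self.c_proj = nn.Linear(enc_size, dec_size)

        def forward(self, enc_last_state, enc_last_cell):
            # project the encoder's final hidden/cell states into the
            # decoder's dimensionality and pass both through
            h_0 = torch.tanh(self.h_proj(enc_last_state))
            c_0 = torch.tanh(self.c_proj(enc_last_cell))
            return h_0, c_0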

jaydeepb-inexture commented 4 years ago

@AoZhang have you thoroughly understood the code that runs before this decoder state?

AoZhang commented 4 years ago

> @AoZhang have you thoroughly understood the code that runs before this decoder state?

Before this decoder state, I found that the question is encoded by a bi-LSTM that outputs src_encodings, (last_state, last_cell), and the decoder uses last_cell to create its initial state.
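Concretely, the encoding step looks roughly like this (a sketch with made-up sizes; the final states of the two directions are concatenated before any projection):

    import torch
    import torch.nn as nn

    batch, seq_len, embed_size, hidden = 2, 10, 64, 32
    encoder = nn.LSTM(embed_size, hidden // 2, bidirectional=True,
                      batch_first=True)

    x = torch.randn(batch, seq_len, embed_size)
    src_encodings, (last_state, last_cell) = encoder(x)
    # last_cell: (num_directions, batch, hidden // 2); concatenate the
    # forward and backward directions to get one vector per example
    last_cell = torch.cat([last_cell[0], last_cell[1]], dim=-1)
    print(last_cell.shape)  # torch.Size([2, 32])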

jaydeepb-inexture commented 4 years ago

Have you gone through train.py and the other files?

AoZhang commented 4 years ago

> Have you gone through train.py and the other files?

In fact, I've read almost all of the code, and I found that the decoder initialization here is not implemented the same way as in other seq2seq models.

xiaer1 commented 4 years ago

In my personal understanding, the first step generated by the decoder is always the Root1 type, so the decoder's first cell state should not need to change with the semantics of the question. Of course, I guess there would also be no problem with using the encoder's (hidden state, cell state) directly.
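To make the mechanics concrete, here is a sketch of the first decoder step under the zero-cell initialization (names and sizes are illustrative only):

    import torch
    import torch.nn as nn

    enc_size, dec_size, batch = 32, 32, 2
    decoder_cell_init = nn.Linear(enc_size, dec_size)
    decoder = nn.LSTMCell(dec_size, dec_size)

    enc_last_cell = torch.randn(batch, enc_size)
    h_0 = torch.tanh(decoder_cell_init(enc_last_cell))
    c_0 = torch.zeros_like(h_0)  # question-independent initial cell state

    # the first step always emits the fixed Root1 action, so the cell
    # state carries no question semantics yet; h_1 and c_1 pick them up
    x_0 = torch.zeros(batch, dec_size)  # placeholder first input embedding
    h_1, c_1 = decoder(x_0, (h_0, c_0))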