jiesutd / LatticeLSTM

Chinese NER using Lattice LSTM. Code for ACL 2018 paper.

question about class WordLSTMCell #37

Closed gyc913 closed 5 years ago

gyc913 commented 5 years ago

Sorry, I am new to PyTorch, but in the class WordLSTMCell I found:

f, i, g = torch.split(wh_b + wi, split_size=self.hidden_size, dim=1)

In the formulas of your paper, wh_b and wi are not added together, so did I misunderstand your code?

    def forward(self, input_, hx):
        """
        Args:
            input_: A (batch, input_size) tensor containing input features.
            hx: A tuple (h_0, c_0) containing the initial hidden and cell
                states, both of size (batch, hidden_size).
        Returns:
            c_1: A tensor containing the next cell state.
        """
        h_0, c_0 = hx
        batch_size = h_0.size(0)
        bias_batch = self.bias.unsqueeze(0).expand(batch_size, *self.bias.size())
        wh_b = torch.addmm(bias_batch, h_0, self.weight_hh)   # W_h h + b
        wi = torch.mm(input_, self.weight_ih)                 # W_x x
        f, i, g = torch.split(wh_b + wi, self.hidden_size, dim=1)
        c_1 = torch.sigmoid(f) * c_0 + torch.sigmoid(i) * torch.tanh(g)
        return c_1
jiesutd commented 5 years ago

They are added together in our paper. Please refer to equation 13.
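For context, the sum is just the usual LSTM gate pre-activation written as two matmuls: computing W_x x + W_h h + b once for all three gates and then splitting is equivalent to projecting each gate separately. A minimal standalone sketch (shapes, seeds, and variable names are illustrative, not the repo's code):

```python
import torch

# Sketch: the gate pre-activations are W_x x + W_h h + b, computed as two
# projections that are summed BEFORE a single split into f, i, g -- which
# is exactly what wh_b + wi does in WordLSTMCell.forward.
torch.manual_seed(0)
batch, input_size, hidden_size = 1, 4, 3

x = torch.randn(batch, input_size)
h = torch.randn(batch, hidden_size)
weight_ih = torch.randn(input_size, 3 * hidden_size)   # stacked W_x for f, i, g
weight_hh = torch.randn(hidden_size, 3 * hidden_size)  # stacked W_h for f, i, g
bias = torch.randn(3 * hidden_size)

wi = x @ weight_ih            # input projection
wh_b = h @ weight_hh + bias   # hidden projection plus bias
f, i, g = torch.split(wh_b + wi, hidden_size, dim=1)
print(f.shape, i.shape, g.shape)  # each chunk is (1, 3)
```

Note that recent PyTorch versions take the chunk size as a positional argument (`split_size_or_sections`); the `split_size=` keyword in the repo's code is from an older API.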

gyc913 commented 5 years ago

I understand now. Sorry for the confusion.

gyc913 commented 5 years ago

Sorry, I have another question. Since the batch size is set to 1, there is no padding, so why is the label size still larger than the real number of labels?

(Pdb) self.label_alphabet.size()
6
(Pdb) self.label_alphabet.instances
['B-NAME', 'E-NAME', 'O', 'M-NAME', 'S-NAME']

which is actually 5.

Besides, in class BiLSTM_CRF, why is the label size set as follows? data.label_alphabet_size += 2

Thank you

jiesutd commented 5 years ago

> Sorry, I have another question. Since the batch size is set to 1, there is no padding, so why is the label size still larger than the real number of labels? self.label_alphabet.size() returns 6, but self.label_alphabet.instances is ['B-NAME', 'E-NAME', 'O', 'M-NAME', 'S-NAME'], which is actually 5.
>
> Besides, in class BiLSTM_CRF, why is the label size set as follows? data.label_alphabet_size += 2

  1. The extra label in the alphabet is unused. I use the same alphabet class for words/characters and for labels, and for words/characters we sometimes need an unknown token, so the label alphabet follows the same format. The 'unknown' label does not affect the results.

  2. For the CRF, we need to add two extra labels, START and END. If you understand the CRF structure, you will see why the START and END tokens are needed during inference.
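To illustrate point 2, here is a minimal sketch of why a linear-chain CRF needs the two extra labels: with L real labels, the transition matrix becomes (L+2) x (L+2) so that every path can be scored as START -> y_1 -> ... -> y_T -> END. The index layout and the path_score helper are illustrative, not the repo's actual implementation:

```python
import torch

L = 5                   # real labels, e.g. B/M/E/S-NAME and O
START, END = L, L + 1   # the two extra labels (hence label_alphabet_size += 2)

torch.manual_seed(0)
# transitions[i, j]: score of moving from label i to label j
transitions = torch.randn(L + 2, L + 2)

def path_score(emissions, tags):
    """Score one tag sequence, including the START and END transitions."""
    score = transitions[START, tags[0]] + emissions[0, tags[0]]
    for t in range(1, len(tags)):
        score = score + transitions[tags[t - 1], tags[t]] + emissions[t, tags[t]]
    return score + transitions[tags[-1], END]

emissions = torch.randn(3, L)  # (seq_len, num_real_labels)
print(path_score(emissions, [0, 2, 1]).item())
```

Without START and END, the model could not learn that, e.g., a sentence is unlikely to begin with M-NAME or end with B-NAME; those boundary preferences live entirely in the extra rows and columns of the transition matrix.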