+2 is used for the CRF layer, which needs start and end padding. The results on the demo data don't mean anything, as the data is too small.

shenxuhui commented 5 years ago

Originally posted by @jiesutd in https://github.com/jiesutd/NCRFpp/issues/135#issuecomment-524819427

shenxuhui commented 5 years ago

> +2 is used for the CRF layer, which needs start and end padding. The results on the demo data don't mean anything, as the data is too small.

I think what you are describing is the code below.

But the comment on the +2 in SeqLabel says "add two more label for downlayer lstm, use original label size for CRF", and the code in SeqLabel adds 2 for the LSTM, not for the CRF.

I'm sorry to bother you again, but I really like this project and have learned a lot from it.

```python
import torch
import torch.nn as nn

# Indices of the two extra tags, as in NCRFpp's crf.py (negative = last two rows/columns).
START_TAG = -2
STOP_TAG = -1


class CRF(nn.Module):
    def __init__(self, tagset_size, gpu):
        super(CRF, self).__init__()
        print("build CRF...")
        self.gpu = gpu
        # Matrix of transition parameters.  Entry i,j is the score of transitioning from i to j.
        self.tagset_size = tagset_size
        # We add 2 here, because of START_TAG and STOP_TAG
        # transitions (f_tag_size, t_tag_size), transition value from f_tag to t_tag
        init_transitions = torch.zeros(self.tagset_size + 2, self.tagset_size + 2)
        init_transitions[:, START_TAG] = -10000.0  # no transition into START_TAG
        init_transitions[STOP_TAG, :] = -10000.0   # no transition out of STOP_TAG
        init_transitions[:, 0] = -10000.0          # reserved label index 0 is never a target
        init_transitions[0, :] = -10000.0          # ... nor a source
        if self.gpu:
            init_transitions = init_transitions.cuda()
        self.transitions = nn.Parameter(init_transitions)

        # self.transitions = nn.Parameter(torch.Tensor(self.tagset_size+2, self.tagset_size+2))
        # self.transitions.data.zero_()
```
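A pure-Python sketch of the same initialization may make the index arithmetic easier to follow. It is not NCRFpp code: it uses nested lists instead of a torch tensor and a hypothetical tag set of size 3, and it assumes START_TAG = -2 and STOP_TAG = -1 (negative indices into the enlarged matrix) and that label index 0 is reserved, as the `[:, 0]` and `[0, :]` lines suggest. The function name `init_transitions` is illustrative only.

```python
NEG = -10000.0
START_TAG = -2  # assumed, as in the excerpt: second-to-last index of the enlarged tag set
STOP_TAG = -1   # assumed: last index of the enlarged tag set


def init_transitions(tagset_size):
    """Pure-Python mirror of CRF.__init__: entry [i][j] scores a transition i -> j."""
    n = tagset_size + 2  # the +2 reserves room for START_TAG and STOP_TAG
    t = [[0.0] * n for _ in range(n)]
    for i in range(n):
        t[i][START_TAG] = NEG  # nothing may transition INTO START
        t[STOP_TAG][i] = NEG   # nothing may transition OUT OF STOP
        t[i][0] = NEG          # reserved label index 0 is never a target
        t[0][i] = NEG          # ... nor a source
    return t


trans = init_transitions(3)
print(len(trans), len(trans[0]))  # 5 5
print(trans[3][1])  # 0.0: START (index 3 == -2) -> label 1 is allowed
print(trans[1][4])  # 0.0: label 1 -> STOP (index 4 == -1) is allowed
print(trans[4][1])  # -10000.0: leaving STOP is blocked
```

So the +2 inside the CRF only makes room for two boundary states; the -10000.0 entries keep them at the sequence edges.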

shenxuhui commented 5 years ago

The +2 code in SeqLabel:

```python
import torch.nn as nn

from .wordsequence import WordSequence
from .crf import CRF


class SeqLabel(nn.Module):
    def __init__(self, data):
        super(SeqLabel, self).__init__()
        self.data2 = data
        self.use_crf = data.use_crf
        print("build sequence labeling network...")
        print("use_char: ", data.use_char)
        if data.use_char:
            print("char feature extractor: ", data.char_feature_extractor)
        print("word feature extractor: ", data.word_feature_extractor)
        print("use crf: ", self.use_crf)

        self.gpu = data.HP_gpu
        self.average_batch = data.average_batch_loss
        ## add two more label for downlayer lstm, use original label size for CRF  (Why do that???)
        label_size = data.label_alphabet_size
        data.label_alphabet_size += 2  # <- the +2 in question
        self.word_hidden = WordSequence(data)
        if self.use_crf:
            self.crf = CRF(label_size, self.gpu)
```
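The two +2s line up rather than stack. This minimal sketch traces the bookkeeping using a hypothetical `DataStub` in place of NCRFpp's data object, and it assumes (this is not shown in the excerpt) that WordSequence sizes its output layer from `data.label_alphabet_size`. The LSTM side is built after the increment, while the CRF receives the pre-increment size and adds 2 itself, so both end up with the same dimension.

```python
from dataclasses import dataclass


@dataclass
class DataStub:
    """Hypothetical stand-in for NCRFpp's data/config object (only the field we need)."""
    label_alphabet_size: int


def label_dims(data):
    """Mirror the bookkeeping in SeqLabel.__init__ and report both sizes."""
    label_size = data.label_alphabet_size  # original size, later passed to CRF(label_size, ...)
    data.label_alphabet_size += 2          # enlarged size, read when the word-level network is built
    # Assumption: the word-level network projects to data.label_alphabet_size scores per token.
    lstm_output_dim = data.label_alphabet_size
    # CRF.__init__ builds a (tagset_size + 2) x (tagset_size + 2) transition matrix.
    crf_matrix_dim = label_size + 2
    return lstm_output_dim, crf_matrix_dim


data = DataStub(label_alphabet_size=5)
print(label_dims(data))          # (7, 7)
print(data.label_alphabet_size)  # 7: the shared config object was mutated
```

Because `data` is mutated in place, the increment must happen before `WordSequence(data)` is constructed, which is the order the excerpt uses.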

jiesutd commented 5 years ago

+2 is applied to the LSTM output (i.e., it adds two more dimensions to the LSTM output vector).
The two added dimensions serve as the START_TAG and STOP_TAG in the CRF layer.

So the +2 is applied on the LSTM layer, but the goal is to use those dimensions in the CRF layer.
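The following pure-Python sketch illustrates this under simplified assumptions: each token's emission vector has label_size + 2 entries (what the enlarged LSTM output would produce), and a path score adds a START -> first-tag transition and a last-tag -> STOP transition. It uses brute-force search instead of Viterbi decoding, and the names `make_transitions`, `path_score` and `best_path` are hypothetical. Because transitions into START and out of STOP start at -10000, the two extra emission dimensions never win, even when the LSTM scores them highly.

```python
import itertools

NEG = -10000.0


def make_transitions(n_tags):
    """Transition matrix over the enlarged tag set; entry [i][j] scores i -> j."""
    start, stop = n_tags - 2, n_tags - 1  # same positions as START_TAG=-2, STOP_TAG=-1
    trans = [[0.0] * n_tags for _ in range(n_tags)]
    for i in range(n_tags):
        trans[i][start] = NEG  # no transition into START
        trans[stop][i] = NEG   # no transition out of STOP
        trans[i][0] = NEG      # reserved label index 0 is never a target
        trans[0][i] = NEG      # ... nor a source
    return trans, start, stop


def path_score(emissions, trans, tags, start, stop):
    """START -> y1, plus emissions and y_{t-1} -> y_t transitions, plus y_n -> STOP."""
    score = trans[start][tags[0]]
    for t, tag in enumerate(tags):
        score += emissions[t][tag]
        if t > 0:
            score += trans[tags[t - 1]][tag]
    return score + trans[tags[-1]][stop]


def best_path(emissions, trans, start, stop):
    """Brute-force argmax over every tag sequence (only viable for tiny inputs)."""
    n_tags = len(trans)
    candidates = itertools.product(range(n_tags), repeat=len(emissions))
    return max(candidates, key=lambda p: path_score(emissions, trans, p, start, stop))


# 3 labels (index 0 reserved, real labels 1 and 2) + START + STOP = 5 emission dims per token.
trans, start, stop = make_transitions(5)
# The LSTM scores the START/STOP dims highly here, yet the best path never uses them.
emissions = [[0.0, 1.0, 0.0, 5.0, 0.0],
             [0.0, 0.0, 2.0, 0.0, 5.0]]
print(best_path(emissions, trans, start, stop))  # (1, 2)
```

A real CRF computes the same maximum with the Viterbi recursion in O(n·T²) time instead of enumerating all Tⁿ paths.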

jiesutd/NCRFpp #136