jiesutd / NCRFpp

NCRF++, a Neural Sequence Labeling Toolkit. Easy use to any sequence labeling tasks (e.g. NER, POS, Segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.
Apache License 2.0

add two more label for downlayer lstm, use original label size for CRF #135

Closed shenxuhui closed 5 years ago

shenxuhui commented 5 years ago

I am confused about the operation data.label_alphabet_size += 2 in seqlabel.py. Why add two more labels for the downlayer LSTM?

Thanks for your work.

shenxuhui commented 5 years ago

the source code:


```python
class SeqLabel(nn.Module):
    def __init__(self, data):
        super(SeqLabel, self).__init__()
        self.data2=data
        self.use_crf = data.use_crf
        print("build sequence labeling network...")
        print("use_char: ", data.use_char)
        if data.use_char:
            print("char feature extractor: ", data.char_feature_extractor)
        print("word feature extractor: ", data.word_feature_extractor)
        print("use crf: ", self.use_crf)

        self.gpu = data.HP_gpu
        self.average_batch = data.average_batch_loss
        ## add two more label for downlayer lstm, use original label size for CRF , Why do that???
        label_size = data.label_alphabet_size
        data.label_alphabet_size += 2
        self.word_hidden = WordSequence(data)
        if self.use_crf:
            self.crf = CRF(label_size, self.gpu)
```

shenxuhui commented 5 years ago

I changed “label_alphabet_size += 2” to “label_alphabet_size += 0”, which had little effect on the results on the demo data. “label_alphabet_size += 2” was slightly better than “label_alphabet_size += 0”.

I am a newbie NLP student. Can somebody help me?

jiesutd commented 5 years ago

The +2 is used for the CRF later, which needs start and end padding. The results on the demo data don't mean anything, as the data is too small.

shenxuhui commented 5 years ago

> The +2 is used for the CRF later, which needs start and end padding. The results on the demo data don't mean anything, as the data is too small.

I think the CRF code below is what you are referring to. However, the comment on the +2 in SeqLabel is "add two more label for downlayer lstm, use original label size for CRF", so that code adds 2 for the LSTM, not for the CRF.

I'm sorry to bother you again, but I really like this project, and I have learned a lot from it.

```python
class CRF(nn.Module):

    def __init__(self, tagset_size, gpu):
        super(CRF, self).__init__()
        print("build CRF...")
        self.gpu = gpu
        # Matrix of transition parameters.  Entry i,j is the score of transitioning from i to j.
        self.tagset_size = tagset_size
        # # We add 2 here, because of START_TAG and STOP_TAG
        # # transitions (f_tag_size, t_tag_size), transition value from f_tag to t_tag
        init_transitions = torch.zeros(self.tagset_size+2, self.tagset_size+2)
        init_transitions[:,START_TAG] = -10000.0
        init_transitions[STOP_TAG,:] = -10000.0
        init_transitions[:,0] = -10000.0
        init_transitions[0,:] = -10000.0
        if self.gpu:
            init_transitions = init_transitions.cuda()
        self.transitions = nn.Parameter(init_transitions)

        # self.transitions = nn.Parameter(torch.Tensor(self.tagset_size+2, self.tagset_size+2))
        # self.transitions.data.zero_()
```

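
To make the four -10000.0 assignments in the CRF constructor easier to read, here is a minimal, dependency-free sketch that rebuilds the same matrix with plain Python lists instead of torch tensors. It assumes START_TAG = -2 and STOP_TAG = -1 (the last two indices) and that index 0 is a padding label; neither assumption is stated in the snippet above. The helper name init_transitions is hypothetical.

```python
# Assumed values: the snippet above does not show how START_TAG / STOP_TAG are defined.
START_TAG = -2   # assumed: second-to-last index of the enlarged tag set
STOP_TAG = -1    # assumed: last index of the enlarged tag set
NEG = -10000.0   # large negative score, effectively "forbidden"


def init_transitions(tagset_size):
    """Build a (tagset_size+2) x (tagset_size+2) matrix; entry [i][j] scores i -> j."""
    n = tagset_size + 2
    trans = [[0.0] * n for _ in range(n)]
    for i in range(n):
        trans[i][START_TAG] = NEG   # no tag may transition INTO start
        trans[STOP_TAG][i] = NEG    # no tag may transition OUT OF stop
        trans[i][0] = NEG           # index 0 (assumed padding) is never entered
        trans[0][i] = NEG           # ... and never left
    return trans


trans = init_transitions(tagset_size=3)
```

With three real labels the matrix is 5 x 5: ordinary tag-to-tag entries start at 0.0, while start-to-tag and tag-to-stop entries stay learnable and only the reverse directions are blocked.
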
jiesutd commented 5 years ago
  1. +2 is applied to the LSTM output (i.e. add two more dimensions to the LSTM output vector)
  2. The two added dimensions serve as the START_TAG and STOP_TAG in the CRF layer.

So the +2 is applied at the LSTM layer, but its purpose is to provide those two dimensions to the CRF layer.
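
The two points above can be sketched as a path-scoring function: each token's emission vector has width label_size + 2 so that it lines up with the (label_size + 2) x (label_size + 2) transition matrix, and the two extra indices are only ever used as the start and stop states. This is an illustrative sketch with plain Python lists, not the toolkit's implementation; START_TAG = -2, STOP_TAG = -1, the function name path_score and all numeric values are assumptions made for the example.

```python
START_TAG = -2   # assumed index of the start state
STOP_TAG = -1    # assumed index of the stop state


def path_score(emissions, transitions, tags):
    """Score one tag path: start -> tags[0] -> ... -> tags[-1] -> stop."""
    n = len(transitions)
    # Every emission vector must be as wide as the transition matrix,
    # which is why the LSTM output gets label_size + 2 dimensions.
    assert all(len(e) == n for e in emissions)
    score = 0.0
    prev = START_TAG
    for emit, tag in zip(emissions, tags):
        score += transitions[prev][tag] + emit[tag]
        prev = tag
    return score + transitions[prev][STOP_TAG]


label_size = 3                 # number of real labels (illustrative)
width = label_size + 2         # real labels plus start and stop
# Two tokens; the last two columns are the extra start/stop dimensions.
emissions = [
    [0.0, 1.0, 0.5, 0.0, 0.0],
    [0.0, 0.25, 2.0, 0.0, 0.0],
]
transitions = [[0.0] * width for _ in range(width)]
transitions[START_TAG][1] = 0.5   # start -> label 1
transitions[1][2] = 0.25          # label 1 -> label 2
transitions[2][STOP_TAG] = 1.0    # label 2 -> stop

score = path_score(emissions, transitions, tags=[1, 2])
```

Here the score is (0.5 + 1.0) + (0.25 + 2.0) + 1.0 = 4.75. The emission values in the two extra columns are never read for a real tag path, which is consistent with the observation that changing += 2 to += 0 had only a small effect on the demo data.
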