Closed: shenxuhui closed this issue 5 years ago
+2 is used later by the CRF, which needs start and end padding. The results on the demo data don’t mean anything because the data set is too small.
I think what you describe is the CRF code below.
But the comment on the +2 in SeqLabel is "add two more label for downlayer lstm, use original label size for CRF". So the code in SeqLabel adds 2 for the LSTM, not for the CRF.
I'm sorry to bother you again, but I really like this project and have learned a lot from it.

    class CRF(nn.Module):
        def __init__(self, tagset_size, gpu):
            super(CRF, self).__init__()
            print("build CRF...")
            self.gpu = gpu
            # Matrix of transition parameters. Entry i,j is the score of transitioning from i to j.
            self.tagset_size = tagset_size
            # # We add 2 here, because of START_TAG and STOP_TAG
            # # transitions (f_tag_size, t_tag_size), transition value from f_tag to t_tag
            init_transitions = torch.zeros(self.tagset_size+2, self.tagset_size+2)
            init_transitions[:,START_TAG] = -10000.0
            init_transitions[STOP_TAG,:] = -10000.0
            init_transitions[:,0] = -10000.0
            init_transitions[0,:] = -10000.0
            if self.gpu:
                init_transitions = init_transitions.cuda()
            self.transitions = nn.Parameter(init_transitions)
            # self.transitions = nn.Parameter(torch.Tensor(self.tagset_size+2, self.tagset_size+2))
            # self.transitions.data.zero_()

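
To make the initialisation above easier to follow, here is a minimal sketch of what those four `-10000.0` assignments do. It is not the project's code: plain Python lists stand in for torch tensors so that it runs without dependencies, `build_transitions` and `NEG_INF` are hypothetical names, and `START_TAG = -2` / `STOP_TAG = -1` are assumed values (the excerpt does not show their definitions). Index 0 is assumed to be a reserved label such as padding.

```python
# Sketch only: plain Python lists stand in for torch tensors.
# Assumption: START_TAG = -2 and STOP_TAG = -1, i.e. the two extra
# tags occupy the last two indices of the enlarged tag set.
START_TAG = -2
STOP_TAG = -1
NEG_INF = -10000.0  # large negative score that effectively forbids a transition


def build_transitions(tagset_size):
    """Return a (tagset_size+2) x (tagset_size+2) matrix; entry [i][j] scores i -> j."""
    n = tagset_size + 2
    trans = [[0.0] * n for _ in range(n)]
    for i in range(n):
        trans[i][START_TAG] = NEG_INF  # no tag may transition into START
        trans[i][0] = NEG_INF          # no tag may transition into index 0
    for j in range(n):
        trans[STOP_TAG][j] = NEG_INF   # STOP may not transition to any tag
        trans[0][j] = NEG_INF          # index 0 may not transition to any tag
    return trans


transitions = build_transitions(tagset_size=4)
```

With `tagset_size=4` the matrix is 6 x 6: four real labels plus the two extra rows and columns for START and STOP, which is where the `+2` in the CRF comes from.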
The +2 code in SeqLabel:

    class SeqLabel(nn.Module):
        def __init__(self, data):
            super(SeqLabel, self).__init__()
            self.data2=data
            self.use_crf = data.use_crf
            print("build sequence labeling network...")
            print("use_char: ", data.use_char)
            if data.use_char:
                print("char feature extractor: ", data.char_feature_extractor)
            print("word feature extractor: ", data.word_feature_extractor)
            print("use crf: ", self.use_crf)
            self.gpu = data.HP_gpu
            self.average_batch = data.average_batch_loss
            ## add two more label for downlayer lstm, use original label size for CRF , Why do that???
            label_size = data.label_alphabet_size
            data.label_alphabet_size += 2  # <-- the +2 in question
            self.word_hidden = WordSequence(data)
            if self.use_crf:
                self.crf = CRF(label_size, self.gpu)

So the +2 is applied in the LSTM layer, but the goal is to use the two extra labels in the CRF layer.
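
One way to see why both places add 2, as a rough sketch rather than the project's actual code: if the CRF adds a per-tag emission score from the LSTM to a row of its transition matrix, the emission vector has to be as long as the matrix is wide, so the LSTM must output `label_size + 2` scores even though `CRF(...)` is constructed with the original `label_size`. The helper `step_scores` and the toy values below are hypothetical, and plain lists stand in for tensors.

```python
def step_scores(prev_tag, emission, transitions):
    """Score of moving from prev_tag to every tag: transition score + emission score."""
    n = len(transitions)
    if len(emission) != n:
        # The emission vector and the transition matrix must cover the same tags.
        raise ValueError(
            "emission has %d scores but the CRF expects %d" % (len(emission), n)
        )
    return [transitions[prev_tag][cur] + emission[cur] for cur in range(n)]


label_size = 4                     # original label alphabet size, passed to CRF(...)
lstm_output_size = label_size + 2  # size after `data.label_alphabet_size += 2`
crf_side = label_size + 2          # side of the CRF transition matrix (tagset_size + 2)

# Toy stand-ins: all-zero transitions and arbitrary emission scores.
transitions = [[0.0] * crf_side for _ in range(crf_side)]
emission = [0.1 * k for k in range(lstm_output_size)]

scores = step_scores(1, emission, transitions)  # one score per tag, START/STOP included
```

If the LSTM emitted only `label_size` scores, `step_scores` would raise, because the two extra CRF tags would have no emission entries to pair with.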
+2 is used later by the CRF, which needs start and end padding. The results on the demo data don’t mean anything because the data set is too small.
Originally posted by @jiesutd in https://github.com/jiesutd/NCRFpp/issues/135#issuecomment-524819427