kmkurn / pytorch-crf

(Linear-chain) Conditional random field in PyTorch.
https://pytorch-crf.readthedocs.io
MIT License

Transition parameters do not update #102

Closed lopozz closed 1 year ago

lopozz commented 2 years ago

Hi @kmkurn, I'm running into the same problem that @vijay120 reported: the start, end, and transition scores do not update during training.


```python
for epoch_i in range(epochs):
    model.train()
    for i, v in train_data.items():
        model.zero_grad()
        optimizer.zero_grad()

        # Step 2. Run the forward pass.
        train_input_ids = train_data[i]['input_ids'].to(device)        # (batch, max_seq_length)
        train_input_mask = train_data[i]['attention_mask'].to(device)  # (batch, max_seq_length)
        train_input_type = train_data[i]['token_type_ids'].to(device)  # (batch, max_seq_length)
        token_train_label = train_labels['Tags'][i].to(device)         # (batch, max_seq_length)

        scores = model.forwardCRF(train_input_ids.long(),
                                  attention_mask=train_input_mask.long(),
                                  token_type_ids=train_input_type.long())  # (batch, max_seq_length, 2)

        log_lik = -model.crf(scores.long(), token_train_label.long(), train_input_mask.bool(), reduction='mean')
        prediction = model.crf.decode(scores.long(), train_input_mask.bool())

        log_lik.backward()
        optimizer.step()
```

@kmkurn Could you please help me with this issue? Am I missing something fundamental?
lopozz commented 2 years ago

This is the model, @kmkurn. It is a BERT-LSTM-CRF stack:


```python
import torch.nn as nn
from torchcrf import CRF
from transformers import BertModel


class IndexerBert(nn.Module):
    def __init__(self, config, doCRF='off'):
        super(IndexerBert, self).__init__()
        self.num_labels = config.num_labels
        self.LSTM_hidden_dim = 250
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.dropout = nn.Dropout(p=0.5)
        self.lstm = nn.LSTM(input_size=config.hidden_size,
                            hidden_size=self.LSTM_hidden_dim,
                            batch_first=True,
                            bidirectional=True,
                            dropout=0.05)

        self.crf = CRF(self.num_labels, batch_first=True)
        self.classifier2 = nn.Linear(self.LSTM_hidden_dim * 2, 2)

        # Freeze all BERT parameters except the last 50
        for name, param in list(self.bert.named_parameters())[:-50]:
            param.requires_grad = False

    def forwardCRF(self, input_ids, attention_mask=None, token_type_ids=None):
        output, pooled_output = self.bert(input_ids,
                                          attention_mask,
                                          token_type_ids,
                                          return_dict=False)

        dropout_output = self.dropout(output)
        lstm_output, (h_0, c_0) = self.lstm(dropout_output)  # (batch, seq_len, 500)

        linear_output = self.classifier2(lstm_output)        # (batch, seq_len, 2)

        return linear_output
```
lopozz commented 2 years ago

Also, the prediction looks like this: [1, 0, 1, 0, 1, 0, 1, 0, ...], i.e. just a sequence of alternating 1s and 0s. I don't know if that helps you better understand my problem.

kmkurn commented 1 year ago

Hi, thanks for using the library. One potential issue is that you convert the scores to `long` for the CRF. The scores are supposed to be floats. Can you try replacing `scores.long()` with just `scores` when calling both `model.crf()` and `model.crf.decode()`?
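
For future readers, a minimal, self-contained sketch of this failure mode (random tensors and made-up shapes stand in for the real BERT-LSTM outputs): casting the emission scores to `long` truncates them to integers and, since the cast is non-differentiable, detaches them from the autograd graph, so no gradient can flow back into the layers that produced them.

```python
import torch
from torchcrf import CRF

torch.manual_seed(0)

num_tags = 2
crf = CRF(num_tags, batch_first=True)

# Float emissions, as an upstream network would produce them.
emissions = torch.randn(4, 10, num_tags, requires_grad=True)  # (batch, seq_len, num_tags)
tags = torch.randint(num_tags, (4, 10))                       # (batch, seq_len)
mask = torch.ones(4, 10, dtype=torch.bool)

# Wrong: the integer cast is non-differentiable, so the loss is cut
# off from `emissions` (and from anything that computed them).
loss_bad = -crf(emissions.long(), tags, mask, reduction='mean')
loss_bad.backward()
print(emissions.grad)  # None -- no gradient reaches the emissions

# Right: keep the emissions as floats.
loss = -crf(emissions, tags, mask, reduction='mean')
loss.backward()
print(emissions.grad.abs().sum() > 0)  # tensor(True) -- gradients flow

# Decoding likewise expects float emissions.
best_paths = crf.decode(emissions, mask)
```

With float emissions the CRF loss backpropagates into the emission network as expected, and `decode` sees the actual scores rather than their integer truncations.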