kmkurn / pytorch-crf

(Linear-chain) Conditional random field in PyTorch.
https://pytorch-crf.readthedocs.io
MIT License

Transition parameters do not update #102

Closed lopozz closed 1 year ago

lopozz commented 2 years ago

Hi @kmkurn, I'm running into the same problem that @vijay120 reported: the start, end, and transition scores do not update during training.


```python
for epoch_i in range(epochs):
    model.train()
    for i, v in train_data.items():
        model.zero_grad()
        optimizer.zero_grad()

        # Step 2. Run the forward pass.
        train_input_ids = train_data[i]['input_ids'].to(device)        # (batch, max_seq_length)
        train_input_mask = train_data[i]['attention_mask'].to(device)  # (batch, max_seq_length)
        train_input_type = train_data[i]['token_type_ids'].to(device)  # (batch, max_seq_length)
        token_train_label = train_labels['Tags'][i].to(device)         # (batch, max_seq_length)

        scores = model.forwardCRF(train_input_ids.long(),
                                  attention_mask=train_input_mask.long(),
                                  token_type_ids=train_input_type.long())  # (batch, max_seq_length, 2)

        log_lik = -model.crf(scores.long(), token_train_label.long(), train_input_mask.bool(), reduction='mean')
        prediction = model.crf.decode(scores.long(), train_input_mask.bool())

        log_lik.backward()
        optimizer.step()
```

@kmkurn Could you please help me with this issue? Am I missing something fundamental?
lopozz commented 2 years ago

This is the model, @kmkurn. It is a BERT-LSTM-CRF stack:


```python
import torch.nn as nn
from torchcrf import CRF
from transformers import BertModel


class IndexerBert(nn.Module):
    def __init__(self, config, doCRF='off'):
        super(IndexerBert, self).__init__()
        self.num_labels = config.num_labels
        self.LSTM_hidden_dim = 250
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.dropout = nn.Dropout(p=0.5)
        self.lstm = nn.LSTM(input_size=config.hidden_size,
                            hidden_size=self.LSTM_hidden_dim,
                            batch_first=True,
                            bidirectional=True,
                            dropout=0.05)

        self.crf = CRF(self.num_labels, batch_first=True)
        self.classifier2 = nn.Linear(self.LSTM_hidden_dim * 2, 2)

        # Freeze all BERT parameters except the last 50
        for name, param in list(self.bert.named_parameters())[:-50]:
            param.requires_grad = False

    def forwardCRF(self, input_ids, attention_mask=None, token_type_ids=None):
        output, pooled_output = self.bert(input_ids,
                                          attention_mask,
                                          token_type_ids,
                                          return_dict=False)

        dropout_output = self.dropout(output)
        lstm_output, (h_0, c_0) = self.lstm(dropout_output)  # (batch, seq_len, 500)

        linear_output = self.classifier2(lstm_output)        # (batch, seq_len, 2)

        return linear_output
```
lopozz commented 2 years ago

Also, the prediction looks like this: [1, 0, 1, 0, 1, 0, 1, 0, ...], i.e. just a sequence of alternating 1s and 0s. I don't know if that helps you better understand my problem.

kmkurn commented 1 year ago

Hi, thanks for using the library. One potential issue is that you convert the scores to `long` for the CRF. The scores are supposed to be floats. Can you try replacing `scores.long()` with just `scores` when calling both `model.crf()` and `model.crf.decode()`?
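
For future readers, a minimal, self-contained sketch of this failure mode (random tensors and made-up shapes stand in for the real BERT-LSTM outputs): casting the emission scores to `long` truncates them to integers and, since the cast is non-differentiable, detaches them from the autograd graph, so no gradient can flow back into the layers that produced them.

```python
import torch
from torchcrf import CRF

torch.manual_seed(0)

num_tags = 2
crf = CRF(num_tags, batch_first=True)

# Float emissions, as an upstream network would produce them.
emissions = torch.randn(4, 10, num_tags, requires_grad=True)  # (batch, seq_len, num_tags)
tags = torch.randint(num_tags, (4, 10))                       # (batch, seq_len)
mask = torch.ones(4, 10, dtype=torch.bool)

# Wrong: the integer cast is non-differentiable, so the loss is cut
# off from `emissions` (and from anything that computed them).
loss_bad = -crf(emissions.long(), tags, mask, reduction='mean')
loss_bad.backward()
print(emissions.grad)  # None -- no gradient reaches the emissions

# Right: keep the emissions as floats.
loss = -crf(emissions, tags, mask, reduction='mean')
loss.backward()
print(emissions.grad.abs().sum() > 0)  # tensor(True) -- gradients flow

# Decoding likewise expects float emissions.
best_paths = crf.decode(emissions, mask)
```

With float emissions the CRF loss backpropagates into the emission network as expected, and `decode` sees the actual scores rather than their integer truncations.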