kmkurn / pytorch-crf

(Linear-chain) Conditional random field in PyTorch.
https://pytorch-crf.readthedocs.io
MIT License

Cuda error: device-side assert triggered while adding BiLSTM-CRF to RoBERTa model #103

Closed chizhikchi closed 1 year ago

chizhikchi commented 2 years ago

Hi!

Firstly, thank you for publishing the code, can't wait to make it work for me!

I'm trying to add a BiLSTM-CRF layer on top of a pre-trained RoBERTa model to perform token classification with 3 labels ["O", "B", "I"]. I define the model as follows:

import torch
import torch.nn as nn
from torchcrf import CRF
from transformers import RobertaModel, RobertaPreTrainedModel
from transformers.modeling_outputs import TokenClassifierOutput

class RoBERTa_BiLSTM_CRF(RobertaPreTrainedModel):

    def __init__(self, config): 
        super().__init__(config)

        self.num_labels = config.num_labels

        self.roberta = RobertaModel(config, add_pooling_layer=False)
        self.dropout = nn.Dropout(config.hidden_dropout_prob) 
        self.bilstm = nn.LSTM(config.hidden_size, config.hidden_size // 2, num_layers=1, bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)
        self.crf = CRF(num_tags=config.num_labels, batch_first=True)

    def forward(self, input_ids=None, attention_mask=None, labels=None):
        # Extract outputs from the RoBERTa body
        outputs = self.roberta(input_ids, attention_mask=attention_mask)
        sequence_output = outputs[0]
        device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

        lstm_output, _ = self.bilstm(sequence_output)
        logits = self.classifier(lstm_output)
        loss = None
        labels = labels.to(device, dtype=torch.int64)
        if labels is not None:
            log_likelihood = self.crf(logits, labels, mask=attention_mask.byte())
            loss = -log_likelihood  # NLL = negative of the CRF log-likelihood
        return TokenClassifierOutput(loss=loss,
                                     logits=logits,
                                     hidden_states=outputs.hidden_states,
                                     attentions=outputs.attentions)

I configured the tokenizer so that it doesn't return special tokens and double-checked the dimensions of the LSTM layer's outputs, but I still can't manage to train my model on GPU:

Traceback (most recent call last):
  File "/mnt/beegfs/mc000051/CERPLES/BERT-LSTM/ner.py", line 128, in <module>
    t.train()
  File "/mnt/beegfs/mc000051/.conda/envs/chizhik/lib/python3.9/site-packages/transformers/trainer.py", line 1317, in train
    return inner_training_loop(
  File "/mnt/beegfs/mc000051/.conda/envs/chizhik/lib/python3.9/site-packages/transformers/trainer.py", line 1554, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/mnt/beegfs/mc000051/.conda/envs/chizhik/lib/python3.9/site-packages/transformers/trainer.py", line 2183, in training_step
    loss = self.compute_loss(model, inputs)
  File "/mnt/beegfs/mc000051/.conda/envs/chizhik/lib/python3.9/site-packages/transformers/trainer.py", line 2215, in compute_loss
    outputs = model(**inputs)
  File "/mnt/beegfs/mc000051/.conda/envs/chizhik/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/beegfs/mc000051/CERPLES/BERT-LSTM/modelling.py", line 44, in forward
    log_likelihood = self.crf(logits, labels.long(), mask=attention_mask.byte())
  File "/mnt/beegfs/mc000051/.conda/envs/chizhik/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/beegfs/mc000051/.conda/envs/chizhik/lib/python3.9/site-packages/torchcrf/__init__.py", line 102, in forward
    numerator = self._compute_score(emissions, tags, mask)
  File "/mnt/beegfs/mc000051/.conda/envs/chizhik/lib/python3.9/site-packages/torchcrf/__init__.py", line 192, in _compute_score
    score += self.transitions[tags[i - 1], tags[i]] * mask[i]
RuntimeError: CUDA error: device-side assert triggered

PyTorch version: 1.9.0, transformers version: 4.19.2, CUDA version: 11.4

Thank you in advance for your suggestions

kmkurn commented 2 years ago

Hi, thanks for using the library. This error usually happens when there is a device mismatch among self.transitions, tags, or mask. Can you confirm that these variables are all on the same device prior to calling self.crf()? I suspect that this line in your code:

labels = labels.to(device, dtype=torch.int64)

is the culprit, as the dtypes of CUDA tensors live under torch.cuda.*. If I'm right, then changing this line to labels.long().to(device) should make it work.
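
For instance, a minimal sanity-check sketch (the helper and its name are my own, not part of torchcrf) that surfaces the mismatch as a clear Python assertion instead of an opaque device-side assert:

import torch

# Hypothetical debugging helper: print and assert device/dtype agreement
# between the CRF inputs and its transition matrix before calling self.crf().
def check_crf_inputs(crf, emissions, tags, mask):
    for name, t in [("emissions", emissions), ("tags", tags),
                    ("mask", mask), ("transitions", crf.transitions)]:
        print(f"{name}: device={t.device}, dtype={t.dtype}")
    assert emissions.device == tags.device == mask.device == crf.transitions.device
    assert tags.dtype == torch.long                        # tags are used as indices
    assert tags.min() >= 0 and tags.max() < crf.num_tags   # labels must be in range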

chizhikchi commented 2 years ago

Thank you for your reply!

In addition to what you pointed out, the problem was that I hadn't changed the format of the Hugging Face Dataset, which by default returns plain Python objects when __getitem__ is called, whereas self.crf requires tensors. This can be corrected by calling

datasets.Dataset.set_format('torch', columns=['input_ids', 'attention_mask', 'label_ids'])
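
A minimal before/after illustration (assuming a datasets.Dataset named dataset with those columns):

# Hypothetical illustration: by default __getitem__ returns plain Python
# lists, while after set_format('torch', ...) it returns torch tensors.
print(type(dataset[0]["input_ids"]))  # <class 'list'>
dataset.set_format("torch", columns=["input_ids", "attention_mask", "label_ids"])
print(type(dataset[0]["input_ids"]))  # <class 'torch.Tensor'>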

Just as a closing remark for anyone who comes across this discussion after facing CUDA error: device-side assert triggered: Hugging Face now has many utilities, like data collators and tokenizer functions, that add special tokens, so you have to check very carefully that every element of the "labels" tensor stays within [0, num_labels - 1] (see the sketch below).
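
For example, a hypothetical alignment sketch for a fast tokenizer (the helper and its o_index default are mine, not from this thread): positions whose word_ids() entry is None (special tokens, padding) get a valid in-range label instead of Hugging Face's usual -100, which would be out of range for the CRF's transition lookup:

# Hypothetical sketch: build token-level labels that always stay inside
# [0, num_labels - 1]. The usual -100 ignore-index is out of range for
# torchcrf, so special-token positions get a real label (here "O" = 0).
def align_labels(word_labels, encoding, o_index=0):
    aligned = []
    for word_id in encoding.word_ids():
        if word_id is None:           # special token or padding position
            aligned.append(o_index)   # any in-range label; exclude via mask
        else:
            aligned.append(word_labels[word_id])
    return aligned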

kmkurn commented 1 year ago

Glad to hear you've resolved the problem!