Closed chizhikchi closed 1 year ago
Hi, thanks for using the library. This error usually happens when there is a device mismatch between `self.transitions`, `tags`, or `mask`. Can you confirm that these variables are all on the same device prior to calling `self.crf()`? I suspect that this line in your code: `labels = labels.to(device, dtype=torch.int64)` is the culprit, as the dtypes of CUDA tensors live under `torch.cuda.*`. If I'm right, changing this line to `labels.long().to(device)` should make it work.
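A minimal sketch of the suggested change, using toy tensors rather than the asker's actual code: cast to int64 with `.long()` first, then move the tensor to the target device.

```python
import torch

# Pick whichever device is available (CUDA if present, else CPU)
device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical raw labels with a non-int64 dtype
labels = torch.tensor([[0, 1, 2, 1]], dtype=torch.int32)

# Suggested fix: cast to int64 before moving to the device
labels = labels.long().to(device)

# The tensor handed to the CRF is now int64 on the chosen device
print(labels.dtype)
```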
Thank you for your reply!
In addition to what you pointed out, the problem was that I hadn't changed the format of the HuggingFace Dataset, which by default returns plain Python objects when `__getitem__` is called, whereas `self.crf` requires tensors. This can be corrected by calling:
`datasets.Dataset.set_format('torch', columns=['input_ids', 'attention_mask', 'label_ids'])`
Just as a closing remark for anyone who comes across this discussion after facing `CUDA error: device-side assert triggered`: HuggingFace now provides many utilities, such as data collators and tokenizer functions that add special tokens, so you have to check very carefully that every element of the `labels` tensor stays in `[0, num_labels-1]`.
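A hedged sketch of such a check, with hypothetical label values: HuggingFace collators commonly mark special or padded positions with `-100`, which falls outside `[0, num_labels-1]` and would trigger the device-side assert inside the CRF.

```python
num_labels = 3  # ["O", "B", "I"]

# Hypothetical row after tokenization: -100 marks special tokens (HF convention)
label_ids = [0, 2, -100, 1, -100]

# Remap out-of-range ids to a valid index so the CRF never sees them...
cleaned = [l if 0 <= l < num_labels else 0 for l in label_ids]
# ...and zero them in the mask so they don't contribute to the loss
mask = [1 if 0 <= l < num_labels else 0 for l in label_ids]

print(cleaned)  # [0, 2, 0, 1, 0]
print(mask)     # [1, 1, 0, 1, 0]
```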
Glad to hear you've resolved the problem!
Hi!
Firstly, thank you for publishing the code, can't wait to make it work for me!
I'm trying to add a BiLSTM-CRF layer on top of a pre-trained RoBERTa model to perform token classification with 3 labels (`["O", "B", "I"]`). I define the model as follows:

I configured the tokenizer so that it doesn't return special tokens and double-checked the dimensions of the LSTM layer's outputs, but I still can't manage to train my model on GPU:
PyTorch version: 1.9.0, transformers version: 4.19.2, CUDA version: 11.4
Thank you in advance for your suggestions.