yl4579 / PL-BERT

Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
MIT License
211 stars, 36 forks

RuntimeError: CUDA error: device-side assert triggered on criterion #43

Open junylee11 opened 8 months ago

junylee11 commented 8 months ago

I saw another issue about this error (#28), but I don't know how to solve it.

I don't know how to write code that skips the error. Can you tell me the solution?

The error occurred in this code:

```python
accelerator.print('Start training...')

running_loss = 0

for _, batch in enumerate(train_loader):
    curr_steps += 1

    words, labels, phonemes, input_lengths, masked_indices = batch
    text_mask = length_to_mask(torch.Tensor(input_lengths))  # .to(device)

    tokens_pred, words_pred = bert(phonemes, attention_mask=(~text_mask).int())

    loss_vocab = 0
    for _s2s_pred, _text_input, _text_length, _masked_indices in zip(words_pred, words, input_lengths, masked_indices):
        loss_vocab += criterion(_s2s_pred[:_text_length], _text_input[:_text_length])  # Here!!
    loss_vocab /= words.size(0)
```

C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Loss.cu:250: block: [0,0,0], thread: [7,0,0] Assertion `t >= 0 && t < n_classes` failed.
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Loss.cu:250: block: [0,0,0], thread: [8,0,0] Assertion `t >= 0 && t < n_classes` failed.
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Loss.cu:250: block: [0,0,0], thread: [9,0,0] Assertion `t >= 0 && t < n_classes` failed.
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Loss.cu:250: block: [0,0,0], thread: [10,0,0] Assertion `t >= 0 && t < n_classes` failed.
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Loss.cu:250: block: [0,0,0], thread: [11,0,0] Assertion `t >= 0 && t < n_classes` failed.
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Loss.cu:250: block: [0,0,0], thread: [12,0,0] Assertion `t >= 0 && t < n_classes` failed.
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Loss.cu:250: block: [0,0,0], thread: [13,0,0] Assertion `t >= 0 && t < n_classes` failed.
Traceback (most recent call last):
  File "C:\Users\user\Desktop\PL-BERT-KO\train_infer.py", line 198, in <module>
    notebook_launcher(train, args=(), num_processes=1)
  File "C:\Users\user\anaconda3\envs\PL-BERT-KO\lib\site-packages\accelerate\launchers.py", line 207, in notebook_launcher
    function(*args)
  File "C:\Users\user\Desktop\PL-BERT-KO\train_infer.py", line 147, in train
    loss_vocab += criterion(_s2s_pred[:_text_length], _text_input[:_text_length])
  File "C:\Users\user\anaconda3\envs\PL-BERT-KO\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\user\anaconda3\envs\PL-BERT-KO\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\user\anaconda3\envs\PL-BERT-KO\lib\site-packages\torch\nn\modules\loss.py", line 1179, in forward
    return F.cross_entropy(input, target, weight=self.weight,
  File "C:\Users\user\anaconda3\envs\PL-BERT-KO\lib\site-packages\torch\nn\functional.py", line 3053, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
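
For anyone hitting the same assertion: `t >= 0 && t < n_classes` comes from the cross-entropy kernel and fires when a target index is negative or not smaller than the number of classes in the prediction. Below is a minimal sketch of a pre-check that reports such targets before the loss is called. It is plain Python so it runs without a GPU. `find_out_of_range` is a hypothetical helper, not part of PL-BERT, and the `-100` default is an assumption that the criterion keeps the default `ignore_index` of `torch.nn.CrossEntropyLoss`.

```python
from typing import Iterable, List, Tuple


def find_out_of_range(
    targets: Iterable[int],
    n_classes: int,
    ignore_index: int = -100,  # assumed: default ignore_index of CrossEntropyLoss
) -> List[Tuple[int, int]]:
    """Return (position, value) pairs for targets outside [0, n_classes).

    Targets equal to ignore_index are not reported, because the loss skips them.
    """
    bad = []
    for position, value in enumerate(targets):
        value = int(value)
        if value == ignore_index:
            continue
        if not 0 <= value < n_classes:
            bad.append((position, value))
    return bad


# Made-up example: 5 output classes, one label equal to 5 and one negative label.
bad = find_out_of_range([0, 4, 5, 2, -1, -100], n_classes=5)
print(bad)  # [(2, 5), (4, -1)]
```

In the training loop, one possible use (an assumption about the tensor shapes, not checked against the repository) is to call it with `_text_input[:_text_length].tolist()` and `n_classes=_s2s_pred.size(-1)`, then `continue` past the sample whenever the result is non-empty. Skipping only hides the symptom. The reported values show which word IDs exceed the model's output size, which is the mismatch that would need fixing.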