First of all, thanks for sharing the code!
The CMeIE dataset ran fine, but after switching to a dataset of my own I get the following error:
Traceback (most recent call last):
  File "train.py", line 213, in
    loss3 = sparse_multilabel_categorical_crossentropy(y_true=batch_tail_labels, y_pred=logits3, mask_zero=True)
  File "GPLinker_torch/nets/gpNet.py", line 40, in sparse_multilabel_categorical_crossentropy
    loss = torch.mean(torch.sum(pos_loss + neg_loss))
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [0,0,0], thread: [1,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.

What might be causing this? Has anyone run into a similar problem? From the error it looks like something in the loss computation. I added some debug output: training runs fine for a few batches, then one batch suddenly throws this error. Within the failing batch, everything runs normally up to the line right before the loss computation:
logits1, logits2, logits3 = net(batch_token_ids, batch_mask_ids, batch_token_type_ids)
Up to here everything runs fine. Posts online say this kind of error is caused by an index going out of bounds, but the logits shapes all look correct: logits1 is (batch_size, 2, sequence_length, sequence_length), while logits2 and logits3 are both (batch_size, len(schema), sequence_length, sequence_length).
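In case it helps narrow things down, here is a minimal sketch of a CPU-side sanity check. The helper name `check_label_bounds` is hypothetical (not part of GPLinker_torch), and it assumes `batch_tail_labels` stores token positions in GPLinker's sparse label format: the logits *shapes* can be correct while an *index* inside the labels is not, e.g. when an entity's position points past the end of a sentence the tokenizer truncated.

```python
import torch

# Hypothetical helper (not part of GPLinker_torch): the shapes of logits3 can
# be correct while the indices inside batch_tail_labels are not.
# sparse_multilabel_categorical_crossentropy gathers scores at the positions
# stored in y_true, so any position >= sequence_length -- e.g. an entity whose
# tokens were truncated away -- trips the CUDA "index out of bounds" assert.

def check_label_bounds(batch_labels: torch.Tensor, seq_len: int) -> torch.Tensor:
    """Return every index in batch_labels that falls outside [0, seq_len)."""
    labels = batch_labels.detach().cpu()
    out_of_range = (labels < 0) | (labels >= seq_len)
    return labels[out_of_range]

# Toy example: with seq_len = 8, a stored position of 8 is one past the end.
bad = check_label_bounds(torch.tensor([[[3, 8], [0, 5]]]), seq_len=8)
print(bad)  # tensor([8])
```

If this returns anything for the failing batch, dropping or clamping those spans during preprocessing (or raising the max sequence length) should make the assert disappear. Running once with `CUDA_LAUNCH_BLOCKING=1`, or briefly on CPU, should also point at the exact gather call instead of the unrelated loss line the asynchronous CUDA report blames.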