kgarg8 closed this issue 2 years ago.
Hi,
Thanks for the nice repo.
I am getting the following error while training the model on the kp20k dataset. FYI, I am training with batch_size=2.
08/30/2021 23:41:03 [INFO] train_ml: Epoch 1; batch: 90000; total batch: 90000,avg training ppl: 5.333, loss: 1.674
08/30/2021 23:43:40 [INFO] train_ml: Epoch 1; batch: 91000; total batch: 91000,avg training ppl: 5.328, loss: 1.673
08/30/2021 23:46:18 [INFO] train_ml: Epoch 1; batch: 92000; total batch: 92000,avg training ppl: 5.322, loss: 1.672
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [148,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [148,0,0], thread: [97,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
...
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [130,0,0], thread: [30,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [130,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "train.py", line 103, in <module>
    main(opt)
  File "train.py", line 85, in main
    train_ml.train_model(model, optimizer, train_data_loader, valid_data_loader, opt)
  File "/home/ubuntu/kg_one2set/train_ml.py", line 44, in train_model
    batch_loss_stat = train_one_batch(batch, model, optimizer, opt)
  File "/home/ubuntu/kg_one2set/train_ml.py", line 146, in train_one_batch
    control_embed = model.decoder.forward_seg(state)
  File "/home/ubuntu/kg_one2set/pykp/decoder/transformer.py", line 153, in forward_seg
    control_idx = torch.arange(0, self.max_kp_num).long().to(device).reshape(1, -1).repeat(batch_size, 1)
RuntimeError: CUDA error: device-side assert triggered
Any suggestions would be appreciated.
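The assertion `srcIndex < srcSelectDimSize` in `indexSelectLargeIndex` fires when an index passed to an index-select style lookup (for example an `nn.Embedding` lookup) is not smaller than the size of the dimension being indexed. Because CUDA kernels run asynchronously, the line shown in the traceback (`torch.arange(...)` in `forward_seg`) is not necessarily where the bad index was used; rerunning with `CUDA_LAUNCH_BLOCKING=1` usually makes the error surface at the offending call. Below is a minimal, framework-free sketch of the same bounds check, which could be run on a batch on the CPU before it reaches the GPU. The function name `find_out_of_range`, the vocabulary size, and the token ids are all hypothetical, not taken from the repo.

```python
from typing import List, Sequence, Tuple


def find_out_of_range(
    indices: Sequence[Sequence[int]], dim_size: int
) -> List[Tuple[int, int, int]]:
    """Return (row, col, value) for every index outside [0, dim_size).

    Hypothetical helper mirroring the CUDA assertion `srcIndex < srcSelectDimSize`:
    an index used to select along a dimension must be smaller than that
    dimension's size. Negative values are flagged too, as a conservative extra check.
    """
    offenders = []
    for row_idx, row in enumerate(indices):
        for col_idx, value in enumerate(row):
            if not 0 <= value < dim_size:
                offenders.append((row_idx, col_idx, value))
    return offenders


# Hypothetical batch of token ids (batch_size=2) and a hypothetical vocabulary size.
vocab_size = 50000
batch_ids = [
    [2, 15, 49999, 3],
    [2, 50002, 7, 3],  # 50002 >= vocab_size would trip the assertion on the GPU
]
print(find_out_of_range(batch_ids, vocab_size))  # → [(1, 1, 50002)]
```

In a PyTorch training loop, the same check could be applied to an index tensor via `.tolist()`, with `dim_size` set to the embedding's `num_embeddings`; which tensor in this repo feeds the failing lookup is an assumption that would need to be confirmed, for example with `CUDA_LAUNCH_BLOCKING=1`.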