memray / OpenNMT-kpg-release

Keyphrase Generation
MIT License

IndexError? #37

Closed Hwasin2015 closed 2 years ago

Hwasin2015 commented 3 years ago

Hello,

I ran one2seq training after preprocessing, using 'config-rnn-keyphrase-one2seq-debug.yml', but I encountered the following error.
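(For reference, a sketch of how the run was launched with the repo's train.py; the config path is the one named above and may sit under a config/ subdirectory in your checkout:)

python train.py -config config-rnn-keyphrase-one2seq-debug.yml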


...
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [11,0,0], thread: [125,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [11,0,0], thread: [126,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [11,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "/content/drive/My Drive/OpenNMT-kpg-release-master/onmt/trainer.py", line 397, in _gradient_accumulation
    model=self.model
  File "/content/drive/My Drive/OpenNMT-kpg-release-master/onmt/utils/loss.py", line 188, in __call__
    loss, stats = self._compute_loss(batch, **shard_state)
  File "/content/drive/My Drive/OpenNMT-kpg-release-master/onmt/utils/loss.py", line 383, in _compute_loss
    semcov_ending_state=self.semcov_ending_state)
  File "/content/drive/My Drive/OpenNMT-kpg-release-master/onmt/utils/loss.py", line 486, in _compute_semantic_coverage_loss
    if tgt_sep_idx[i].ne(0).sum() == 0:
RuntimeError: CUDA error: device-side assert triggered
[2021-05-04 03:28:11,106 INFO] At step 1, we removed a batch - accum 0
Traceback (most recent call last):
  File "train.py", line 6, in <module>
    main()
  File "/content/drive/My Drive/OpenNMT-kpg-release-master/onmt/bin/train.py", line 274, in main
    train(opt)
  File "/content/drive/My Drive/OpenNMT-kpg-release-master/onmt/bin/train.py", line 158, in train
    single_main(opt, 0)
  File "/content/drive/My Drive/OpenNMT-kpg-release-master/onmt/train_single.py", line 214, in main
    valid_steps=opt.valid_steps)
  File "/content/drive/My Drive/OpenNMT-kpg-release-master/onmt/trainer.py", line 244, in train
    self._accum_batches(train_iter)):
  File "/content/drive/My Drive/OpenNMT-kpg-release-master/onmt/trainer.py", line 182, in _accum_batches
    for batch in iterator:
  File "/content/drive/My Drive/OpenNMT-kpg-release-master/onmt/inputters/inputter.py", line 1140, in __iter__
    for batch in self._iter_dataset(path):
  File "/content/drive/My Drive/OpenNMT-kpg-release-master/onmt/inputters/inputter.py", line 1118, in _iter_dataset
    for batch in cur_iter:
  File "/content/drive/My Drive/OpenNMT-kpg-release-master/onmt/inputters/inputter.py", line 993, in __iter__
    self.device)
  File "/usr/local/lib/python3.7/dist-packages/torchtext/data/batch.py", line 36, in __init__
    setattr(self, name, field.process(batch, device=device))
  File "/content/drive/My Drive/OpenNMT-kpg-release-master/onmt/inputters/text_dataset.py", line 121, in process
    base_data = self.base_field.process(batch_by_feat[0], device=device)
  File "/usr/local/lib/python3.7/dist-packages/torchtext/data/field.py", line 234, in process
    tensor = self.numericalize(padded, device=device)
  File "/usr/local/lib/python3.7/dist-packages/torchtext/data/field.py", line 329, in numericalize
    lengths = torch.tensor(lengths, dtype=self.dtype, device=device)
RuntimeError: CUDA error: device-side assert triggered
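(Side note: CUDA device-side asserts are reported asynchronously, so the Python stack above may not point at the operation that actually failed. Re-running the same command with blocking kernel launches, or on CPU as done below, surfaces the real indexing error:

CUDA_LAUNCH_BLOCKING=1 python train.py -config config-rnn-keyphrase-one2seq-debug.yml)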

Then I switched to CPU training, but the following error occurred.



Traceback (most recent call last):
  File "/Users/Hwasin/Downloads/OpenNMT-kpg-release-master/onmt/trainer.py", line 389, in _gradient_accumulation
    loss, batch_stats = self.train_loss(
  File "/Users/Hwasin/Downloads/OpenNMT-kpg-release-master/onmt/utils/loss.py", line 188, in __call__
    loss, stats = self._compute_loss(batch, **shard_state)
  File "/Users/Hwasin/Downloads/OpenNMT-kpg-release-master/onmt/utils/loss.py", line 379, in _compute_loss
    semantic_coverage_loss = self._compute_semantic_coverage_loss(model,
  File "/Users/Hwasin/Downloads/OpenNMT-kpg-release-master/onmt/utils/loss.py", line 510, in _compute_semantic_coverage_loss
    input_src_states = src_states.index_select(dim=0, index=input_src_idx)
IndexError: index out of range in self
[2021-05-04 15:22:19,168 INFO] At step 1, we removed a batch - accum 0

Could you tell me how to solve it?

zpp13 commented 3 years ago

Did you solve it?

memray commented 3 years ago

Try disabling orth_reg and sem_cov (set orth_reg: 'false' and sem_cov: 'false' in the config), since they may not be compatible with later code changes.
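A minimal sketch of the corresponding entries in config-rnn-keyphrase-one2seq-debug.yml (option names taken from the comment above; leave the rest of the config unchanged):

# disable the auxiliary losses that trigger the indexing error
orth_reg: 'false'
sem_cov: 'false'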

zpp13 commented 3 years ago

It's working, thank you.

Hwasin2015 commented 2 years ago

Thank you.