neg_idx = np.random.randint(low=0, high=batch_size-1, size=(n_sep * n_neg))
Can you try the above code and double-check your batch size?
I got the same error, and "neg_idx = np.random.randint(low=0, high=batch_size-1, size=(n_sep * n_neg))" doesn't work.
[2021-01-18 21:54:18,744 INFO] At step 99, we removed a batch - accum 0
Traceback (most recent call last):
File "/root/PycharmProjects/peizhiyan/OpenNMT-kpg-release-master/onmt/trainer.py", line 377, in _gradient_accumulation
model=self.model
File "/root/PycharmProjects/peizhiyan/OpenNMT-kpg-release-master/onmt/utils/loss.py", line 187, in call
loss, stats = self._compute_loss(batch, *shard_state)
File "/root/PycharmProjects/peizhiyan/OpenNMT-kpg-release-master/onmt/modules/copy_generator.py", line 264, in _compute_loss
semcov_ending_state=self.semcov_ending_state)
File "/root/PycharmProjects/peizhiyan/OpenNMT-kpg-release-master/onmt/utils/loss.py", line 457, in _compute_semantic_coverage_loss
neg_idx = np.random.randint(low=0, high=batch_size-1, size=(n_sep * n_neg))
File "mtrand.pyx", line 746, in numpy.random.mtrand.RandomState.randint
File "_bounded_integers.pyx", line 1254, in numpy.random._bounded_integers._rand_int64
ValueError: low >= high
[2021-01-18 21:54:18,802 INFO] At step 100, we removed a batch - accum 0
Traceback (most recent call last):
File "train.py", line 215, in
What can I do?
My bad. According to the official numpy documentation, the argument high is exclusive and must be strictly larger than low. So changing high to batch_size (rather than batch_size-1) should resolve it. I guess the error occurs when the batch size happens to be 1, which is rare.
neg_idx = np.random.randint(low=0, high=batch_size, size=(n_sep * n_neg))
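For reference, numpy's randint samples from the half-open interval [low, high), so high must be strictly greater than low; with batch_size == 1 the original call reduces to randint(low=0, high=0) and raises exactly this ValueError. A minimal standalone reproduction (not project code, just the numpy behavior):

import numpy as np

batch_size = 1  # the rare case that triggers the crash

# high is exclusive, so high=batch_size-1 becomes randint(low=0, high=0) -> ValueError
try:
    np.random.randint(low=0, high=batch_size - 1, size=3)
except ValueError as e:
    print(e)  # low >= high

# high=batch_size keeps the range valid even for a single-example batch
print(np.random.randint(low=0, high=batch_size, size=3))  # all zeros when batch_size == 1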
Thanks, I solved it. But I got a new error:
/pytorch/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [0,0,0] Assertion srcIndex < srcSelectDimSize failed.
...
/pytorch/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [1,0,0], thread: [31,0,0] Assertion srcIndex < srcSelectDimSize failed.
Traceback (most recent call last):
File "/root/PycharmProjects/peizhiyan/OpenNMT-kpg-release-master/onmt/trainer.py", line 377, in _gradient_accumulation
model=self.model
File "/root/PycharmProjects/peizhiyan/OpenNMT-kpg-release-master/onmt/utils/loss.py", line 187, in call
loss, stats = self._compute_loss(batch, **shard_state)
File "/root/PycharmProjects/peizhiyan/OpenNMT-kpg-release-master/onmt/modules/copy_generator.py", line 264, in _compute_loss
semcov_ending_state=self.semcov_ending_state)
File "/root/PycharmProjects/peizhiyan/OpenNMT-kpg-release-master/onmt/utils/loss.py", line 487, in _compute_semantic_coverage_loss
batch_labels = batch_labels.cuda()
RuntimeError: CUDA error: device-side assert triggered
[2021-01-19 21:22:01,085 INFO] At step 1, we removed a batch - accum 0
Traceback (most recent call last):
File "train.py", line 215, in
I tried to deal with it, and I switched from GPU to CPU to train the model. I got this:
Traceback (most recent call last):
File "/root/PycharmProjects/peizhiyan/OpenNMT-kpg-release-master/onmt/trainer.py", line 377, in _gradient_accumulation
model=self.model
File "/root/PycharmProjects/peizhiyan/OpenNMT-kpg-release-master/onmt/utils/loss.py", line 187, in __call__
loss, stats = self._compute_loss(batch, **shard_state)
File "/root/PycharmProjects/peizhiyan/OpenNMT-kpg-release-master/onmt/modules/copy_generator.py", line 264, in _compute_loss
semcov_ending_state=self.semcov_ending_state)
File "/root/PycharmProjects/peizhiyan/OpenNMT-kpg-release-master/onmt/utils/loss.py", line 468, in _compute_semantic_coverage_loss
input_src_states = src_states.index_select(dim=0, index=input_src_idx)
RuntimeError: index out of range: Tried to access index 1 out of table with 0 rows. at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:418
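For illustration only (hypothetical tensors, not the project's actual shapes): torch.index_select requires every index to be smaller than the source tensor's size along the selected dimension, and the CPU backend reports the same out-of-range condition that the GPU run surfaced as a device-side assert:

import torch

# Hypothetical example mirroring "Tried to access index 1 out of table with 0 rows":
# the source-state table is empty while the index still refers to row 1.
src_states = torch.zeros(0, 4)
input_src_idx = torch.tensor([1])
try:
    src_states.index_select(dim=0, index=input_src_idx)
except (IndexError, RuntimeError) as e:  # the error class differs across PyTorch versions
    print(e)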
Can you tell me what I should do?
I updated the code; can you try with the latest commit? I ran
python train.py -config config/train/pt_empirical/config-rnn-keyphrase-one2seq-debug.yml
(with orth_reg and sem_cov enabled) and it works well on my end.
Yes, I have run the updated code, and it works well. Thanks.
I have an issue when I run this repo. I can't get the seq2seq training step to run, and I get this error:
File "train.py", line 104, in main
single_main(opt, -1)
File "/home/lethanhloi/PycharmProjects/keyphrase_project/OpenNMT-kpg-release/onmt/train_single.py", line 165, in main
valid_steps=opt.valid_steps)
File "/home/lethanhloi/PycharmProjects/keyphrase_project/OpenNMT-kpg-release/onmt/trainer.py", line 265, in train
valid_iter, moving_average=self.moving_average)
File "/home/lethanhloi/PycharmProjects/keyphrase_project/OpenNMT-kpg-release/onmt/trainer.py", line 323, in validate
_, batch_stats = self.valid_loss(batch, outputs, attns, model=valid_model)
File "/home/lethanhloi/PycharmProjects/keyphrase_project/OpenNMT-kpg-release/onmt/utils/loss.py", line 187, in __call__
loss, stats = self._compute_loss(batch, **shard_state)
File "/home/lethanhloi/PycharmProjects/keyphrase_project/OpenNMT-kpg-release/onmt/modules/copy_generator.py", line 266, in _compute_loss
semcov_ending_state=self.semcov_ending_state)
File "/home/lethanhloi/PycharmProjects/keyphrase_project/OpenNMT-kpg-release/onmt/utils/loss.py", line 461, in _compute_semantic_coverage_loss
neg_idx = np.random.randint(0, batch_size-1, size=(n_sep * n_neg))
File "mtrand.pyx", line 746, in numpy.random.mtrand.RandomState.randint
File "_bounded_integers.pyx", line 1254, in numpy.random._bounded_integers._rand_int64
ValueError: low >= high