saahiluppal / catr

Image Captioning Using Transformer
Apache License 2.0
254 stars 53 forks source link

RuntimeError: CUDA error: device-side assert triggered #19

Open srv-sh opened 3 years ago

srv-sh commented 3 years ago

I got an error while trying to change the Bert Base pre-trained library. I have tried to run this model in another language.

the error is like -

` Initializing Device: cuda

Number of params: 83972666 Train: 18308 Valid: 1830 /usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:481: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary. cpuset_checked)) Start Training.. Epoch: 0 0% 0/1144 [00:00<?, ?it/s]/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py:2204: FutureWarning: The pad_to_max_length argument is deprecated and will be removed in a future version, use padding=True or padding='longest' to pad to the longest sequence in the batch, or use padding='max_length' to pad to a max length. In this case, you can give a specific length with max_length (e.g. max_length=45) or leave max_length to None to pad to the maximal input size of the model (e.g. 512 for Bert). FutureWarning, /usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py:2204: FutureWarning: The pad_to_max_length argument is deprecated and will be removed in a future version, use padding=True or padding='longest' to pad to the longest sequence in the batch, or use padding='max_length' to pad to a max length. In this case, you can give a specific length with max_length (e.g. max_length=45) or leave max_length to None to pad to the maximal input size of the model (e.g. 512 for Bert). FutureWarning, /usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py:2204: FutureWarning: The pad_to_max_length argument is deprecated and will be removed in a future version, use padding=True or padding='longest' to pad to the longest sequence in the batch, or use padding='max_length' to pad to a max length. In this case, you can give a specific length with max_length (e.g. max_length=45) or leave max_length to None to pad to the maximal input size of the model (e.g. 512 for Bert). FutureWarning, /usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py:2204: FutureWarning: The pad_to_max_length argument is deprecated and will be removed in a future version, use padding=True or padding='longest' to pad to the longest sequence in the batch, or use padding='max_length' to pad to a max length. In this case, you can give a specific length with max_length (e.g. max_length=45) or leave max_length to None to pad to the maximal input size of the model (e.g. 512 for Bert). FutureWarning, /usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py:2204: FutureWarning: The pad_to_max_length argument is deprecated and will be removed in a future version, use padding=True or padding='longest' to pad to the longest sequence in the batch, or use padding='max_length' to pad to a max length. In this case, you can give a specific length with max_length (e.g. max_length=45) or leave max_length to None to pad to the maximal input size of the model (e.g. 512 for Bert). FutureWarning, /usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py:2204: FutureWarning: The pad_to_max_length argument is deprecated and will be removed in a future version, use padding=True or padding='longest' to pad to the longest sequence in the batch, or use padding='max_length' to pad to a max length. In this case, you can give a specific length with max_length (e.g. max_length=45) or leave max_length to None to pad to the maximal input size of the model (e.g. 512 for Bert). FutureWarning, /usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py:2204: FutureWarning: The pad_to_max_length argument is deprecated and will be removed in a future version, use padding=True or padding='longest' to pad to the longest sequence in the batch, or use padding='max_length' to pad to a max length. In this case, you can give a specific length with max_length (e.g. max_length=45) or leave max_length to None to pad to the maximal input size of the model (e.g. 512 for Bert). FutureWarning, /usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py:2204: FutureWarning: The pad_to_max_length argument is deprecated and will be removed in a future version, use padding=True or padding='longest' to pad to the longest sequence in the batch, or use padding='max_length' to pad to a max length. In this case, you can give a specific length with max_length (e.g. max_length=45) or leave max_length to None to pad to the maximal input size of the model (e.g. 512 for Bert). FutureWarning, /usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.) return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode) /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [96,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [97,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [98,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [99,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [100,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [101,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [102,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [103,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [104,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [105,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [106,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [107,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [108,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [109,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [110,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [111,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [112,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [113,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [114,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [115,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [116,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [117,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [118,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [119,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [120,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [121,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [122,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [123,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [124,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [125,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [126,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [32,0,0], thread: [127,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [96,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [97,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [98,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [99,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [100,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [101,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [102,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [103,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [104,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [105,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [106,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [107,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [108,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [109,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [110,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [111,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [112,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [113,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [114,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [115,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [116,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [117,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [118,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [119,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [120,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [121,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [122,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [123,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [124,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [125,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [126,0,0] Assertion srcIndex < srcSelectDimSize failed. /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [26,0,0], thread: [127,0,0] Assertion srcIndex < srcSelectDimSize failed. 0% 0/1144 [00:08<?, ?it/s] Traceback (most recent call last): File "main.py", line 98, in main(config) File "main.py", line 74, in main model, criterion, data_loader_train, optimizer, device, epoch, config.clip_max_norm) File "/content/gdrive/My Drive/image captioning research work/image captioning/engine.py", line 25, in train_one_epoch outputs = model(samples, caps[:, :-1], cap_masks[:, :-1]) File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl return forward_call(*input, kwargs) File "/content/gdrive/My Drive/image captioning research work/image captioning/models/caption.py", line 29, in forward pos[-1], target, target_mask) File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl return forward_call(*input, *kwargs) File "/content/gdrive/My Drive/image captioning research work/image captioning/models/transformer.py", line 48, in forward tgt = self.embeddings(tgt).permute(1, 0, 2) File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl return forward_call(input, kwargs) File "/content/gdrive/My Drive/image captioning research work/image captioning/models/transformer.py", line 293, in forward input_embeds = self.word_embeddings(x) File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/sparse.py", line 160, in forward self.norm_type, self.scale_grad_by_freq, self.sparse) File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 2043, in embedding return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse) RuntimeError: CUDA error: device-side assert triggered` Can you tell me where I should change in the code to fix this error?

saahiluppal commented 3 years ago

Error seems to be RuntimeError: CUDA error: device-side assert triggered. Here is the stackoverflow thread Seems like the number of labels are exceeding than output defined. Either you can change the vocab size parameter wherever required or you can use the same tokenizer as used in pre-training.