yzhangcs / parser

:rocket: State-of-the-art parsers for natural language.
https://parser.yzhang.site/
MIT License
836 stars 142 forks

RuntimeError: copy_if failed to synchronize #27

Closed attardi closed 4 years ago

attardi commented 4 years ago

When training with a relatively small Arabic corpus from here (I just made a few simple changes to the reader to skip comments and multiword tokens in the CoNLL-U format): http://ufal.mff.cuni.cz/~zeman/soubory/iwpt2020-train-dev.tgz I get this error:

/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [38,0,0], thread: [91,0,0] Assertion srcIndex < srcSelectDimSize failed.
(the same assertion is repeated for threads [92,0,0] through [95,0,0])
Traceback (most recent call last):
  File "run.py", line 58, in <module>
    cmd(args)
  File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/cmds/train.py", line 85, in __call__
    self.train(train.loader)
  File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/cmds/cmd.py", line 91, in train
    arc_scores, rel_scores = self.model(words, feats)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/model.py", line 95, in forward
    feat_embed = self.feat_embed(feats)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/modules/bert.py", line 43, in forward
    bert = bert[bert_mask].split(bert_lens[mask].tolist())
RuntimeError: copy_if failed to synchronize: cudaErrorAssert: device-side assert triggered

It works on CPU, however.

Thank you for a nice project.

yzhangcs commented 4 years ago

Sorry, I'm not sure why you encountered this error. Maybe you can try installing transformers version 2.1.1.

attardi commented 4 years ago

I ran the same code after installing transformers 2.10.0 (Successfully installed transformers-2.10.0).

I get a lot of assertion errors and then a failure:

...
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [34,0,0], thread: [23,0,0] Assertion srcIndex < srcSelectDimSize failed.
(the same assertion is repeated for threads [24,0,0] through [31,0,0])
Traceback (most recent call last):
  File "run.py", line 58, in <module>
    cmd(args)
  File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/cmds/train.py", line 85, in __call__
    self.train(train.loader)
  File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/cmds/cmd.py", line 91, in train
    arc_scores, rel_scores = self.model(words, feats)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/model.py", line 95, in forward
    feat_embed = self.feat_embed(feats)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/modules/bert.py", line 40, in forward
    _, _, bert = self.bert(subwords, attention_mask=bert_mask)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/transformers/modeling_bert.py", line 734, in forward
    encoder_attention_mask=encoder_extended_attention_mask,
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/transformers/modeling_bert.py", line 407, in forward
    hidden_states, attention_mask, head_mask[i], encoder_hidden_states, encoder_attention_mask
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/transformers/modeling_bert.py", line 368, in forward
    self_attention_outputs = self.attention(hidden_states, attention_mask, head_mask)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/transformers/modeling_bert.py", line 314, in forward
    hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/transformers/modeling_bert.py", line 216, in forward
    mixed_query_layer = self.query(hidden_states)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/functional.py", line 1372, in linear
    output = input.matmul(weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)

The sizes for input and weight are: [5, 812, 768] and [768, 768].

attardi commented 4 years ago

> Sorry, I'm not sure why you encountered this error. Maybe you can try installing transformers version 2.1.1.

This was with transformers 2.1.1. The error with 2.10.0 is above.

yzhangcs commented 4 years ago

Could you factor this line into two or more steps? https://github.com/yzhangcs/parser/blob/c22c4000b2c75d292e2cf9067a11668afb624977/parser/modules/bert.py#L43 This may give more insight into where the bug is.

attardi commented 4 years ago

I changed it to:

    tmp_lens = bert_lens[mask].tolist()
    tmp_mask = bert[bert_mask]
    bert = tmp_mask.split(tmp_lens)

and I get:

/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [39,0,0], thread: [63,0,0] Assertion srcIndex < srcSelectDimSize failed.
Traceback (most recent call last):
  File "run.py", line 58, in <module>
    cmd(args)
  File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/cmds/train.py", line 85, in __call__
    self.train(train.loader)
  File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/cmds/cmd.py", line 91, in train
    arc_scores, rel_scores = self.model(words, feats)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/model.py", line 95, in forward
    feat_embed = self.feat_embed(feats)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/modules/bert.py", line 43, in forward
    tmp_lens = bert_lens[mask].tolist()
RuntimeError: copy_if failed to synchronize: cudaErrorAssert: device-side assert triggered

attardi commented 4 years ago

The sizes of bert_lens and mask are: [4, 399] and [4, 399].

yzhangcs commented 4 years ago

Sorry, I have no clue. It might be due to an incompatible PyTorch version; mine is 1.3.0 or higher. If this issue does not occur with other treebanks, I think you should check your data preprocessing.

attardi commented 4 years ago

My torch version is 1.4.0, and this occurs on another machine and with other treebanks as well. With transformers 2.10.0 it is even worse: it seems that there is an assert that fails, hinting that there might be a lurking bug that the assert catches.

attardi commented 4 years ago

I added a print here:

    print('LENS:', bert_lens.size(), mask.size())
    print(bert_lens.cpu(), mask.cpu())                                                  
    tmp_lens = bert_lens[mask]

and a few batches go through, except the last one:

LENS: torch.Size([65, 89]) torch.Size([65, 89])
tensor([[1, 1, 2, ..., 0, 0, 0], [1, 1, 2, ..., 0, 0, 0], [1, 1, 3, ..., 1, 1, 1], ..., [1, 1, 3, ..., 0, 0, 0], [1, 1, 1, ..., 0, 0, 0], [1, 1, 2, ..., 0, 0, 0]])
tensor([[ True, True, True, ..., False, False, False], [ True, True, True, ..., False, False, False], [ True, True, True, ..., True, True, True], ..., [ True, True, True, ..., False, False, False], [ True, True, True, ..., False, False, False], [ True, True, True, ..., False, False, False]])
LENS: torch.Size([4, 399]) torch.Size([4, 399])
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [43,0,0], thread: [96,0,0] Assertion srcIndex < srcSelectDimSize failed.

attardi commented 4 years ago

I get the same error if I move the data to the CPU:

File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/modules/bert.py", line 44, in forward cpu_bert = bert_lens.cpu()[mask.cpu()] # DEBUG RuntimeError: CUDA error: device-side assert triggered

yzhangcs commented 4 years ago

Could you print the Python lists of some samples? I will have a test.

yzhangcs commented 4 years ago

It's so weird. 😢

attardi commented 4 years ago

There is something wrong on the device:

    print('LENS:', bert_lens.size(), bert_mask.size())
    cpu_bert_lens = bert_lens.cpu()
    cpu_mask = mask.cpu()
    print(cpu_bert_lens, cpu_mask)
    print(cpu_bert_lens[cpu_mask])

LENS: torch.Size([65, 89]) torch.Size([65, 173])
tensor([[1, 1, 2, ..., 0, 0, 0], [1, 1, 2, ..., 0, 0, 0], [1, 1, 3, ..., 1, 1, 1], ..., [1, 1, 3, ..., 0, 0, 0], [1, 1, 1, ..., 0, 0, 0], [1, 1, 2, ..., 0, 0, 0]])
tensor([[ True, True, True, ..., False, False, False], [ True, True, True, ..., False, False, False], [ True, True, True, ..., True, True, True], ..., [ True, True, True, ..., False, False, False], [ True, True, True, ..., False, False, False], [ True, True, True, ..., False, False, False]])
tensor([1, 1, 2, ..., 4, 1, 1])
LENS: torch.Size([4, 399]) torch.Size([4, 673])

...
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [58,0,0], thread: [127,0,0] Assertion srcIndex < srcSelectDimSize failed.
Traceback (most recent call last):
  File "run.py", line 58, in <module>
    cmd(args)
  File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/cmds/train.py", line 85, in __call__
    self.train(train.loader)
  File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/cmds/cmd.py", line 91, in train
    arc_scores, rel_scores = self.model(words, feats)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/model.py", line 95, in forward
    feat_embed = self.feat_embed(feats)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/modules/bert.py", line 44, in forward
    cpu_bert_lens = bert_lens.cpu()
RuntimeError: CUDA error: device-side assert triggered

yzhangcs commented 4 years ago

It seems that the error lurks in parser.utils.field.BertField. I think you should check the generation of the BERT inputs step by step to track down the bug.

attardi commented 4 years ago

What kind of checks shall I do?

yzhangcs commented 4 years ago

Check if some values in bert_lens are invalid, i.e., <= 0, and print(feats) as soon as it is created.
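For example, something along these lines (an untested sketch; bert_lens is the tensor inside BertEmbedding.forward and feats is the value passed into the model):

    # Untested sketch: look for non-positive lengths before any fancy indexing.
    bad = bert_lens <= 0
    if bad.any():
        print('invalid bert_lens values:', bert_lens[bad])
    print('feats:', feats)  # print it right after it is created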

attardi commented 4 years ago

I can't even access the data in bert_lens:

print('LENS: negative', any(float(x) < 0 for c in bert_lens for x in c))

RuntimeError: CUDA error: device-side assert triggered
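(My guess is that once the device-side assert has fired, the CUDA context is corrupted, so any later CUDA call, including this print and the .cpu() copies above, just reports the same error. Rerunning with the environment variable CUDA_LAUNCH_BLOCKING=1 set should at least make the failing kernel surface at the real call site.)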

yzhangcs commented 4 years ago

At which line did you print this?

attardi commented 4 years ago

Just before accessing bert_lens[mask], line 43.

Where are the feats generated?

attardi commented 4 years ago

This is the last batch generated in TextDataLoader.__iter__(), before pad_sequence and before the crash:

BATCH (tensor([ 0, 35, 33, 34, 24, 26, 26, 2, 10, 31, 26, 10, 25, 28, 28, 11, 13, 28, 28, 25, 4, 12, 29, 25, 6, 25, 10, 30, 25, 34, 11, 26, 13, 25, 10, 25, 6, 10, 25, 6, 11, 10, 13, 6, 10, 25, 6, 10, 25, 6, 30, 11, 10, 13, 6, 34, 11, 33, 26, 10, 25, 10, 31, 24, 30, 3, 29, 18, 25, 10, 30, 26, 25, 28, 11, 13, 10, 28, 10, 25, 34, 24, 24, 20, 3, 24, 15, 10, 31, 10, 30, 25, 11, 13, 25, 10, 25, 6, 11, 13, 10, 25, 25, 34, 11, 26, 13, 1, 29, 10, 21, 30, 25, 25, 34, 11, 26, 25, 6, 10, 25, 33, 10, 30, 18, 25, 24, 24, 3, 36, 29, 10, 31, 6, 24, 30, 3, 26, 36, 36, 10, 25, 34, 5, 7, 25, 8, 33, 29, 10, 25, 6, 10, 30, 25, 6, 10, 30, 1, 10, 31, 6, 26, 2, 24, 15, 10, 30, 25, 25, 6, 34, 11, 26, 25, 10, 30, 25, 11, 13, 25, 10, 25, 11, 10, 13, 25, 29, 11, 8, 10, 13, 25, 10, 25, 25, 34, 10, 17, 24, 24, 27, 25, 6, 1, 30, 10, 30, 10, 30, 10, 28, 24, 30, 3, 36, 10, 30, 30, 11, 13, 10, 28, 10, 30, 6, 34, 11, 26, 33, 29, 36, 10, 31, 25, 10, 25, 11, 25, 21, 13, 26, 2, 29, 34, 11, 33, 24, 26, 29, 25, 10, 25, 25, 11, 13, 5, 26, 10, 25, 18, 25, 27, 2, 10, 31, 34, 11, 24, 30, 29, 25, 4, 8, 33, 29, 6, 10, 25, 8, 1, 24, 3, 29, 34, 11, 26, 33, 10, 21, 21, 30, 29, 6, 34, 11, 13, 29, 10, 30, 25, 10, 30, 10, 25, 25, 25, 11, 13, 25, 10, 24, 30, 29, 25, 11, 13, 10, 30, 25, 25, 25, 11, 13, 25, 11, 13, 10, 25, 25, 10, 25, 11, 13, 25, 10, 25, 10, 25, 25, 34, 11, 5, 13, 10, 25, 25, 11, 30, 10, 25, 6, 11, 26, 14, 3, 25, 5, 11, 26, 13, 25, 34, 11, 8, 33, 36, 10, 31, 25, 6, 10, 25, 6, 10, 25, 11, 13, 25, 34, 11, 10, 13, 26, 10, 25, 25, 25, 10, 25, 11, 8, 10, 25, 11, 10, 13, 34, 34]), tensor([ 0, 10, 30, 25, 25, 35, 26, 25, 25, 10, 30, 25, 10, 30, 34, 10, 31, 25, 25, 31, 6, 10, 30, 6, 6, 34, 11, 13, 26, 25, 6, 25, 25, 26, 2, 29, 26, 25, 22, 30, 4, 10, 30, 25, 10, 34, 31, 34, 24, 26, 25, 25, 25, 24, 25, 26, 10, 25, 10, 25, 12, 34, 24, 29, 26, 10, 25, 24, 3, 10, 31, 25, 6, 34, 34, 11, 33, 24, 34, 26, 6, 30, 10, 30, 29, 10, 31, 25, 11, 13, 11, 13, 6, 11, 13, 11, 13, 34, 34, 11, 10, 31, 25, 25, 31, 25, 26, 2, 29, 26, 25, 10, 31, 25, 6, 11, 13, 33, 26, 34, 34, 29, 26, 6, 1, 29, 25, 6, 10, 31, 25, 10, 31, 10, 25, 6, 34, 11, 10, 21, 13, 26, 31, 25, 11, 13, 10, 25, 6, 34, 34, 11, 33, 34, 34, 26, 12, 10, 30, 24, 12, 29, 6, 10, 31, 31, 25, 34, 33, 24, 26, 15, 10, 18, 31, 29, 25, 25, 34, 11, 26, 13, 10, 31, 6, 34, 11, 18, 26, 34, 10, 21, 21, 30, 18, 1, 29, 10, 31, 34, 8, 13, 24, 4, 26, 10, 30, 11, 13, 6, 10, 25, 25, 25, 10, 30, 31, 25, 34, 34, 11, 33, 34, 34, 29, 29, 4, 1, 24, 15, 10, 30, 26, 10, 31, 10, 31, 31, 6, 1, 10, 30, 26, 6, 29, 6, 10, 25, 31, 25, 34, 34, 11, 33, 29, 25, 6, 10, 30, 6, 10, 34, 31, 25, 25, 6, 10, 31, 25, 6, 10, 25, 31, 34, 34]), tensor([ 0, 35, 33, 24, 34, 26, 18, 1, 10, 31, 26, 10, 30, 6, 11, 26, 13, 29, : 26, 6, 10, 25, 25, 34, 10, 21, 30, 25, 25, 10, 25, 25, 25, 11, 13, 25, 25, 25, 11, 13, 25, 25, 10, 30, 13, 6, 6, 34, 25, 26, 33, 29, 25, 10, 30, 25, 10, 25, 10, 30, 25, 27, 2, 10, 30, 10, 18, 25, 34, 11, 11, 13, 29, 10, 25, 25, 10, 30, 25, 4, 1, 10, 31, 25, 34, 11, 4, 26, 25, 1, 24, 20, 26, 25, 29, 10, 30, 11, 13, 10, 31, 25, 11, 13, 11, 13, 10, 10, 25, 25, 25, 6, 11, 26, 33, 25, 11, 14, 13, 1, 34, 11, 26, 25, 33, 36, 24, 20, 26, 25, 29, 11, 20, 13, 10, 31, 26, 10, 25, 34, 11, 26, 6, 10, 25, 25, 25, 25, 25, 34, 33, 34, 33, 24, 26, 25, 26, 11, 11, 10, 31, 26, 10, 21, 30, 1, 10, 31, 10, 30, 10, 30, 10, 25, 25, 10, 6, 11, 13, 26, 25, 34, 11, 18, 26, 33, 24, 30, 3, 26, 10, 31, 25, 11, 13, 1, 10, 31, 34, 11, 26, 14, 13, 10, 30, 25, 25, 34, 28, 26, 2, 10, 30, 25, 34, 28, 26, 2, 10, 21, 30, 10, 25, 34, 10, 4, 34, 11, 10, 13, 25, 
10, 25, 25, 10, 25, 6, 34, 33, 26, 25, 10, 25, 36, 11, 26, 10, 25, 25, 13, 34, 11, 26, 1, 10, 30, 34, 28, 14, 5, 33, 10, 18, 25, 34, 11, 26, 11, 1, 13, 23, 4, 30, 36, 10, 30, 25, 11, 1, 10, 30, 11, 13, 10, 31, 25, 25, 6, 11, 1, 18, 29, 34, 34]), tensor([ 0, 35, 26, 29, 6, 10, 30, 25, 6, 25, 26, 11, 13, 11, 13, 11, 13, 11, 13, 11, 13, 11, 13, 11, 13, 11, 13, 25, 6, 11, 13, 11, 13, 29, 10, 30, 25, 6, 25, 26, 11, 13, 11, 13, 11, 13, 11, 13, 11, 13, 11, 13, 11, 13, 26, 6, 10, 25, 6, 10, 30, 6, 10, 30, 10, 25, 10, 25, 28, 4, 30, 25, 21, 30, 25, 10, 25, 28, 11, 13, 26, 6, 10, 25, 10, 30, 28, 25, 11, 13, 36, 25, 10, 25, 30, 26, 11, 13, 13, 11, 33, 26, 6, 10, 25, 10, 30, 28, 25, 11, 13, 26, 25, 10, 31, 11, 13, 11, 10, 13, 10, 32, 28, 25, 13, 10, 26, 11, 13, 6, 11, 13, 6, 11, 10, 32, 13, 25, 11, 13, 10, 31, 6, 11, 13, 25, 11, 13, 13, 11, 10, 32, 13, 25, 11, 13, 10, 31, 25, 13, 6, 11, 13, 11, 13, 11, 10, 32, 13, 25, 11, 13, 10, 31, 11, 13, 11, 13, 6, 11, 13, 11, 13, 6, 11, 8, 33, 26, 6, 10, 30, 6, 10, 30, 25, 11, 13, 26, 25, 10, 25, 11, 13, 29, 11, 13, 11, 13, 25, 11, 3, 26, 25, 10, 25, 11, 13, 26, 10, 31, 25, 25, 11, 24, 13, 26, 25, 10, 25, 10, 30, 28, 25, 11, 13, 10, 31, 25, 25, 11, 33, 26, 25, 10, 25, 10, 30, 28, 25, 11, 13, 10, 31, 6, 11, 13, 25, 11, 13, 25, 34]))

yzhangcs commented 4 years ago

Is this bert_lens? Why is the first element 0?

attardi commented 4 years ago

These are the values of data from:

for data, field in zip(raw_batch, self.fields)

yzhangcs commented 4 years ago

That's not feats. You can print feats just below this line: https://github.com/yzhangcs/parser/blob/c22c4000b2c75d292e2cf9067a11668afb624977/parser/cmds/cmd.py#L77

attardi commented 4 years ago

These are the feats before the crash:

FEATS: [tensor([[ 101, 479, 451, ..., 13762, 107, 119], [ 101, 10244, 29774, ..., 0, 0, 0], [ 101, 479, 14184, ..., 0, 0, 0], [ 101, 44873, 38453, ..., 0, 0, 0]], device='cuda:0'), tensor([[1, 1, 2, ..., 4, 1, 1], [1, 1, 1, ..., 0, 0, 0], [1, 1, 3, ..., 0, 0, 0], [1, 2, 1, ..., 0, 0, 0]], device='cuda:0'), tensor([[ True, True, True, ..., True, True, True], [ True, True, True, ..., False, False, False], [ True, True, True, ..., False, False, False], [ True, True, True, ..., False, False, False]], device='cuda:0')]

attardi commented 4 years ago

I reduced the treebank to a portion of 871 lines, where it still gave the same device error. Then I trimmed that to the first 483 lines, which contain a sentence 388 words long, and there I got a different error, which turns out to be the same one I get with transformers 2.10.0:

/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [55,0,0], thread: [31,0,0] Assertion srcIndex < srcSelectDimSize failed.
Traceback (most recent call last):
  File "run.py", line 58, in <module>
    cmd(args)
  File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/cmds/train.py", line 85, in __call__
    self.train(train.loader)
  File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/cmds/cmd.py", line 92, in train
    arc_scores, rel_scores = self.model(words, feats)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/model.py", line 95, in forward
    feat_embed = self.feat_embed(feats)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/modules/bert.py", line 40, in forward
    _, _, bert = self.bert(subwords, attention_mask=bert_mask)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/transformers/modeling_bert.py", line 627, in forward
    head_mask=head_mask)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/transformers/modeling_bert.py", line 348, in forward
    layer_outputs = layer_module(hidden_states, attention_mask, head_mask[i])
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/transformers/modeling_bert.py", line 326, in forward
    attention_outputs = self.attention(hidden_states, attention_mask, head_mask)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/transformers/modeling_bert.py", line 283, in forward
    self_outputs = self.self(input_tensor, attention_mask, head_mask)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/transformers/modeling_bert.py", line 202, in forward
    mixed_query_layer = self.query(hidden_states)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/functional.py", line 1372, in linear
    output = input.matmul(weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)

attardi commented 4 years ago

Indeed there is a problem with long sentences, which I had tried to fix with my earlier merge request, where I added a parameter --max-sent-length.

yzhangcs commented 4 years ago

Is it because the sentences exceed the length limit of 512?

attardi commented 4 years ago

No, it is only 383 tokens. Where does the 512 limit come from? Shall I add a check and a warning?

attardi commented 4 years ago

The sentence is 383 tokens, but the wordpieces are 673. I traced the error to this call, in lib64/python3.6/site-packages/torch/nn/modules/sparse.py:

    111  -> def forward(self, input):
    112         return F.embedding(
    113             input, self.weight, self.padding_idx, self.max_norm,
    114             self.norm_type, self.scale_grad_by_freq, self.sparse)

where input = arange(672):

    (Pdb) --Return--
    THCudaCheck FAIL file=/pytorch/aten/src/THC/THCCachingHostAllocator.cpp line=278 error=700 : an illegal memory access was encountered

So it likely hits BERT's limit of 512 input positions and corrupts memory. I wonder why they don't check this limit and avoid the memory corruption.
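A standalone toy check (not the parser's code, just my understanding of the failure) behaves the same way: bert-base has only 512 position embeddings, so a 673-piece input indexes past the end of that embedding table, which on the GPU shows up as the device-side assert instead of a clean IndexError:

    import torch
    import torch.nn as nn

    # BERT-base: max_position_embeddings = 512
    position_embeddings = nn.Embedding(512, 768)
    position_ids = torch.arange(673)           # positions 0..672 for a 673-piece sentence
    out = position_embeddings(position_ids)    # IndexError on CPU, device-side assert on CUDA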

attardi commented 4 years ago

I wonder why the parser passes a vector of length 673 to BERT. The sentence, after tokenization in BertField.numericalize(), is just 399 pieces.

yzhangcs commented 4 years ago

Sorry for the late reply. Did you check the max sentence length in the batch? Each sentence in a batch is padded to the max length.
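For instance (just a toy illustration with torch.nn.utils.rnn.pad_sequence, not the parser's exact code), two subword sequences of lengths 3 and 399 batched together both come out with 399 columns:

    import torch
    from torch.nn.utils.rnn import pad_sequence

    short = torch.randint(1, 100, (3,))
    long_ = torch.randint(1, 100, (399,))
    batch = pad_sequence([short, long_], batch_first=True)  # zero-padded on the right
    print(batch.shape)  # torch.Size([2, 399]): every sentence is padded to the batch max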

attardi commented 4 years ago

There are only two sentences now in the corpus (one of length 3 and one of length 383). The latter is turned into 399 pieces by the tokenizer. The batch then has these sizes:

    BATCH [[399], [[673], [399], [673]], [399], [399]]

and len(subwords[0]) in BertEmbedding.forward is 673.

yzhangcs commented 4 years ago

Hi, I think you should discard all the overlong sentences before training. Here is a snippet:

>>> from transformers import BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
>>> sentences = [s for s in sentences if len(tokenizer.encode(' '.join(s))) < 512]

BTW, I updated the code on the dev branch just now. It is now much faster than before for BERT and is compatible with transformers 2.10. You can switch to dev and give it a try.

attardi commented 4 years ago

Since there is no way to know the number of wordpieces beforehand, the solution requires that BertField do the check.

In BertField.numericalize():

        sequence = [piece if piece else self.transform(self.pad)
                    for piece in sequence]
        pieces = sum(sequence, [])
        if len(pieces) > self.max_len:  # BERT pretrained limit
            print(f"more than {self.max_len} wordpieces from sentence {i} of length {len(sequence)}",
                  file=sys.stderr)
            drop.append(i)
        subwords.append(pieces)
        lens.append(torch.tensor([len(piece) for piece in sequence]))
    subwords = [torch.tensor(pieces) for pieces in subwords]
    mask = [torch.ones(len(pieces)).ge(0) for pieces in subwords]

    return list(zip(subwords, lens, mask)), drop

and similarly change the other numericalize methods to return sequence, [].

In TextDataset.__init__():

    discard = set()
    for field in self.fields:
        value, drop = field.numericalize(getattr(corpus, field.name))
        setattr(self, field.name, value)
        discard = discard.union(drop)
    discard = list(discard)
    # drop too long sentences. Attardi
    for field in self.fields:
        value = getattr(self, field.name)
        value = [x for i, x in enumerate(value) if i not in discard]
        setattr(self, field.name, value)
    # NOTE: the final bucket count is roughly equal to n_buckets
    # the lengths should be those of the wordpieces, not of corpus. Attardi
    self.lengths = [len(s) + sum([bool(field.bos), bool(field.eos)])
                    for i, s in enumerate(corpus) if i not in discard]

What do you think?

yzhangcs commented 4 years ago

I don't think the BERT-specific code should be interwoven with the rest. My feeling is that it would be better to handle this in preprocessing.

attardi commented 4 years ago

Dropping the sentence is just a temporary patch. Since you are using a BiLSTM, you can handle longer sentences. Maybe you could split the sentence into blocks of 512 wordpieces and get the embeddings for each.
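Roughly something like this (an untested sketch; encode_long is a hypothetical helper, bert is assumed to be a transformers BertModel whose first output is the sequence of hidden states, and attention across chunk boundaries is simply lost):

    import torch

    def encode_long(bert, subwords, attention_mask, max_len=512):
        # Untested sketch: run BERT on chunks of at most max_len pieces and
        # concatenate the hidden states as if it had been a single pass.
        outputs = []
        for start in range(0, subwords.size(1), max_len):
            chunk = subwords[:, start:start + max_len]
            chunk_mask = attention_mask[:, start:start + max_len]
            hidden = bert(chunk, attention_mask=chunk_mask)[0]
            outputs.append(hidden)
        return torch.cat(outputs, dim=1)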

attardi commented 4 years ago

BTW, I don't understand why with transformers 2.10.0 the same sentence is split into 1700 wordpieces. I thought the BERT model data would be the same.

attardi commented 4 years ago

BTW, I am publishing a paper on the experiments we did for the IWPT 2020 Shared Task on dependency parsing. I reported the measured speed of your parser at 85 sents/sec on average; it beats all the other parsers by a factor of 5. What did you do to improve on Dozat's parser? Would you be interested in writing a paper for COLING 2020 discussing these parsers further? Send me an email to discuss this in private.

yzhangcs commented 4 years ago

> Dropping the sentence is just a temporary patch. Since you are using a BiLSTM, you can handle longer sentences. Maybe you could split the sentence into blocks of 512 wordpieces and get the embeddings for each.

Yes, I agree with you. There are indeed some tricks for BERT to deal with long sequences: google-research/bert#27. But I think that if you care about parsing long texts, XLNet is a better choice.

attardi commented 4 years ago

Splitting and recombining is what I was thinking.

You can see the results of the IWPT 2020 submissions here: http://quest.ms.mff.cuni.cz/sharedtask/cgi-bin/overview.pl The University of Turku did best, especially on the Baltic languages.

attardi commented 4 years ago

Also, I wonder why the speed drops by half going from transformers 2.1.1 to 2.10.0. Any idea, besides the longer wordpiece sequences?

yzhangcs commented 4 years ago

Sorry, I also don't know why it's different. I have pushed the latest code to dev, so the difference may have disappeared. I will close this issue since the bug has been located. We can discuss other details privately via email.