Sorry, I'm not sure why you encountered this error. Maybe you can try installing transformers with version 2.1.1.
I ran the same code after installing transformers 2.10.0 (Successfully installed transformers-2.10.0). I get a lot of assertion errors and then a failure:
...
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [34,0,0], thread: [23,0,0] Assertion srcIndex < srcSelectDimSize failed.
(the same assertion is repeated for threads [24,0,0] through [31,0,0])
Traceback (most recent call last):
File "run.py", line 58, in cublasCreate(handle)
The sizes for input and weight are: [5, 812, 768] and [768, 768].
Sorry, I'm not sure why you encountered this error. Maybe you can try installing transformers with version 2.1.1.
This was with transformers 2.1.1. The error with 2.10.0 is above.
Could you factor this line into two or more steps? https://github.com/yzhangcs/parser/blob/c22c4000b2c75d292e2cf9067a11668afb624977/parser/modules/bert.py#L43 This may give you more clues about the bug.
I changed it to:
tmp_lens = bert_lens[mask].tolist()
tmp_mask = bert[bert_mask]
bert = tmp_mask.split(tmp_lens)
and I get:
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [39,0,0], thread: [63,0,0] Assertion srcIndex < srcSelectDimSize failed.
Traceback (most recent call last):
File "run.py", line 58, in
The sizes of bert_lens and mask are: [4, 399] and [4, 399].
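(Not from the repo, just a debugging sketch.) The split in bert.py only works if the number of selected wordpieces equals the sum of the per-word lengths, so checking that invariant explicitly, on CPU, turns the opaque device-side assert into a readable error. The helper name split_pieces and the shape comments are mine, not the parser's:

def split_pieces(bert, bert_mask, bert_lens, mask):
    # bert:      [batch, max_pieces, hidden]  BERT outputs
    # bert_mask: [batch, max_pieces] bool, True for real (non-pad) wordpieces
    # bert_lens: [batch, seq_len]    number of wordpieces per word
    # mask:      [batch, seq_len]    bool, True for real (non-pad) words
    lens = bert_lens[mask].tolist()
    selected = bert[bert_mask]          # [n_selected_pieces, hidden]
    assert sum(lens) == selected.size(0), \
        f"sum of per-word lengths ({sum(lens)}) != selected wordpieces ({selected.size(0)})"
    return selected.split(lens)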
Sorry, I have no clue. It's probably because of incompatible PyTorch versions; mine is 1.3.0 or higher. And if this issue does not occur with other treebanks, I think you should check your data preprocessing.
My torch version is '1.4.0', and this also occurs on another machine and on other treebanks. With transformers 2.10.0 it is even worse: an assert fails, hinting that there might be a lurking bug that the assert catches.
I added a print here:
print('LENS:', bert_lens.size(), mask.size())
print(bert_lens.cpu(), mask.cpu())
tmp_lens = bert_lens[mask]
and a few batches pass, except the last:
LENS: torch.Size([65, 89]) torch.Size([65, 89])
tensor([[1, 1, 2, ..., 0, 0, 0],
[1, 1, 2, ..., 0, 0, 0],
[1, 1, 3, ..., 1, 1, 1],
...,
[1, 1, 3, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 2, ..., 0, 0, 0]]) tensor([[ True, True, True, ..., False, False, False],
[ True, True, True, ..., False, False, False],
[ True, True, True, ..., True, True, True],
...,
[ True, True, True, ..., False, False, False],
[ True, True, True, ..., False, False, False],
[ True, True, True, ..., False, False, False]])
LENS: torch.Size([4, 399]) torch.Size([4, 399])
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [43,0,0], thread: [96,0,0] Assertion srcIndex < srcSelectDimSize failed.
I get the same error if I move the data to the CPU:
File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/modules/bert.py", line 44, in forward cpu_bert = bert_lens.cpu()[mask.cpu()] # DEBUG RuntimeError: CUDA error: device-side assert triggered
Could you print the Python list of some samples? I will have a test.
It's so weird. 😢
There is something wrong on the device:
print('LENS:', bert_lens.size(), bert_mask.size())
cpu_bert_lens = bert_lens.cpu()
cpu_mask = mask.cpu()
print(cpu_bert_lens, cpu_mask)
print(cpu_bert_lens[cpu_mask])
LENS: torch.Size([65, 89]) torch.Size([65, 173])
tensor([[1, 1, 2, ..., 0, 0, 0],
[1, 1, 2, ..., 0, 0, 0],
[1, 1, 3, ..., 1, 1, 1],
...,
[1, 1, 3, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 2, ..., 0, 0, 0]]) tensor([[ True, True, True, ..., False, False, False],
[ True, True, True, ..., False, False, False],
[ True, True, True, ..., True, True, True],
...,
[ True, True, True, ..., False, False, False],
[ True, True, True, ..., False, False, False],
[ True, True, True, ..., False, False, False]])
tensor([1, 1, 2, ..., 4, 1, 1])
LENS: torch.Size([4, 399]) torch.Size([4, 673])
...
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [58,0,0], thread: [127,0,0] Assertion srcIndex < srcSelectDimSize failed.
Traceback (most recent call last):
File "run.py", line 58, in
It seems that the error lurks in parser.utils.field.BertField.
I think you should check the generation of the BERT inputs step by step to find this bug.
Which kind of checks shall I do?
Check if some values in bert_lens are invalid, i.e., <= 0. And print(feats) as soon as it is created.
I can't even access the data in bert_lens:
print('LENS: negative', any(float(x) < 0 for c in bert_lens for x in c))
RuntimeError: CUDA error: device-side assert triggered
In which line did you print this?
Just before accessing bert_lens[mask], line 43.
Where are the feats generated?
This is the last batch generated in TextDataLoader.__iter__() before pad_sequence, just before the crash:
BATCH (tensor([ 0, 35, 33, 34, 24, 26, 26, 2, 10, 31, 26, 10, 25, 28, 28, 11, 13, 28, 28, 25, ...]),
tensor([ 0, 10, 30, 25, 25, 35, 26, 25, 25, 10, 30, 25, 10, 30, 34, 10, 31, 25, 25, 31, ...]),
tensor([ 0, 35, 33, 24, 34, 26, 18, 1, 10, 31, 26, 10, 30, 6, 11, 26, 13, 29, ...]),
tensor([ 0, 35, 26, 29, 6, 10, 30, 25, 6, 25, 26, 11, 13, 11, 13, 11, 13, 11, 13, 11, 13, ...]))
Is this bert_lens? Why is the first element 0?
These are the values of data from:
for data, field in zip(raw_batch, self.fields)
It's not feats. You can print feats below this line: https://github.com/yzhangcs/parser/blob/c22c4000b2c75d292e2cf9067a11668afb624977/parser/cmds/cmd.py#L77
These are the feats before the crash:
FEATS: [tensor([[ 101, 479, 451, ..., 13762, 107, 119], [ 101, 10244, 29774, ..., 0, 0, 0], [ 101, 479, 14184, ..., 0, 0, 0], [ 101, 44873, 38453, ..., 0, 0, 0]], device='cuda:0'), tensor([[1, 1, 2, ..., 4, 1, 1], [1, 1, 1, ..., 0, 0, 0], [1, 1, 3, ..., 0, 0, 0], [1, 2, 1, ..., 0, 0, 0]], device='cuda:0'), tensor([[ True, True, True, ..., True, True, True], [ True, True, True, ..., False, False, False], [ True, True, True, ..., False, False, False], [ True, True, True, ..., False, False, False]], device='cuda:0')]
I reduced the treebank to a portion of 871 lines, where it still gave the same device error. Then I trimmed that to the first 483 lines, which contain a sentence 388 words long, and there I got a different error, which turns out to be the same one I get with transformers 2.10.0:
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [55,0,0], thread: [31,0,0] Assertion srcIndex < srcSelectDimSize failed.
Traceback (most recent call last):
File "run.py", line 58, in cublasCreate(handle)
Indeed there is a problem with long sentences, which I had tried to fix with my earlier merge request, where I added a parameter --max-sent-length.
Is it because the sentences exceed the length limit of 512?
No, it is only 383 tokens. Where does the 512 limit come from? Shall I add a check and a warning?
The sentence is 383 tokens, but the wordpieces are 673. I traced the error to this call in lib64/python3.6/site-packages/torch/nn/modules/sparse.py:
def forward(self, input):
    return F.embedding(
        input, self.weight, self.padding_idx, self.max_norm,
        self.norm_type, self.scale_grad_by_freq, self.sparse)
where input = arange(672)
(Pdb) --Return--
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCCachingHostAllocator.cpp line=278 error=700 : an illegal memory access was encountered
So it likely hits the 512-input limit of BERT and corrupts memory. I wonder why they don't check this limit and avoid the memory corruption.
I wonder why the parser passes a vector of length 673 to BERT. The sentence, after tokenization in BertField.numericalize(), is just 399 pieces.
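For what it's worth, a minimal sketch of how the 512 limit surfaces (my assumption about the cause, not verified against the parser): bert-base only has max_position_embeddings = 512 learned position embeddings, so feeding a batch padded to 673 wordpieces makes the position lookup index past that table, which is exactly the indexSelectLargeIndex assert above.

import torch
from transformers import BertModel

model = BertModel.from_pretrained('bert-base-cased').cuda()
input_ids = torch.zeros(1, 673, dtype=torch.long).cuda()  # 673 > 512 positions
outputs = model(input_ids)  # expected to trigger the device-side assert on GPU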
Sorry for the late reply. Did you check the max sentence length in the batch? Each sentence in a batch is padded to the max length.
There are only two sentences now in the corpus (one of length 3 and one of length 383). The latter is turned into 399 pieces by the tokenizer. The batch then has these sizes:
BATCH [[399], [[673], [399], [673]], [399], [399]]
and len(subwords[0]) in BertEmbedding.forward is 673.
Hi, I think you should discard all the overlong sentences before training. Here are some snippets.
>>> from transformers import BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
>>> sentences = [s for s in sentences if len(tokenizer.encode(' '.join(s))) < 512]
BTW, I updated the code on dev branch just now. The code now is much faster than before for bert, and is compatible with transformers 2.10. You can switch to dev and have a test.
Since there is no way to know the wordpiece lengths beforehand, the solution requires that BertField do the check.
In BertField.numericalize(), inside the loop over sentences (i is the sentence index):
    sequence = [piece if piece else self.transform(self.pad)
                for piece in sequence]
    pieces = sum(sequence, [])
    if len(pieces) > self.max_len:  # BERT pretrained limit
        print(f"more than {self.max_len} wordpieces from sentence {i} of length {len(sequence)}",
              file=sys.stderr)
        drop.append(i)
    subwords.append(pieces)
    lens.append(torch.tensor([len(piece) for piece in sequence]))
and after the loop:
subwords = [torch.tensor(pieces) for pieces in subwords]
mask = [torch.ones(len(pieces)).ge(0) for pieces in subwords]
return list(zip(subwords, lens, mask)), drop
and similarly change the other numericalize methods to return (sequence, []).
In TextDataset.__init__():
discard = set()
for field in self.fields:
    value, drop = field.numericalize(getattr(corpus, field.name))
    setattr(self, field.name, value)
    discard = discard.union(drop)
discard = list(discard)
# drop too long sentences. Attardi
for field in self.fields:
    value = getattr(self, field.name)
    value = [x for i, x in enumerate(value) if i not in discard]
    setattr(self, field.name, value)
# NOTE: the final bucket count is roughly equal to n_buckets
# the lengths should be those of the wordpieces, not of the corpus. Attardi
self.lengths = [len(s) + sum([bool(field.bos), bool(field.eos)])
                for i, s in enumerate(corpus) if i not in discard]
What do you think?
I don't think the BERT code should be interwoven with the rest. My idea is that it might be better to handle this in preprocessing.
Dropping the sentence is just a temporary patch. Since you are using a BiLSTM, you can handle longer sentences. Maybe you split the sentence into 512 blocks and get the embeddings for each.
BTW, I don't understand why with transformers 2.10.0 the same sentence is split into 1700 wordpieces. I thought the BERT model data would be the same.
BTW, I am publishing a paper on the experiments we did for the IWPT 2020 Shared Task on dependency parsing. I reported the measured speed of your parser at 85 sents/sec on average; it beats all other parsers by a factor of 5. What did you do to improve on Dozat's parser? Would you be interested in writing a paper for COLING 2020, further discussing these parsers? Send me an email to discuss this in private.
Dropping the sentence is just a temporary patch. Since you are using a BiLSTM, you can handle longer sentences. Maybe you split the sentence into 512 blocks and get the embeddings for each.
Yes, I agree with you. There are indeed some tricks for BERT to deal with long sequences: google-research/bert#27. But if you really care about long-text parsing, I think XLNet is a better choice.
Splitting and recombining is what I was thinking.
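A rough sketch of that idea (not the parser's code; it ignores [CLS]/[SEP] handling and cross-chunk attention): chunk the wordpiece ids into windows of at most 512, run BERT on each window, and concatenate the hidden states back into one sequence. Overlapping the windows and averaging the overlapped positions would reduce boundary effects.

import torch
from transformers import BertModel

model = BertModel.from_pretrained('bert-base-cased').eval()
MAX_LEN = 512

def embed_long(piece_ids):
    # piece_ids: 1-D LongTensor of wordpiece ids for one (possibly overlong) sentence
    hidden = []
    with torch.no_grad():
        for chunk in piece_ids.split(MAX_LEN):
            out = model(chunk.unsqueeze(0))[0]  # [1, len(chunk), hidden_size]
            hidden.append(out.squeeze(0))
    return torch.cat(hidden, dim=0)             # [len(piece_ids), hidden_size]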
You can see the results of the submissions to IWPT 2020 here: http://quest.ms.mff.cuni.cz/sharedtask/cgi-bin/overview.pl The University of Turku did best, especially on Baltic languages.
Also, I wonder why the speed drops by half going from transformers 2.1.1 to 2.10.0. Any idea, besides the longer wordpiece sequences?
Sorry, I also don't know why it's different. I have pushed the latest code to dev, and the difference may have disappeared. I will close this issue since the bug is located. We can discuss other details in private via email.
When training with a relatively small Arabic corpus from here (I just made a few simple changes to the reader to skip comments and multiword tokens in the CoNLL-U format): http://ufal.mff.cuni.cz/~zeman/soubory/iwpt2020-train-dev.tgz I get this error:
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [38,0,0], thread: [91,0,0] Assertion srcIndex < srcSelectDimSize failed.
(the same assertion is repeated for threads [92,0,0] through [95,0,0])
Traceback (most recent call last):
File "run.py", line 58, in
cmd(args)
File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/cmds/train.py", line 85, in __call__
self.train(train.loader)
File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/cmds/cmd.py", line 91, in train
arc_scores, rel_scores = self.model(words, feats)
File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/model.py", line 95, in forward
feat_embed = self.feat_embed(feats)
File "/homenfs/tempGPU/iwpt2020/.env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/homenfs/tempGPU/iwpt2020/biaffine-parser/parser/modules/bert.py", line 43, in forward
bert = bert[bert_mask].split(bert_lens[mask].tolist())
RuntimeError: copy_if failed to synchronize: cudaErrorAssert: device-side assert triggered
It works on CPU however.
Thank you for a nice project.