Closed: haozheji closed this issue 2 years ago
When I increase the batch size further, another error occurs, and the GPU's memory is not exhausted either:
RuntimeError: CUDA error: invalid configuration argument
I found that this error is raised when the batch dimension is too large:
import torch
import genbmm

a = torch.randn(100000, 8, 8).cuda()
b = torch.randn(100000, 8, 8).cuda()
c = genbmm.logbmm(a, b)
print(c)
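(For scale: each of these tensors holds 100000 × 8 × 8 float32 values, about 25.6 MB, so all three together are well under 100 MB, and memory is clearly not the constraint.)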
The following error is raised:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/jihaozhe/anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 153, in __repr__
return torch._tensor_str._str(self)
File "/home/jihaozhe/anaconda3/lib/python3.7/site-packages/torch/_tensor_str.py", line 371, in _str
return _str_intern(self)
File "/home/jihaozhe/anaconda3/lib/python3.7/site-packages/torch/_tensor_str.py", line 351, in _str_intern
tensor_str = _tensor_str(self, indent)
File "/home/jihaozhe/anaconda3/lib/python3.7/site-packages/torch/_tensor_str.py", line 241, in _tensor_str
formatter = _Formatter(get_summarized_data(self) if summarize else self)
File "/home/jihaozhe/anaconda3/lib/python3.7/site-packages/torch/_tensor_str.py", line 273, in get_summarized_data
return torch.stack([get_summarized_data(x) for x in (start + end)])
File "/home/jihaozhe/anaconda3/lib/python3.7/site-packages/torch/_tensor_str.py", line 273, in <listcomp>
return torch.stack([get_summarized_data(x) for x in (start + end)])
File "/home/jihaozhe/anaconda3/lib/python3.7/site-packages/torch/_tensor_str.py", line 273, in get_summarized_data
return torch.stack([get_summarized_data(x) for x in (start + end)])
File "/home/jihaozhe/anaconda3/lib/python3.7/site-packages/torch/_tensor_str.py", line 273, in <listcomp>
return torch.stack([get_summarized_data(x) for x in (start + end)])
File "/home/jihaozhe/anaconda3/lib/python3.7/site-packages/torch/_tensor_str.py", line 266, in get_summarized_data
return torch.cat((self[:PRINT_OPTS.edgeitems], self[-PRINT_OPTS.edgeitems:]))
RuntimeError: CUDA error: invalid configuration argument
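(Side note: CUDA kernel launches are asynchronous, which is presumably why the traceback points into tensor printing rather than at the logbmm call itself. Re-running with CUDA_LAUNCH_BLOCKING=1, set before torch is imported, should surface the error at or near the actual launch. A minimal sketch:

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # make kernel launches synchronous

import torch
import genbmm

a = torch.randn(100000, 8, 8).cuda()
b = torch.randn(100000, 8, 8).cuda()
c = genbmm.logbmm(a, b)  # with blocking launches, the error surfaces at or near this call
)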
The real batch size is actually not large (usually 32 or 16), but the tensors have additional length dimensions. Since logbmm only accepts three-dimensional inputs, I have to view() them into three-dimensional tensors with a very large batch dimension, as in the sketch below.
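For concreteness, a minimal sketch of the reshaping involved (the shapes here are hypothetical, picked so the flattened batch lands just past the failing threshold):

import torch
import genbmm

B, T, M, K, N = 32, 2048, 8, 8, 8           # hypothetical: batch 32, length 2048
a = torch.randn(B, T, M, K).cuda()
b = torch.randn(B, T, K, N).cuda()

# logbmm only takes 3-D inputs, so fold batch and length together.
# 32 * 2048 = 65536 > 65535, which triggers the error below.
c = genbmm.logbmm(a.view(B * T, M, K), b.view(B * T, K, N))
c = c.view(B, T, M, N)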
Digging further, this triggers the error:
>>> a = torch.randn(65536,8,8).cuda()
>>> b = torch.randn(65536,8,8).cuda()
>>> c = genbmm.logbmm(a, b)
>>> print(c)
...
RuntimeError: CUDA error: invalid configuration argument
65535 is just fine:
>>> a = torch.randn(65535,8,8).cuda()
>>> b = torch.randn(65535,8,8).cuda()
>>> c = genbmm.logbmm(a, b)
>>> print(c)
tensor([[[2.3122, 2.8272, 2.3992, ..., 1.7824, 2.2578, 2.4881],
...
[3.2658, 3.6260, 2.9816, ..., 2.3903, 1.8778, 2.1133]]],
device='cuda:0')
Seems like something exceeds the 16-bit limit?
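Indeed, 65535 = 2^16 - 1 is CUDA's maximum for gridDim.y and gridDim.z (only gridDim.x may go up to 2^31 - 1), so if the kernel maps the batch index onto grid y or z, any batch above 65535 would yield exactly this "invalid configuration argument". Until that is fixed, a possible workaround, sketched below under the assumption that chunking along the batch dimension is acceptable, is to split the call into pieces of at most 65535:

import torch
import genbmm

MAX_GRID = 65535  # CUDA limit for gridDim.y / gridDim.z

def logbmm_chunked(a, b, chunk=MAX_GRID):
    # Run genbmm.logbmm on batch chunks small enough for the kernel's grid.
    out = [genbmm.logbmm(a[i:i + chunk], b[i:i + chunk])
           for i in range(0, a.shape[0], chunk)]
    return torch.cat(out, dim=0)

a = torch.randn(100000, 8, 8).cuda()
b = torch.randn(100000, 8, 8).cuda()
c = logbmm_chunked(a, b)  # stays within the grid limit

This keeps each launch within the grid limit, at the cost of a Python-level loop over chunks.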
I got this error after a fixed number of iterations, even with different random seeds. However, the number of iterations depends on the batch size. The GPU is not out of memory, so I suspect the bug comes from matmul_cuda_kernel.cu? Here is my environment: