Hi, this is great work!

I wanted to test using an arbitrary mask, but I cannot get it to work, and I am wondering whether I am going about it the right way. I tried the code below, but with `mask_mod2` I get:

```
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [1841,0,0], thread: [32,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
...
... lib/python3.11/site-packages/torch/_ops.py:1116, in OpOverloadPacket.__call__(self, *args, **kwargs)
   1114 if self._has_torchbind_op_overload and _must_dispatch_in_python(args, kwargs):
   1115     return _call_overload_packet_from_python(self, args, kwargs)
-> 1116 return self._op(*args, **(kwargs or {}))
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```

Example code:
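For reference, a flex_attention-style `mask_mod` is a pure function of `(b, h, q_idx, kv_idx)` returning a boolean, and one common cause of the `index out of bounds` device-side assert above is a `mask_mod` that gathers from a captured tensor shorter than the sequence length the kernel probes. The sketch below is illustrative only (it is not the poster's `mask_mod2`; `document_id` and the function names are made up), and it uses plain Python ints so the logic can be checked without a GPU — in the real API the arguments are tensors, so you would combine conditions with `&` rather than `and`.

```python
# Illustrative mask_mod sketch (hypothetical names, not the poster's code).
# A mask_mod maps (batch, head, q_idx, kv_idx) -> bool: True means "attend".

SEQ_LEN = 8
# Hypothetical per-token document ids. If a mask_mod gathers from a tensor
# like this on the GPU, the tensor must cover every index the kernel probes;
# an index >= len(document_id) is exactly the
# "-sizes[i] <= index && index < sizes[i]" assert from the traceback.
document_id = [0, 0, 0, 1, 1, 2, 2, 2]

def causal_mask(b, h, q_idx, kv_idx):
    # Standard causal masking: a query attends only to itself and the past.
    return q_idx >= kv_idx

def document_causal_mask(b, h, q_idx, kv_idx):
    # Causal, and additionally restricted to tokens of the same document.
    return q_idx >= kv_idx and document_id[q_idx] == document_id[kv_idx]

# Materialize the full boolean mask to sanity-check the logic on CPU.
mask = [[document_causal_mask(0, 0, q, kv) for kv in range(SEQ_LEN)]
        for q in range(SEQ_LEN)]
```

With this toy mask, token 3 (document 1) can attend to itself but not to token 2 (document 0), even though token 2 is in its causal past.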