Closed awni closed 2 weeks ago
@jagrit06 it seems we are overflowing an integer index into the output, since it starts to break in the 2B range. INT_MAX is on the small side for the largest output we can support, though.
Anything we can do to support larger sizes?
If not, we should put some throws in the ops as these are sneaky to debug.
This particular case is simple: when we compute auto batch_size_out = out.size() / (M * N);, M and N are ints, so their product overflows and batch_size_out comes out to 0.
The simple fix here is to do that computation in size_t, and I can make a couple of other changes to make sure we can handle the large shapes.
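To make the failure mode concrete, here is a minimal sketch of the arithmetic described above. The helper names and the 50000 x 50000 shape are hypothetical, not taken from the MLX source, and the sketch assumes a 64-bit size_t. Since signed overflow is undefined behavior in C++, the wraparound is emulated with unsigned arithmetic to show what typically happens on two's-complement hardware.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: the value a 32-bit int product M * N typically wraps to.
// Done in uint32_t so the wraparound itself is well defined; the final cast
// back to int32_t is modular in C++20 (implementation-defined before that).
inline int32_t wrapped_int_product(int32_t m, int32_t n) {
  uint32_t bits = static_cast<uint32_t>(m) * static_cast<uint32_t>(n);
  return static_cast<int32_t>(bits);
}

// Buggy pattern: the divisor is formed in 32-bit int, then widened to size_t.
// A negative wrapped product becomes a huge size_t, so the quotient is 0.
// (If the product wrapped to exactly 0, this would divide by zero instead.)
inline size_t batch_size_out_int(size_t out_size, int32_t m, int32_t n) {
  size_t divisor = static_cast<size_t>(wrapped_int_product(m, n));
  return out_size / divisor;
}

// Fixed pattern: widen M and N to size_t before multiplying.
inline size_t batch_size_out_size_t(size_t out_size, int32_t m, int32_t n) {
  return out_size / (static_cast<size_t>(m) * static_cast<size_t>(n));
}
```

With an output of 2,500,000,000 elements and M = N = 50000, the first function returns 0 and the second returns 1.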
The only thing I'm wondering about is whether, if batch_size_out >= UINT32_MAX, we will need to launch multiple matmul kernels, since the grid dims can only be uint.
I tacked on a quick fix with #1058
> The only thing I'm wondering about is whether, if batch_size_out >= UINT32_MAX, we will need to launch multiple matmul kernels, since the grid dims can only be uint.
That seems like a much rarer case.
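For reference, if that case ever did need handling, one possible approach is to split the batch into chunks that each fit in a single grid dimension and launch one kernel per chunk. The sketch below only shows the chunking arithmetic. split_batch and its max_grid_dim parameter are hypothetical names, not part of MLX, and the real limit depends on the backend.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: split batch_size into (offset, count) chunks such that
// every count fits in max_grid_dim. Each chunk would map to one kernel launch.
inline std::vector<std::pair<size_t, uint32_t>> split_batch(
    size_t batch_size, uint32_t max_grid_dim = UINT32_MAX) {
  std::vector<std::pair<size_t, uint32_t>> chunks;
  if (max_grid_dim == 0) {
    return chunks;  // nothing can be launched with a zero-sized grid
  }
  size_t offset = 0;
  while (offset < batch_size) {
    size_t remaining = batch_size - offset;
    // The min guarantees the value fits in uint32_t before narrowing.
    uint32_t count = static_cast<uint32_t>(
        std::min<size_t>(remaining, max_grid_dim));
    chunks.emplace_back(offset, count);
    offset += count;
  }
  return chunks;
}
```

For example, split_batch(10, 4) yields the chunks (0, 4), (4, 4), and (8, 2). The offsets stay in size_t so they can index past the 32-bit range.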
thanks guys really cool work and fast fix!
Decreasing 131072 to 131071 produces the right output, but above that the outputs don't match as they should.