ml-explore / mlx

MLX: An array framework for Apple silicon
https://ml-explore.github.io/mlx/
MIT License

[BUG] Matmul gives wrong output for large sizes #1051

Closed: awni closed this issue 2 weeks ago

awni commented 2 weeks ago

Decreasing 131072 to 131071 produces the right output, but above that the outputs don't match as they should.


import mlx.core as mx

w = mx.random.uniform(shape=(32, 32 * 4))
x = mx.random.uniform(shape=(131072, 128, 32))

y1 = x[:10] @ w   # small slice: correct
y2 = x @ w        # full batch: wrong above the 131071 threshold

# Max absolute difference; should be 0 but isn't
print((y1 - y2[:10]).abs().max())
awni commented 2 weeks ago

@jagrit06 it seems we are overflowing an integer index into the output, as it starts to break in the 2B range. INT_MAX is on the small side for the largest output we can support, though.

Anything we can do to support larger sizes?

If not, we should put some throws in the ops as these are sneaky to debug.
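The threshold in the report lines up with INT_MAX exactly. With the shapes from the reproduction, the output of x @ w has shape (batch, 128, 128), and batch == 131072 is the first size whose element count exceeds INT_MAX. A quick check, independent of MLX:

```python
# The output of x @ w has shape (batch, 128, 128); its element count
# first exceeds INT_MAX (2**31 - 1) at batch == 131072, which is where
# the reported mismatch begins.
INT_MAX = 2**31 - 1

for batch in (131071, 131072):
    n_elems = batch * 128 * 128
    print(batch, n_elems, n_elems > INT_MAX)
# 131071 -> 2147467264 elements (fits in int)
# 131072 -> 2147483648 elements (one past INT_MAX)
```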

jagrit06 commented 2 weeks ago

This particular case is simple: when we try to compute auto batch_size_out = out.size() / (M * N);, the int values M and N multiply and overflow, and batch_size_out comes out to 0. The simple fix here is to do that arithmetic in size_t, and I can make a couple of other changes to make sure we can handle the large shapes.
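The overflow described above can be simulated with ctypes. The M and N values below are illustrative, chosen so that M * N lands just past INT_MAX; they are an assumption, not the values MLX uses internally:

```python
import ctypes

# Hypothetical values where M * N exceeds INT_MAX, mirroring the
# expression out.size() / (M * N) from the comment above.
out_size = 2**31          # total elements in the output
M, N = 65536, 32768       # M * N == 2**31, one past INT_MAX

# With 32-bit int arithmetic (as in C), M * N wraps to a negative value.
mn_int32 = ctypes.c_int32(M * N).value
print(mn_int32)           # -2147483648

# In C, dividing a size_t by that int sign-extends it to a huge unsigned
# value, so the quotient is 0 -- the broken batch_size_out.
mn_as_size_t = ctypes.c_uint64(mn_int32).value
print(out_size // mn_as_size_t)   # 0

# Doing the whole computation in 64-bit (size_t) gives the right answer.
print(out_size // (M * N))        # 1
```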

The only thing I'm wondering about is that if batch_size_out >= UINT32_MAX, we will need to launch multiple matmul kernels, since the grid dims can only be uint
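If that path were ever needed, the batch could be split into grid-sized chunks with one kernel launch per chunk. A sketch of the planning step; the launch_plan helper is hypothetical, and only the UINT32_MAX grid-dimension limit comes from the comment above:

```python
# Split an oversized batch into chunks that each fit a 32-bit grid
# dimension, so each chunk can be dispatched as its own matmul kernel.
UINT32_MAX = 2**32 - 1

def launch_plan(batch_size_out):
    """Return (offset, chunk_size) pairs, one per kernel launch."""
    plan = []
    offset = 0
    while offset < batch_size_out:
        chunk = min(batch_size_out - offset, UINT32_MAX)
        plan.append((offset, chunk))
        offset += chunk
    return plan

print(len(launch_plan(UINT32_MAX)))      # 1 -- fits in a single launch
print(len(launch_plan(UINT32_MAX + 5)))  # 2 -- needs a second launch
```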

jagrit06 commented 2 weeks ago

I tacked on a quick fix with #1058

awni commented 2 weeks ago

> The only things I'm wondering about is if batch_size_out >= UINT32_MAX, then we will need to launch multiple matmul kernels since the grid dims can only be uint

That seems like a much more rare case.

thegodone commented 1 week ago

Thanks guys, really cool work and fast fix!