ml-explore / mlx

MLX: An array framework for Apple silicon
https://ml-explore.github.io/mlx/
MIT License

[BUG] Wrong result for sliced matmul on GPU #1163

Closed davidkoski closed 6 months ago

davidkoski commented 6 months ago

Describe the bug

matmul gives different results on GPU and CPU when its inputs are slices of a larger array, as seen in https://github.com/ml-explore/mlx-swift/issues/94

To Reproduce


import mlx.core as mx

mArray = [
    [0.943755, 0.162902, -0.287733, -0.241071],
    [0.0669876, -0.946367, -0.316074, 0.150359],
    [-0.32379, 0.279022, -0.90405, -0.54078],
    [0, 0, 0, 1]
]

m = mx.array(mArray)

r = m[0:3, 0:3]
t = m[0:3, 3:4]

print(mx.matmul(r, t, stream=mx.cpu))
print(mx.matmul(r, t, stream=mx.gpu))

Expected behavior

The two prints should produce identical (or nearly identical) results. Actual output, CPU first and GPU second:

array([[-0.0474179],
       [0.0124829],
       [0.608902]], dtype=float32)
array([[0.0557015],
       [0.219578],
       [0.952311]], dtype=float32)
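
As a sanity check on which of the two outputs is right, the same product can be computed by hand without MLX. The following is a minimal, dependency-free sketch; the `matmul` helper and the `reference` name are illustrative only and are not part of MLX.

```python
# Reference check in plain Python (no MLX, no NumPy).
m = [
    [0.943755, 0.162902, -0.287733, -0.241071],
    [0.0669876, -0.946367, -0.316074, 0.150359],
    [-0.32379, 0.279022, -0.90405, -0.54078],
    [0, 0, 0, 1],
]

# Same slices as in the report: r = m[0:3, 0:3], t = m[0:3, 3:4]
r = [row[0:3] for row in m[0:3]]
t = [row[3:4] for row in m[0:3]]


def matmul(a, b):
    """Naive matrix product of two nested lists (illustrative helper)."""
    inner = len(b)
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


reference = matmul(r, t)
print(reference)
```

The values this prints agree with the first (CPU) array above to about five decimal places, which suggests the second (GPU) array is the incorrect one.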


awni commented 6 months ago

Looks like a bug with the GPU op, as NumPy gives the same result as the CPU. @jagrit06 do you mind taking a look?
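
One observation that may help narrow this down: the reported GPU values are reproduced if the column slice `t` is read from the underlying 4x4 buffer as though it were contiguous (stride 1) rather than with its actual row stride of 4. This is only inferred from the numbers in this report and is not a confirmed diagnosis of the GPU kernel. The sketch below is plain Python; the names `flat`, `matvec`, `with_stride_4` and `with_stride_1` are hypothetical.

```python
# Hypothesis check: compare reading t with the correct row stride (4)
# against reading it as if it were contiguous (stride 1).
flat = [
    0.943755, 0.162902, -0.287733, -0.241071,
    0.0669876, -0.946367, -0.316074, 0.150359,
    -0.32379, 0.279022, -0.90405, -0.54078,
    0, 0, 0, 1,
]
cols = 4
offset = 0 * cols + 3  # flat index of m[0, 3], the first element of t

# r = m[0:3, 0:3], read correctly in both cases
r = [flat[i * cols : i * cols + 3] for i in range(3)]

# t = m[0:3, 3:4] read with its true stride, then with stride 1
t_stride_4 = [flat[offset + k * cols] for k in range(3)]
t_stride_1 = [flat[offset + k] for k in range(3)]


def matvec(a, v):
    """Multiply a nested-list matrix by a flat vector (illustrative helper)."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


with_stride_4 = matvec(r, t_stride_4)
with_stride_1 = matvec(r, t_stride_1)
print(with_stride_4)
print(with_stride_1)
```

With the stride-4 read the result agrees with the CPU output in the report, and with the stride-1 read it agrees with the GPU output, both to about five decimal places.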