ml-explore / mlx

MLX: An array framework for Apple silicon
https://ml-explore.github.io/mlx/

[BUG] matmul yields different results when using concat #1082

Open muchi674 opened 1 week ago

muchi674 commented 1 week ago

Describe the bug: matmul yields different results when multiplying vectors concatenated into the same matrix versus multiplying them separately.

To Reproduce: run the following code:

import mlx.core as mx
import numpy as np

W, H = 2, 5

def test0():
    # mlx: the same float16 inputs multiplied two ways
    w = mx.random.uniform(-1, 1, (W, H), dtype=mx.float16)
    x0 = mx.random.uniform(-1, 1, (W,), dtype=mx.float16)
    x1 = mx.random.uniform(-1, 1, (W,), dtype=mx.float16)
    print(mx.array([x0 @ w, x1 @ w]))  # multiply each vector separately, then stack
    print(mx.array([x0, x1]) @ w)      # stack the vectors first, then multiply once

def test1():
    # numpy reference: the same comparison in float16
    np_w = np.random.uniform(-1, 1, (W, H)).astype(np.float16)
    x0 = np.random.uniform(-1, 1, (W,)).astype(np.float16)
    x1 = np.random.uniform(-1, 1, (W,)).astype(np.float16)
    print(np.array([x0 @ np_w, x1 @ np_w]))  # separate matmuls, then stack
    print(np.array([x0, x1]) @ np_w)         # stacked matmul

if __name__ == "__main__":
    print("mlx:")
    mx.random.seed(0)
    test0()

    print("numpy:")
    np.random.seed(0)
    test1()

output:

mlx:
array([[-0.256348, 0.953125, 0.179932, 0.740234, -0.149292],
       [0.234131, -0.737305, -0.0961914, -0.549805, 0.0778198]], dtype=float16)
array([[-0.256348, 0.953125, 0.179932, 0.740234, -0.149292],
       [0.234131, -0.737793, -0.0962524, -0.550293, 0.0778809]], dtype=float16)
numpy:
[[ 0.07385  0.2439   0.1653   0.10596 -0.1026 ]
 [ 0.2615  -0.04764  0.695    0.8013  -0.2192 ]]
[[ 0.07385  0.2439   0.1653   0.10596 -0.1026 ]
 [ 0.2615  -0.04764  0.695    0.8013  -0.2192 ]]

Expected behavior: the last four numbers of the mlx output (second row) should match between the two versions.


Additional context: if this is the case, there are likely plenty of similar discrepancies in your mlx_lm library.

awni commented 1 week ago

I don't think this is a bug; it's due to numerical differences. Order of operations with finite-precision floating point is not associative, and the two versions you have could accumulate in different orders. The lower precision (float16) exacerbates the effect.
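
For example (plain numpy float16 scalars, nothing mlx-specific), summing the same three values under two groupings rounds differently after each step:

import numpy as np

# Illustrative only: each float16 addition rounds its result, so
# different groupings of the same operands can disagree in the low bits.
a, b, c = np.float16(0.1), np.float16(0.2), np.float16(0.3)

left = (a + b) + c   # accumulate left to right
right = a + (b + c)  # accumulate right to left

print(left, right, left == right)  # e.g. 0.5996 0.6001 False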

If you need them to match (or at least be a lot closer), use fp32. I tried it and they were identical in that case.
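
A minimal sketch of that check, reusing the shapes from the repro (mx.allclose here is just for the comparison):

import mlx.core as mx

# Sketch: the same two orderings as the repro above, but at float32.
W, H = 2, 5
mx.random.seed(0)
w = mx.random.uniform(-1, 1, (W, H), dtype=mx.float32)
x0 = mx.random.uniform(-1, 1, (W,), dtype=mx.float32)
x1 = mx.random.uniform(-1, 1, (W,), dtype=mx.float32)

separate = mx.array([x0 @ w, x1 @ w])  # multiply each vector, then stack
batched = mx.array([x0, x1]) @ w       # stack first, then one matmul
print(mx.allclose(separate, batched))  # reportedly True at float32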

I will let @jagrit06 comment on this before closing, to be sure. Also, if you notice any instances with larger discrepancies, that would be useful to share.