sachinraja13 closed this issue 1 month ago.
There is no bug here; it is just the expected numerical difference after 5 matrix multiplications. The matrices are initialized with very large values, which pushes the final result into the millions and makes matters worse.
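One way to see why outputs "in the millions" make things worse: float32 values are not evenly spaced, and the gap between neighbouring representable numbers grows with magnitude. The minimal NumPy sketch below uses `np.spacing`; the helper name `float32_gap` is hypothetical. Near 1,000,000 the gap is 0.0625, so even rounding a single result to float32 can move it by more than 1e-4.

```python
import numpy as np


def float32_gap(x: float) -> float:
    """Distance from float32(x) to the next larger representable float32."""
    return float(np.spacing(np.float32(x)))


# The gap grows with magnitude: roughly 1e-7 near 1, but 0.0625 near 1e6.
for magnitude in (1.0, 1e3, 1e6):
    print(f"|x| ~ {magnitude:g}: float32 spacing = {float32_gap(magnitude):.3g}")
```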
For instance, if you change the initialization to the common fan-in scaling, i.e. sample from a normal distribution with mean 0 and standard deviation 1/sqrt(input_dims) (variance 1/input_dims) instead of N(0, 1), the results pass the check and the output stays in a much more reasonable range, roughly [-0.5, 0.5].
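The reason fan-in scaling keeps the output in range can be sketched with the standard variance argument. It assumes zero-mean weights and inputs that are independent of each other and i.i.d., and it ignores biases and activations (the issue's exact model isn't reproduced here):

```math
\begin{aligned}
y_i &= \sum_{j=1}^{d_{\mathrm{in}}} W_{ij}\,x_j,
\qquad W_{ij} \sim \mathcal{N}(0, \sigma_w^2)\ \text{i.i.d.} \\
\operatorname{Var}(y_i) &= d_{\mathrm{in}}\,\sigma_w^2\,\operatorname{Var}(x_j) \\
\sigma_w^2 = 1 &\;\Rightarrow\; \operatorname{Var}(y_i) = d_{\mathrm{in}}\operatorname{Var}(x_j)
\quad \text{(grows by a factor of } d_{\mathrm{in}} \text{ per layer)} \\
\sigma_w^2 = 1/d_{\mathrm{in}} &\;\Rightarrow\; \operatorname{Var}(y_i) = \operatorname{Var}(x_j)
\quad \text{(stays constant)}
\end{aligned}
```

With N(0, 1) weights and equal layer widths d, the variance after 5 layers is d^5 times the input variance, so the standard deviation grows by d^{5/2}. For a hypothetical width of 256, that factor is 2^20, about 10^6.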
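I did the above by changing this line in `initialize_numpy_arrays_and_model` as follows:

```python
# Divide standard-normal samples by sqrt(param_shape[-1]), the fan-in for weights shaped (out, in).
np_array = np.random.randn(*param_shape).astype(np.float32) / param_shape[-1]**0.5
```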
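The following minimal sketch shows the effect of that one-line change. It assumes a hypothetical stack of 5 square linear layers of width 256, with no biases or activations; the issue's actual model and `initialize_numpy_arrays_and_model` are not shown here. Instead of comparing PyTorch with MLX directly, it uses a NumPy float64 forward pass as the higher-precision reference. `init_weights` and `forward` are illustrative helpers, not part of either library.

```python
import numpy as np

np.random.seed(0)
WIDTH, N_LAYERS = 256, 5  # hypothetical: 5 square layers, no biases/activations


def init_weights(fan_in_scaling: bool) -> list:
    """Return float32 weight matrices shaped (out, in), as nn.Linear stores them."""
    weights = []
    for _ in range(N_LAYERS):
        param_shape = (WIDTH, WIDTH)
        if fan_in_scaling:
            # The modified line: std = 1/sqrt(param_shape[-1]) = 1/sqrt(fan_in).
            np_array = np.random.randn(*param_shape).astype(np.float32) / param_shape[-1]**0.5
        else:
            # Original behaviour: plain N(0, 1) samples.
            np_array = np.random.randn(*param_shape).astype(np.float32)
        weights.append(np_array)
    return weights


def forward(x, weights, dtype):
    """Apply y = W @ y layer by layer, computing in the given precision."""
    y = x.astype(dtype)
    for w in weights:
        y = w.astype(dtype) @ y
    return y


x = np.random.randn(WIDTH).astype(np.float32)
results = {}
for scaled in (False, True):
    ws = init_weights(scaled)
    out32 = forward(x, ws, np.float32)
    out64 = forward(x, ws, np.float64)  # higher-precision reference
    max_out = float(np.max(np.abs(out64)))
    max_err = float(np.max(np.abs(out32 - out64)))
    results[scaled] = (max_out, max_err)
    print(f"fan-in scaling={scaled}: max|out| = {max_out:.3g}, max abs error = {max_err:.3g}")
```

In the unscaled run, outputs should reach the order of 10^6, with absolute float32 errors far above 1e-4. In the fan-in run, outputs should stay near unit scale and errors stay well under that threshold.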
I will close the issue, but feel free to reopen it if you think I didn't cover everything.
That helps, @angeloskath. Many thanks!
Describe the bug
PyTorch and MLX give different outputs for a simple MLP despite the same weight initialisation and the same input.
To Reproduce
Expected behavior
All output values from the PyTorch and MLX models agree with each other to within 1e-4.
Actual Output
Additional context
MLX 0.17.3, PyTorch 2.1.2
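A note on the 1e-4 tolerance in the expected behavior: applied as an absolute threshold, it cannot hold once outputs reach the millions, whereas a relative tolerance scales with the values. The minimal NumPy sketch below uses hypothetical data (random values scaled to about 10^6), with a single float32 rounding step standing in for framework-to-framework differences. `np.testing.assert_allclose` accepts the same `rtol`/`atol` arguments if you want an assertion instead of a boolean.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical float64 "reference" outputs in the millions.
ref = rng.standard_normal(1000) * 1e6
# The same values after one float32 rounding step (relative error <= 2**-24).
approx = ref.astype(np.float32)

# Absolute-only check: |approx - ref| <= 1e-4 for every element.
abs_only = np.allclose(approx, ref, rtol=0.0, atol=1e-4)
# Mixed check: |approx - ref| <= 1e-4 + 1e-4 * |ref|.
relative = np.allclose(approx, ref, rtol=1e-4, atol=1e-4)
print("atol=1e-4 only:", abs_only)
print("rtol=1e-4 plus atol=1e-4:", relative)
```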