ml-explore / mlx

MLX: An array framework for Apple silicon
https://ml-explore.github.io/mlx/
MIT License

Change winograd dispatch condition #1534

Closed by awni 4 weeks ago

awni commented 1 month ago

The new dispatch condition is much faster for small batch sizes on every device tested, with a minor speedup for larger batch sizes.
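For readers who want to reproduce a milliseconds-per-image figure, here is a minimal timing sketch. It is not the script used for this PR: the helper name `ms_per_image`, the `run_batch` callable, and the warmup and iteration counts are all assumptions made for illustration.

```python
import time


def ms_per_image(run_batch, batch_size, iters=10, warmup=2):
    """Return average wall-clock milliseconds per image.

    `run_batch` is a hypothetical callable that runs one forward pass on
    `batch_size` images and blocks until the result is ready. With a lazy
    framework such as MLX, that means forcing evaluation (for example with
    `mx.eval`) inside `run_batch`, otherwise only graph construction is timed.
    """
    # Warmup passes are excluded so one-time setup costs do not skew the average.
    for _ in range(warmup):
        run_batch(batch_size)

    start = time.perf_counter()
    for _ in range(iters):
        run_batch(batch_size)
    elapsed_s = time.perf_counter() - start

    # Convert seconds to milliseconds and normalize by total images processed.
    return elapsed_s * 1e3 / (iters * batch_size)
```

Dividing by `iters * batch_size` is what makes numbers at different batch sizes directly comparable.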

ResNet-18 benchmark on M2 Ultra:

| Batch size | Pre (ms/image) | Post (ms/image) |
|-----------:|---------------:|----------------:|
| 1          | 6.979          | 1.833           |
| 2          | 3.214          | 0.947           |
| 4          | 1.468          | 0.765           |
| 8          | 0.897          | 0.590           |
| 16         | 0.612          | 0.526           |
| 32         | 0.500          | 0.475           |
| 64         | 0.463          | 0.437           |

Same benchmark on M1 Max:

| Batch size | Pre (ms/image) | Post (ms/image) |
|-----------:|---------------:|----------------:|
| 1          | 19.664         | 2.644           |
| 2          | 15.996         | 1.783           |
| 4          | 6.358          | 1.578           |
| 8          | 4.592          | 1.409           |
| 16         | 2.927          | 1.314           |
| 32         | 2.101          | 1.247           |
| 64         | 1.699          | 1.649           |

Same benchmark on M3 Max:

| Batch size | Pre (ms/image) | Post (ms/image) |
|-----------:|---------------:|----------------:|
| 1          | 4.046          | 1.438           |
| 2          | 2.425          | 1.003           |
| 4          | 1.441          | 0.839           |
| 8          | 1.018          | 0.751           |
| 16         | 0.807          | 0.716           |
| 32         | 0.734          | 0.699           |
| 64         | 0.704          | 0.674           |