torch 1.13.0/1.13.1 makes MPS training benchmark results incomparable

Hi, I ran the vgg16-cifar10.py benchmark with torch 1.13.1 on my M1 Max MBP (24 GPU cores). I'm getting faster training times than the M1 Max MBP (32 GPU cores) in your blog post: 17.88 vs. 31.54 minutes per epoch. I also ran it on 1.13.0 and got similar results, so perhaps the 1.13.0 stable release incorporated some optimizations or fixed some issues. Unfortunately, I couldn't find the nightly build you used in the blog post.

Just wanted to flag this in case folks want to compare results on the new Apple Silicon chips (M2 Pro/Max): the numbers in the blog post may not be comparable to runs on newer torch versions.

On 1.13.1:

torch 1.13.1
device mps
Files already downloaded and verified
Epoch: 001/001 | Batch 0000/1406 | Loss: 2.5451
Epoch: 001/001 | Batch 0100/1406 | Loss: 2.1898
Epoch: 001/001 | Batch 0200/1406 | Loss: 2.0818
Epoch: 001/001 | Batch 0300/1406 | Loss: 2.0286
Epoch: 001/001 | Batch 0400/1406 | Loss: 2.0692
Epoch: 001/001 | Batch 0500/1406 | Loss: 2.1230
Epoch: 001/001 | Batch 0600/1406 | Loss: 1.8927
Epoch: 001/001 | Batch 0700/1406 | Loss: 2.2680
Epoch: 001/001 | Batch 0800/1406 | Loss: 2.0252
Epoch: 001/001 | Batch 0900/1406 | Loss: 1.9494
Epoch: 001/001 | Batch 1000/1406 | Loss: 1.5779
Epoch: 001/001 | Batch 1100/1406 | Loss: 1.8452
Epoch: 001/001 | Batch 1200/1406 | Loss: 1.7881
Epoch: 001/001 | Batch 1300/1406 | Loss: 1.8946
Epoch: 001/001 | Batch 1400/1406 | Loss: 1.7741
Time / epoch without evaluation: 17.88 min
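Because these timings depend on the torch version as much as on the chip, one option is to record the version and device with every run and only compare runs where both match. The sketch below is a minimal illustration under that assumption. The `BenchmarkRun` record, `comparable` and `speedup` are hypothetical helpers, not part of the benchmark script. The numbers are the ones from this issue, and the blog post's exact nightly build is left as unknown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRun:
    """One timing of the vgg16-cifar10.py benchmark (hypothetical record type)."""
    machine: str
    gpu_cores: int
    torch_version: str
    device: str
    minutes_per_epoch: float


def comparable(a: BenchmarkRun, b: BenchmarkRun) -> bool:
    """Treat two runs as comparable only if torch version and device both match."""
    return a.torch_version == b.torch_version and a.device == b.device


def speedup(baseline: BenchmarkRun, candidate: BenchmarkRun) -> float:
    """How many times faster `candidate` is than `baseline`, per epoch."""
    return baseline.minutes_per_epoch / candidate.minutes_per_epoch


# Numbers from this issue; the blog post's nightly build is not known.
blog = BenchmarkRun("M1 Max MBP", 32, "nightly (unknown)", "mps", 31.54)
mine = BenchmarkRun("M1 Max MBP", 24, "1.13.1", "mps", 17.88)

if comparable(blog, mine):
    print(f"speedup: {speedup(blog, mine):.2f}x")
else:
    # Versions differ, so a raw ratio would mix hardware and software effects.
    print("not comparable: torch versions differ "
          f"({blog.torch_version} vs. {mine.torch_version})")
```

In a real run, `torch_version` could be filled from `torch.__version__`, the same value the script prints in its first log line. Comparing the raw ratio of about 1.76x here would attribute a version change to the hardware.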

rasbt / machine-learning-notes #32