rasbt / machine-learning-notes

Collection of useful machine learning codes and snippets (originally intended for my personal use)
BSD 3-Clause "New" or "Revised" License

vgg16-cifar10 M1 Ultra setup results #12

Closed · prachiagrl closed this issue 2 years ago

prachiagrl commented 2 years ago

Hi,

Thank you for making these test scripts available. I ran the latest nightly version of PyTorch with the memory fix called out in the blog post update and got the following results on an M1 Ultra machine with 128 GB RAM and a 48-core GPU. There's nice scaling from the previously reported results on M1 Max 32-core GPU configs 😀

torch 1.13.0.dev20220522
device mps
Files already downloaded and verified
Epoch: 001/001 | Batch 0000/1406 | Loss: 2.5263
Epoch: 001/001 | Batch 0100/1406 | Loss: 2.3119
Epoch: 001/001 | Batch 0200/1406 | Loss: 2.3646
Epoch: 001/001 | Batch 0300/1406 | Loss: 1.8008
Epoch: 001/001 | Batch 0400/1406 | Loss: 1.9203
Epoch: 001/001 | Batch 0500/1406 | Loss: 1.6827
Epoch: 001/001 | Batch 0600/1406 | Loss: 2.3332
Epoch: 001/001 | Batch 0700/1406 | Loss: 1.9132
Epoch: 001/001 | Batch 0800/1406 | Loss: 1.9166
Epoch: 001/001 | Batch 0900/1406 | Loss: 1.8002
Epoch: 001/001 | Batch 1000/1406 | Loss: 1.7190
Epoch: 001/001 | Batch 1100/1406 | Loss: 1.7783
Epoch: 001/001 | Batch 1200/1406 | Loss: 1.7270
Epoch: 001/001 | Batch 1300/1406 | Loss: 1.7026
Epoch: 001/001 | Batch 1400/1406 | Loss: 1.6350
Time / epoch without evaluation: 13.72 min
Epoch: 001/001 | Train: 38.14% | Validation: 38.58% | Best Validation (Ep. 001): 38.58%
Time elapsed: 16.15 min
Total Training Time: 16.15 min
Test accuracy 38.68%
Total Time: 16.81 min

Using the powermetrics tool, the power consumption for {GPU, Package} hovers around {42 W, 63 W} for most of the time but peaks around {63 W, 94 W} towards the end. The memory consumption for the various Python processes is in the 18-20 GB range.
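For context, a back-of-envelope throughput estimate from the log above. The batch size is not stated in the log; 32 is an assumption inferred from the 1406 batches per epoch and CIFAR-10's 50,000 training images minus a validation split:

```python
# Back-of-envelope throughput for the logged M1 Ultra GPU run.
# Assumptions (inferred, not stated in the log): ~45,000 training
# images (CIFAR-10 train split minus a validation split) across
# 1406 batches, i.e. a batch size of ~32.

batches_per_epoch = 1406
batch_size = 32                      # assumed: 45,000 / 1406 ≈ 32
images_per_epoch = batches_per_epoch * batch_size

epoch_minutes = 13.72                # "Time / epoch without evaluation"
throughput = images_per_epoch / (epoch_minutes * 60)

print(f"~{images_per_epoch} images/epoch")        # ~44992 images/epoch
print(f"~{throughput:.1f} images/sec on MPS")     # ~54.7 images/sec
```

This is only a rough figure, since it folds data loading and any host-device transfer time into the per-image cost.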

rasbt commented 2 years ago

Awesome, added it! Thanks!

prachiagrl commented 2 years ago

For completeness, here's how the CPU performance looks on the M1 Ultra.

torch 1.13.0.dev20220522
device cpu
Files already downloaded and verified
Epoch: 001/001 | Batch 0000/1406 | Loss: 2.3974
Epoch: 001/001 | Batch 0100/1406 | Loss: 2.1493
Epoch: 001/001 | Batch 0200/1406 | Loss: 1.8642
Epoch: 001/001 | Batch 0300/1406 | Loss: 2.0378
Epoch: 001/001 | Batch 0400/1406 | Loss: 2.0125
Epoch: 001/001 | Batch 0500/1406 | Loss: 2.2008
Epoch: 001/001 | Batch 0600/1406 | Loss: 1.9478
Epoch: 001/001 | Batch 0700/1406 | Loss: 1.8912
Epoch: 001/001 | Batch 0800/1406 | Loss: 1.9378
Epoch: 001/001 | Batch 0900/1406 | Loss: 2.0383
Epoch: 001/001 | Batch 1000/1406 | Loss: 1.8418
Epoch: 001/001 | Batch 1100/1406 | Loss: 2.1288
Epoch: 001/001 | Batch 1200/1406 | Loss: 1.8530
Epoch: 001/001 | Batch 1300/1406 | Loss: 1.9260
Epoch: 001/001 | Batch 1400/1406 | Loss: 1.9010
Time / epoch without evaluation: 122.15 min
Epoch: 001/001 | Train: 34.15% | Validation: 34.38% | Best Validation (Ep. 001): 34.38%
Time elapsed: 168.34 min
Total Training Time: 168.34 min
Test accuracy 33.83%
Total Time: 177.68 min
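
Comparing the per-epoch training times of the two logged runs gives a rough sense of the MPS speedup; a quick sketch of the arithmetic:

```python
# Rough MPS-vs-CPU speedup from the two logged runs above
# (per-epoch training time, excluding evaluation).
gpu_epoch_min = 13.72    # M1 Ultra, device mps
cpu_epoch_min = 122.15   # M1 Ultra, device cpu

speedup = cpu_epoch_min / gpu_epoch_min
print(f"MPS is ~{speedup:.1f}x faster per epoch")   # ~8.9x
```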

rasbt commented 2 years ago

Thanks, that's the 20-core CPU?

rasbt commented 2 years ago

The results are a bit weird to be honest; it seems slower than the M1 Pro 😅

qpwo commented 2 years ago

"The memory consumption for the various Python processes is in the 18-20 GB range."

In total, or each? If it's in total, could you speed it up 8x with a higher batch size?

rasbt commented 2 years ago

Yes, changing the batch size helps. I used a constant batch size for fairness across computers though.
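
On the memory-headroom question, a rough sketch. The 8x figure would only hold if throughput scaled linearly with batch size, which it typically doesn't; the reserve value below is a hypothetical allowance, not from the thread:

```python
# Rough memory headroom for a larger batch size on the 128 GB machine.
# Assumes per-process memory scales roughly linearly with batch size,
# which is only approximately true (activation memory grows with the
# batch, but model weights and the dataset cache are fixed costs).
total_ram_gb = 128
observed_usage_gb = 20          # upper end of the reported 18-20 GB
reserve_gb = 16                 # assumed allowance for the OS and other apps

headroom = (total_ram_gb - reserve_gb) / observed_usage_gb
print(f"~{headroom:.1f}x headroom in memory terms")   # ~5.6x
```

So even in the best case, memory alone caps the batch-size increase well below 8x, before accounting for sub-linear throughput scaling.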

qpwo commented 2 years ago

Would it help by like 20%, or by a factor of 8, do you think?

rasbt commented 2 years ago

You'd really have to try it tbh