lucadiliello / pytorch-apple-silicon-benchmarks

Performance of PyTorch on Apple Silicon
GNU General Public License v3.0

pytorch-apple-silicon-benchmarks

Benchmarks of PyTorch on Apple Silicon.

This is a work in progress. If there is a dataset or model you would like to add, just open an issue or a PR.

Prepare environment

Create a conda environment with Python compiled for osx-arm64 and activate it with:

CONDA_SUBDIR=osx-arm64 conda create -n native python -c conda-forge
conda activate native

Then install the PyTorch nightly build with:

pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu

Finally, install datasets and transformers with:

pip install transformers datasets
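
Before running any benchmark, it can help to confirm that the interpreter is a native arm64 build and that PyTorch can see the MPS backend. The following is a minimal sketch and not part of this repository: the function name `describe_environment` is hypothetical, and the code falls back to reporting "no torch" if the import fails.

```python
import platform


def describe_environment():
    """Collect basic facts about the interpreter and the PyTorch backends.

    Hypothetical helper for illustration; it is not part of the benchmark scripts.
    """
    info = {
        # A native Apple Silicon interpreter reports "arm64";
        # an interpreter running under Rosetta reports "x86_64".
        "machine": platform.machine(),
        "torch": None,
        "mps": False,
        "cuda": False,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return info

    info["torch"] = torch.__version__
    # torch.backends.mps only exists in builds with MPS support (PyTorch 1.12+).
    mps = getattr(torch.backends, "mps", None)
    info["mps"] = bool(mps is not None and mps.is_available())
    info["cuda"] = bool(torch.cuda.is_available())
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `machine` is not `arm64` or `mps` is `False` on an Apple Silicon Mac, the environment was probably not created with `CONDA_SUBDIR=osx-arm64`, or the installed PyTorch build lacks MPS support.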

Devices

Results

BERT Transformers in Sequence Classification.

Run the experiments yourself with:

python tests/transformers_sequence_classification.py \
    --device <cpu|cuda|mps> \
    --pre_trained_name <bert-base-cased|bert-large-cased> \
    --batch_size <32|64|128> \
    --mode <training|inference> \
    --steps 100 \
    --sequence_length <128|512>
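
To sweep several configurations in one go, the command above can be generated for every combination of options. This is only a sketch: the helper `build_commands` is hypothetical, and it assumes the script accepts exactly the flags shown above. It only builds the command strings and does not run them.

```python
import itertools
import shlex

# Script path taken from the command shown in this README.
SCRIPT = "tests/transformers_sequence_classification.py"


def build_commands(device, models, batch_sizes, sequence_lengths, modes, steps=100):
    """Return one shell command per (model, mode, sequence length, batch size)."""
    commands = []
    for model, mode, seq_len, batch in itertools.product(
        models, modes, sequence_lengths, batch_sizes
    ):
        args = [
            "python", SCRIPT,
            "--device", device,
            "--pre_trained_name", model,
            "--batch_size", str(batch),
            "--mode", mode,
            "--steps", str(steps),
            "--sequence_length", str(seq_len),
        ]
        # shlex.join quotes arguments where needed, so the result is safe to paste.
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    for command in build_commands(
        device="mps",
        models=["bert-base-cased"],
        batch_sizes=[32, 64, 128],
        sequence_lengths=[128, 512],
        modes=["training", "inference"],
    ):
        print(command)
```

Each printed line can be executed directly in the shell or passed to `subprocess.run(shlex.split(command))`. Large batch sizes with long sequences may run out of memory on some devices.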

The following tables show the time needed to complete 100 steps without gradient accumulation. A dash (-) means that the script ran out of memory. All experiments were run in float32.
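
As a rough illustration of how a "time for 100 steps" figure can be measured, the sketch below times a generic step function and formats the result in the style used in the tables. It is not the code of the benchmark script: `time_steps`, `format_duration` and the `sync_fn` parameter are hypothetical names. GPU backends usually execute work asynchronously, so when timing on `mps` or `cuda` one would likely pass a synchronization function such as `torch.mps.synchronize` or `torch.cuda.synchronize` (an assumption about your setup, available in recent PyTorch versions).

```python
import time


def time_steps(step_fn, steps=100, sync_fn=None):
    """Return the wall-clock seconds needed to call step_fn `steps` times.

    sync_fn, if given, is called before starting and after finishing the loop
    so that pending asynchronous device work is included in the measurement.
    """
    if sync_fn is not None:
        sync_fn()
    start = time.perf_counter()
    for _ in range(steps):
        step_fn()
    if sync_fn is not None:
        sync_fn()
    return time.perf_counter() - start


def format_duration(seconds):
    """Format seconds like the tables do, e.g. 4862 -> '1h 21m 2s'."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if minutes:
        parts.append(f"{minutes}m")
    # Always show seconds when nothing else would be printed.
    if secs or not parts:
        parts.append(f"{secs}s")
    return " ".join(parts)


if __name__ == "__main__":
    # Dummy workload standing in for one training or inference step.
    elapsed = time_steps(lambda: sum(range(10_000)), steps=100)
    print(format_duration(elapsed))
```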

bert-base-cased

Training:

| Batch size | Sequence length | M1 Max CPU (32GB) | M1 Max GPU 32-core (32GB) | M1 Ultra 48-core (64GB) | M2 Ultra GPU 60-core (64GB) | M3 Pro GPU 14-core (18GB) | M3 Max GPU 40-core (64GB) | V100 (16GB) | T4 (16GB) |
|---|---|---|---|---|---|---|---|---|---|
| 16 | 128 | 2m 29s | 1m 3s | TBD | TBD | TBD | TBD | 12s | 31s |
| 64 | 128 | 8m 32s | 2m 57s | TBD | 49s | 2m 36s | 1m 13s | 41s | 2m |
| 256 | 128 | 50m 10s | 1h 49m 9s | TBD | TBD | TBD | TBD | - | - |
| 16 | 512 | 11m 22s | 9m 28s | TBD | TBD | TBD | 1m 24s | 47s | 2m 25s |
| 64 | 512 | 1h 21m 2s | 3h 26m 4s | TBD | TBD | TBD | TBD | - | - |
| 256 | 512 | 6h 33m 7s | - | TBD | TBD | TBD | TBD | - | - |

Inference:

| Batch size | Sequence length | M1 Max CPU (32GB) | M1 Max GPU 32-core (32GB) | M1 Ultra 48-core (64GB) | M2 Ultra GPU 60-core (64GB) | M3 Pro GPU 14-core (18GB) | M3 Max GPU 40-core (64GB) | V100 (16GB) | T4 (16GB) |
|---|---|---|---|---|---|---|---|---|---|
| 16 | 128 | 52s | 16s | 9s | TBD | TBD | TBD | 4s | 10s |
| 64 | 128 | 3m 2s | 50s | 20s | 47s | 51s | 21s | 13s | 44s |
| 256 | 128 | 11m 25s | 3m 22s | 76s | TBD | TBD | TBD | 54s | 2m 52s |
| 16 | 512 | 4m 22s | 1m 1s | 24s | TBD | TBD | 1m 2s | 16s | 54s |
| 64 | 512 | 17m 51s | 3m 59s | 1m 27s | TBD | TBD | TBD | 1m 4s | 3m 24s |
| 256 | 512 | 1h 10m 41s | 15m 47s | 5m 42s | TBD | TBD | TBD | 4m 10s | 14m 18s |
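
To compare devices, the durations in the tables can be converted to seconds and turned into speedups or throughput. The helpers below (`parse_duration`, `speedup`, `samples_per_second`) are hypothetical names used only for this sketch. They assume every run used 100 steps, as stated above, and treat `-` and `TBD` as missing values.

```python
import re

_TOKEN = re.compile(r"(\d+)\s*([hms])")
_SECONDS_PER_UNIT = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text):
    """Convert a table entry such as '1h 49m 9s' or '2m36s' to seconds.

    Returns None for missing entries ('-' for out of memory, 'TBD').
    """
    text = text.strip()
    if text in {"", "-", "TBD"}:
        return None
    tokens = _TOKEN.findall(text)
    if not tokens:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(int(value) * _SECONDS_PER_UNIT[unit] for value, unit in tokens)


def speedup(baseline, candidate):
    """Return how many times faster `candidate` is than `baseline`, or None."""
    base, cand = parse_duration(baseline), parse_duration(candidate)
    if base is None or cand is None:
        return None
    return base / cand


def samples_per_second(batch_size, seconds, steps=100):
    """Throughput for a run of `steps` batches of `batch_size` samples."""
    return batch_size * steps / seconds


if __name__ == "__main__":
    # bert-base-cased inference, batch size 64, sequence length 128:
    # M1 Max CPU (3m 2s) versus M1 Max GPU (50s).
    print(round(speedup("3m 2s", "50s"), 2))              # 3.64
    print(samples_per_second(64, parse_duration("50s")))  # 128.0
```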

Considerations

FAQ