Closed by anijain2305 1 year ago
The tables show the worst 50 models for different metrics
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+----------------+-------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+-------------+-------------+-------------+
| eager | 100%, 55/55 | 93%, 41/44 | 100%, 61/61 |
| aot_eager | 98%, 54/55 | 93%, 41/44 | 90%, 55/61 |
| aot_cudagraphs | 29%, 16/55 | 0%, 0/44 | 0%, 0/61 |
| aot_nvfuser | 62%, 34/55 | 2%, 1/44 | 82%, 50/61 |
| inductor | 87%, 48/55 | 77%, 34/44 | 74%, 45/61 |
+----------------+-------------+-------------+-------------+
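Each pass-rate cell pairs a rounded percentage with the raw count of passing models over the total for that suite. A minimal sketch of how such a cell could be rendered is below; `format_passrate` is a hypothetical helper name and is not taken from the benchmark scripts.

```python
def format_passrate(passed: int, total: int) -> str:
    """Render a pass-rate cell as '<percent>%, <passed>/<total>'.

    Hypothetical helper for illustration only. The percentage is rounded
    to the nearest integer (half up) using integer arithmetic.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    percent = (200 * passed + total) // (2 * total)
    return f"{percent}%, {passed}/{total}"


if __name__ == "__main__":
    # Counts taken from the torchbench column of the table above.
    for passed in (55, 54, 16, 34, 48):
        print(format_passrate(passed, 55))
```

Rounding to the nearest integer matches the cells shown here; for example, 34/55 is about 61.8% and is reported as 62%.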
Geometric mean speedup
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.02x | 0.0x | 0.0x |
| aot_nvfuser | 1.12x | 1.12x | 1.12x |
| inductor | 1.38x | 1.60x | 1.23x |
+----------------+------------+-------------+-------------+
Mean compilation time (seconds)
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 5.68 | 13.69 | 11.39 |
| aot_eager | 10.31 | 20.58 | 17.02 |
| aot_cudagraphs | 4.47 | 0.0 | 0.0 |
| aot_nvfuser | 21.51 | 10.59 | 57.77 |
| inductor | 278.25 | 120.52 | 427.42 |
+----------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 0.96x | 0.98x | 1.00x |
| aot_eager | 0.87x | 0.88x | 0.88x |
| aot_cudagraphs | 0.48x | 0.0x | 0.0x |
| aot_nvfuser | 0.84x | 1.08x | 0.85x |
| inductor | 0.79x | 0.74x | 0.90x |
+----------------+------------+-------------+-------------+
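The summary numbers in a report like the one above are computed only over models that pass the accuracy check. The sketch below shows one way that filtering and aggregation could be done. It is an illustration under stated assumptions, not the dashboard's actual code: the `ModelResult` record and its field names are hypothetical, the compression ratio is assumed to be eager peak memory divided by compiled peak memory and to be averaged arithmetically, and an empty set is reported as 0.0, which is consistent with the 0.0x entries that appear wherever the pass rate is 0%.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class ModelResult:
    """Per-model measurements (hypothetical record layout)."""
    name: str
    passed_accuracy: bool
    speedup: float            # assumed: eager latency / compiled latency
    compile_seconds: float
    compression_ratio: float  # assumed: eager peak memory / compiled peak memory


def geomean(values: Iterable[float]) -> float:
    """Geometric mean of positive values; 0.0 when there is nothing to average."""
    vals = list(values)
    if not vals:
        return 0.0
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


def mean(values: Iterable[float]) -> float:
    """Arithmetic mean; 0.0 when there is nothing to average."""
    vals = list(values)
    return sum(vals) / len(vals) if vals else 0.0


def summarize(results: List[ModelResult]) -> Dict[str, float]:
    """Drop models that fail accuracy, then aggregate the remaining ones."""
    kept = [r for r in results if r.passed_accuracy]
    return {
        "speedup": geomean(r.speedup for r in kept),
        "compile_seconds": mean(r.compile_seconds for r in kept),
        # Assumption: arithmetic mean; the aggregation is not stated in the report.
        "compression_ratio": mean(r.compression_ratio for r in kept),
    }


if __name__ == "__main__":
    demo = [  # made-up numbers, not taken from the tables
        ModelResult("model_a", True, 2.0, 10.0, 0.8),
        ModelResult("model_b", True, 8.0, 30.0, 1.0),
        ModelResult("model_c", False, 100.0, 999.0, 5.0),  # excluded
    ]
    print(summarize(demo))
```

Because failing models are excluded before averaging, a backend with a low pass rate is summarized over only a few models, so its speedup and memory figures are not directly comparable to those of a backend that passes most of the suite.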
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 98%, 52/53 | 98%, 42/43 | 100%, 61/61 |
| aot_eager | 98%, 52/53 | 98%, 42/43 | 90%, 55/61 |
| aot_cudagraphs | 28%, 15/53 | 2%, 1/43 | 8%, 5/61 |
| aot_nvfuser | 60%, 32/53 | 0%, 0/43 | 75%, 46/61 |
| inductor | 83%, 44/53 | 86%, 37/43 | 90%, 55/61 |
+----------------+------------+-------------+-------------+
Geometric mean speedup
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.00x | 1.00x | 1.00x |
| aot_cudagraphs | 1.09x | 1.00x | 1.00x |
| aot_nvfuser | 1.16x | 0.0x | 1.20x |
| inductor | 1.70x | 2.17x | 1.30x |
+----------------+------------+-------------+-------------+
Mean compilation time (seconds)
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 6.19 | 14.88 | 11.64 |
| aot_eager | 12.45 | 25.75 | 19.94 |
| aot_cudagraphs | 13.09 | 92.75 | 51.56 |
| aot_nvfuser | 29.54 | 0.0 | 80.08 |
| inductor | 271.08 | 116.86 | 450.74 |
+----------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 0.96x | 0.98x | 1.00x |
| aot_eager | 0.85x | 0.86x | 0.88x |
| aot_cudagraphs | 0.43x | 0.38x | 0.20x |
| aot_nvfuser | 0.83x | 0.0x | 0.85x |
| inductor | 0.78x | 0.82x | 0.89x |
+----------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+----------------+-------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+-------------+-------------+-------------+
| eager | 100%, 55/55 | 93%, 41/44 | 100%, 61/61 |
| aot_eager | 98%, 54/55 | 93%, 41/44 | 90%, 55/61 |
| aot_cudagraphs | 29%, 16/55 | 0%, 0/44 | 0%, 0/61 |
| aot_nvfuser | 62%, 34/55 | 2%, 1/44 | 82%, 50/61 |
| inductor | 87%, 48/55 | 77%, 34/44 | 74%, 45/61 |
+----------------+-------------+-------------+-------------+
Geometric mean speedup
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.02x | 0.0x | 0.0x |
| aot_nvfuser | 1.12x | 1.13x | 1.12x |
| inductor | 1.37x | 1.61x | 1.24x |
+----------------+------------+-------------+-------------+
Mean compilation time (seconds)
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 5.70 | 13.73 | 11.39 |
| aot_eager | 10.34 | 20.46 | 17.09 |
| aot_cudagraphs | 4.54 | 0.0 | 0.0 |
| aot_nvfuser | 21.31 | 10.74 | 57.51 |
| inductor | 265.33 | 111.78 | 417.22 |
+----------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 0.96x | 0.98x | 1.00x |
| aot_eager | 0.87x | 0.88x | 0.88x |
| aot_cudagraphs | 0.48x | 0.0x | 0.0x |
| aot_nvfuser | 0.84x | 1.08x | 0.85x |
| inductor | 0.79x | 0.74x | 0.89x |
+----------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+-----------+-------------+
| Compiler | huggingface |
+-----------+-------------+
| aot_eager | 93%, 41/44 |
| inductor | 64%, 28/44 |
+-----------+-------------+
Geometric mean speedup
+-----------+-------------+
| Compiler | huggingface |
+-----------+-------------+
| aot_eager | 1.00x |
| inductor | 1.76x |
+-----------+-------------+
Mean compilation time (seconds)
+-----------+-------------+
| Compiler | huggingface |
+-----------+-------------+
| aot_eager | 20.82 |
| inductor | 80.93 |
+-----------+-------------+
Peak memory footprint compression ratio (higher is better)
+-----------+-------------+
| Compiler | huggingface |
+-----------+-------------+
| aot_eager | 0.88x |
| inductor | 0.74x |
+-----------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 98%, 52/53 | 98%, 42/43 | 100%, 61/61 |
| aot_eager | 98%, 52/53 | 98%, 42/43 | 90%, 55/61 |
| aot_cudagraphs | 28%, 15/53 | 2%, 1/43 | 10%, 6/61 |
| aot_nvfuser | 60%, 32/53 | 0%, 0/43 | 75%, 46/61 |
| inductor | 81%, 43/53 | 86%, 37/43 | 90%, 55/61 |
+----------------+------------+-------------+-------------+
Geometric mean speedup
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.09x | 1.00x | 1.00x |
| aot_nvfuser | 1.16x | 0.0x | 1.20x |
| inductor | 1.68x | 2.20x | 1.31x |
+----------------+------------+-------------+-------------+
Mean compilation time (seconds)
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 6.15 | 14.88 | 11.73 |
| aot_eager | 12.44 | 25.70 | 19.93 |
| aot_cudagraphs | 12.80 | 93.53 | 51.65 |
| aot_nvfuser | 29.54 | 0.0 | 79.13 |
| inductor | 258.47 | 118.80 | 452.93 |
+----------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 0.96x | 0.98x | 1.00x |
| aot_eager | 0.85x | 0.86x | 0.88x |
| aot_cudagraphs | 0.43x | 0.38x | 0.19x |
| aot_nvfuser | 0.83x | 0.0x | 0.85x |
| inductor | 0.77x | 0.82x | 0.89x |
+----------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+-----------+-------------+
| Compiler | huggingface |
+-----------+-------------+
| aot_eager | 98%, 42/43 |
| inductor | 84%, 36/43 |
+-----------+-------------+
Geometric mean speedup
+-----------+-------------+
| Compiler | huggingface |
+-----------+-------------+
| aot_eager | 1.00x |
| inductor | 2.25x |
+-----------+-------------+
Mean compilation time (seconds)
+-----------+-------------+
| Compiler | huggingface |
+-----------+-------------+
| aot_eager | 25.89 |
| inductor | 87.60 |
+-----------+-------------+
Peak memory footprint compression ratio (higher is better)
+-----------+-------------+
| Compiler | huggingface |
+-----------+-------------+
| aot_eager | 0.86x |
| inductor | 0.83x |
+-----------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 91%, 50/55 | 98%, 43/44 | 100%, 61/61 |
| aot_eager | 89%, 49/55 | 98%, 43/44 | 90%, 55/61 |
| aot_cudagraphs | 25%, 14/55 | 0%, 0/44 | 2%, 1/61 |
| aot_nvfuser | 58%, 32/55 | 2%, 1/44 | 82%, 50/61 |
| inductor | 84%, 46/55 | 93%, 41/44 | 95%, 58/61 |
+----------------+------------+-------------+-------------+
Geometric mean speedup
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.02x | 0.0x | 1.00x |
| aot_nvfuser | 1.13x | 1.12x | 1.12x |
| inductor | 1.39x | 1.60x | 1.21x |
+----------------+------------+-------------+-------------+
Mean compilation time (seconds)
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 5.42 | 14.22 | 11.34 |
| aot_eager | 9.77 | 21.16 | 16.79 |
| aot_cudagraphs | 4.86 | 0.0 | 7.42 |
| aot_nvfuser | 22.48 | 10.56 | 57.73 |
| inductor | 238.15 | 109.27 | 366.65 |
+----------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 0.95x | 0.98x | 1.00x |
| aot_eager | 0.86x | 0.89x | 0.88x |
| aot_cudagraphs | 0.41x | 0.0x | 0.25x |
| aot_nvfuser | 0.83x | 1.08x | 0.85x |
| inductor | 0.78x | 0.74x | 0.90x |
+----------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 92%, 49/53 | 98%, 42/43 | 100%, 61/61 |
| aot_eager | 94%, 50/53 | 98%, 42/43 | 90%, 55/61 |
| aot_cudagraphs | 26%, 14/53 | 0%, 0/43 | 11%, 7/61 |
| aot_nvfuser | 60%, 32/53 | 0%, 0/43 | 75%, 46/61 |
| inductor | 81%, 43/53 | 93%, 40/43 | 93%, 57/61 |
+----------------+------------+-------------+-------------+
Geometric mean speedup
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.09x | 0.0x | 1.00x |
| aot_nvfuser | 1.16x | 0.0x | 1.19x |
| inductor | 1.71x | 2.29x | 1.31x |
+----------------+------------+-------------+-------------+
Mean compilation time (seconds)
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 5.68 | 14.88 | 11.61 |
| aot_eager | 11.54 | 25.28 | 19.45 |
| aot_cudagraphs | 7.10 | 0.0 | 52.59 |
| aot_nvfuser | 29.15 | 0.0 | 78.59 |
| inductor | 215.79 | 112.71 | 397.63 |
+----------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 0.96x | 0.98x | 1.00x |
| aot_eager | 0.86x | 0.87x | 0.88x |
| aot_cudagraphs | 0.44x | 0.0x | 0.20x |
| aot_nvfuser | 0.83x | 0.0x | 0.85x |
| inductor | 0.77x | 0.82x | 0.89x |
+----------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 89%, 49/55 | 98%, 43/44 | 100%, 61/61 |
| aot_eager | 87%, 48/55 | 98%, 43/44 | 90%, 55/61 |
| aot_cudagraphs | 25%, 14/55 | 0%, 0/44 | 2%, 1/61 |
| aot_nvfuser | 58%, 32/55 | 2%, 1/44 | 82%, 50/61 |
| inductor | 84%, 46/55 | 93%, 41/44 | 97%, 59/61 |
+----------------+------------+-------------+-------------+
Geometric mean speedup
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.03x | 0.0x | 1.00x |
| aot_nvfuser | 1.13x | 1.11x | 1.12x |
| inductor | 1.49x | 1.64x | 1.35x |
+----------------+------------+-------------+-------------+
Mean compilation time (seconds)
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 1.76 | 2.14 | 2.02 |
| aot_eager | 6.40 | 9.10 | 8.87 |
| aot_cudagraphs | 4.48 | 0.0 | 5.79 |
| aot_nvfuser | 20.37 | 9.44 | 49.20 |
| inductor | 131.38 | 102.01 | 213.14 |
+----------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 0.96x | 0.98x | 0.99x |
| aot_eager | 0.86x | 0.89x | 0.87x |
| aot_cudagraphs | 0.41x | 0.0x | 0.25x |
| aot_nvfuser | 0.83x | 1.08x | 0.84x |
| inductor | 0.84x | 0.77x | 0.95x |
+----------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 94%, 50/53 | 98%, 42/43 | 100%, 61/61 |
| aot_eager | 94%, 50/53 | 98%, 42/43 | 90%, 55/61 |
| aot_cudagraphs | 26%, 14/53 | 0%, 0/43 | 11%, 7/61 |
| aot_nvfuser | 60%, 32/53 | 0%, 0/43 | 75%, 46/61 |
| inductor | 83%, 44/53 | 93%, 40/43 | 93%, 57/61 |
+----------------+------------+-------------+-------------+
Geometric mean speedup
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.00x | 1.00x | 1.00x |
| aot_cudagraphs | 1.09x | 0.0x | 1.00x |
| aot_nvfuser | 1.16x | 0.0x | 1.20x |
| inductor | 1.84x | 2.29x | 1.55x |
+----------------+------------+-------------+-------------+
Mean compilation time (seconds)
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 1.94 | 2.55 | 2.30 |
| aot_eager | 8.04 | 12.73 | 11.51 |
| aot_cudagraphs | 6.98 | 0.0 | 52.51 |
| aot_nvfuser | 27.44 | 0.0 | 71.07 |
| inductor | 139.38 | 117.39 | 262.93 |
+----------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+----------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
| eager | 0.96x | 0.98x | 0.99x |
| aot_eager | 0.85x | 0.87x | 0.87x |
| aot_cudagraphs | 0.43x | 0.0x | 0.20x |
| aot_nvfuser | 0.83x | 0.0x | 0.85x |
| inductor | 0.83x | 0.86x | 0.94x |
+----------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| inductor_no_cudagraphs | 84%, 47/56 | 91%, 40/44 | 95%, 58/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| inductor_no_cudagraphs | 1.16x | 1.19x | 1.23x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| inductor_no_cudagraphs | 57.71 | 46.53 | 79.81 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| inductor_no_cudagraphs | 0.93x | 0.94x | 1.01x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 89%, 49/55 | 98%, 43/44 | 100%, 61/61 |
| aot_eager | 87%, 48/55 | 98%, 43/44 | 90%, 55/61 |
| aot_cudagraphs | 73%, 40/55 | 57%, 25/44 | 56%, 34/61 |
| aot_nvfuser | 58%, 32/55 | 2%, 1/44 | 82%, 50/61 |
| inductor | 87%, 48/55 | 93%, 41/44 | 97%, 59/61 |
| inductor_no_cudagraphs | 89%, 49/55 | 93%, 41/44 | 95%, 58/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.02x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.09x | 1.14x | 1.07x |
| aot_nvfuser | 1.13x | 1.12x | 1.12x |
| inductor | 1.49x | 1.64x | 1.34x |
| inductor_no_cudagraphs | 1.23x | 1.32x | 1.24x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.73 | 2.18 | 1.87 |
| aot_eager | 6.15 | 9.08 | 8.21 |
| aot_cudagraphs | 6.35 | 11.31 | 16.66 |
| aot_nvfuser | 20.10 | 9.46 | 48.56 |
| inductor | 58.49 | 50.41 | 80.71 |
| inductor_no_cudagraphs | 25.61 | 23.48 | 27.66 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.96x | 0.98x | 0.99x |
| aot_eager | 0.86x | 0.89x | 0.87x |
| aot_cudagraphs | 0.39x | 0.36x | 0.32x |
| aot_nvfuser | 0.83x | 1.08x | 0.84x |
| inductor | 0.84x | 0.77x | 0.95x |
| inductor_no_cudagraphs | 0.98x | 0.95x | 1.03x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 94%, 50/53 | 98%, 42/43 | 100%, 61/61 |
| aot_eager | 94%, 50/53 | 98%, 42/43 | 90%, 55/61 |
| aot_cudagraphs | 74%, 39/53 | 53%, 23/43 | 75%, 46/61 |
| aot_nvfuser | 60%, 32/53 | 0%, 0/43 | 75%, 46/61 |
| inductor | 85%, 45/53 | 93%, 40/43 | 93%, 57/61 |
| inductor_no_cudagraphs | 87%, 46/53 | 93%, 40/43 | 93%, 57/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.01x | 1.01x | 1.00x |
| aot_eager | 1.00x | 1.00x | 1.00x |
| aot_cudagraphs | 1.19x | 1.29x | 1.06x |
| aot_nvfuser | 1.16x | 0.0x | 1.20x |
| inductor | 1.84x | 2.30x | 1.56x |
| inductor_no_cudagraphs | 1.37x | 1.64x | 1.36x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.86 | 2.54 | 2.09 |
| aot_eager | 7.63 | 12.68 | 10.35 |
| aot_cudagraphs | 7.76 | 16.14 | 19.75 |
| aot_nvfuser | 26.75 | 0.0 | 69.96 |
| inductor | 56.04 | 56.96 | 94.80 |
| inductor_no_cudagraphs | 28.04 | 29.28 | 32.82 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.96x | 0.98x | 0.99x |
| aot_eager | 0.85x | 0.87x | 0.87x |
| aot_cudagraphs | 0.42x | 0.40x | 0.33x |
| aot_nvfuser | 0.83x | 0.0x | 0.85x |
| inductor | 0.83x | 0.86x | 0.94x |
| inductor_no_cudagraphs | 1.00x | 1.05x | 1.03x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 89%, 49/55 | 98%, 42/43 | 100%, 61/61 |
| aot_eager | 89%, 49/55 | 98%, 42/43 | 97%, 59/61 |
| aot_cudagraphs | 73%, 40/55 | 49%, 21/43 | 38%, 23/61 |
| aot_nvfuser | 58%, 32/55 | 2%, 1/43 | 87%, 53/61 |
| inductor | 85%, 47/55 | 93%, 40/43 | 97%, 59/61 |
| inductor_no_cudagraphs | 91%, 50/55 | 93%, 40/43 | 95%, 58/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.09x | 1.02x | 1.00x |
| aot_nvfuser | 1.13x | 1.12x | 1.11x |
| inductor | 1.50x | 1.31x | 1.26x |
| inductor_no_cudagraphs | 1.23x | 1.21x | 1.25x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.78 | 2.15 | 1.94 |
| aot_eager | 6.47 | 9.29 | 9.22 |
| aot_cudagraphs | 6.79 | 12.09 | 16.48 |
| aot_nvfuser | 20.61 | 9.84 | 51.45 |
| inductor | 62.14 | 53.79 | 73.53 |
| inductor_no_cudagraphs | 61.41 | 48.85 | 72.55 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.96x | 1.00x | 0.99x |
| aot_eager | 0.86x | 0.91x | 0.88x |
| aot_cudagraphs | 0.39x | 0.35x | 0.32x |
| aot_nvfuser | 0.83x | 1.08x | 0.84x |
| inductor | 0.84x | 0.79x | 0.96x |
| inductor_no_cudagraphs | 0.93x | 0.96x | 1.01x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 94%, 50/53 | 98%, 41/42 | 100%, 61/61 |
| aot_eager | 94%, 50/53 | 98%, 41/42 | 95%, 58/61 |
| aot_cudagraphs | 74%, 39/53 | 60%, 25/42 | 79%, 48/61 |
| aot_nvfuser | 60%, 32/53 | 0%, 0/42 | 80%, 49/61 |
| inductor | 85%, 45/53 | 93%, 39/42 | 93%, 57/61 |
| inductor_no_cudagraphs | 87%, 46/53 | 93%, 39/42 | 93%, 57/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.00x | 1.00x | 1.00x |
| aot_cudagraphs | 1.21x | 1.05x | 1.00x |
| aot_nvfuser | 1.17x | 0.0x | 1.19x |
| inductor | 1.84x | 1.76x | 1.41x |
| inductor_no_cudagraphs | 1.38x | 1.54x | 1.37x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.95 | 2.59 | 2.20 |
| aot_eager | 8.21 | 13.14 | 11.44 |
| aot_cudagraphs | 8.47 | 16.12 | 21.40 |
| aot_nvfuser | 27.30 | 0.0 | 72.96 |
| inductor | 59.25 | 62.06 | 90.07 |
| inductor_no_cudagraphs | 60.72 | 56.93 | 87.93 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.96x | 0.99x | 0.99x |
| aot_eager | 0.85x | 0.89x | 0.87x |
| aot_cudagraphs | 0.42x | 0.38x | 0.32x |
| aot_nvfuser | 0.83x | 0.0x | 0.85x |
| inductor | 0.83x | 0.91x | 0.95x |
| inductor_no_cudagraphs | 0.93x | 1.08x | 1.01x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 98%, 54/55 | 100%, 43/43 | 100%, 61/61 |
| aot_eager | 95%, 52/55 | 100%, 43/43 | 98%, 60/61 |
| aot_cudagraphs | 73%, 40/55 | 47%, 20/43 | 39%, 24/61 |
| aot_nvfuser | 58%, 32/55 | 2%, 1/43 | 89%, 54/61 |
| inductor | 87%, 48/55 | 93%, 40/43 | 95%, 58/61 |
| inductor_no_cudagraphs | 91%, 50/55 | 93%, 40/43 | 95%, 58/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.09x | 1.02x | 1.00x |
| aot_nvfuser | 1.13x | 1.12x | 1.11x |
| inductor | 1.48x | 1.28x | 1.25x |
| inductor_no_cudagraphs | 1.22x | 1.21x | 1.24x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 2.08 | 2.22 | 1.88 |
| aot_eager | 6.92 | 9.05 | 8.70 |
| aot_cudagraphs | 8.23 | 18.64 | 15.25 |
| aot_nvfuser | 20.32 | 9.60 | 50.01 |
| inductor | 62.17 | 52.98 | 73.89 |
| inductor_no_cudagraphs | 64.61 | 49.17 | 72.74 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.96x | 1.00x | 0.99x |
| aot_eager | 0.86x | 0.91x | 0.88x |
| aot_cudagraphs | 0.39x | 0.36x | 0.32x |
| aot_nvfuser | 0.83x | 1.08x | 0.84x |
| inductor | 0.82x | 0.72x | 0.97x |
| inductor_no_cudagraphs | 0.94x | 0.96x | 1.02x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 98%, 52/53 | 100%, 42/42 | 100%, 61/61 |
| aot_eager | 98%, 52/53 | 100%, 42/42 | 97%, 59/61 |
| aot_cudagraphs | 75%, 40/53 | 55%, 23/42 | 80%, 49/61 |
| aot_nvfuser | 60%, 32/53 | 0%, 0/42 | 87%, 53/61 |
| inductor | 87%, 46/53 | 93%, 39/42 | 93%, 57/61 |
| inductor_no_cudagraphs | 89%, 47/53 | 93%, 39/42 | 93%, 57/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.01x | 1.01x | 1.00x |
| aot_eager | 1.00x | 1.00x | 1.00x |
| aot_cudagraphs | 1.19x | 1.05x | 1.00x |
| aot_nvfuser | 1.16x | 0.0x | 1.18x |
| inductor | 1.82x | 1.79x | 1.42x |
| inductor_no_cudagraphs | 1.36x | 1.54x | 1.37x |
+------------------------+------------+-------------+-------------+
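A rough sketch of how a geometric mean speedup can be computed once the models that fail accuracy checks are dropped, as the report describes. The tuple layout, the function name `geomean_speedup`, and the sample numbers are all made up for illustration. Returning 0.0 when nothing passes is an assumption, chosen only because the tables show 0.0x wherever the pass rate is 0/N.

```python
from statistics import geometric_mean


def geomean_speedup(results):
    """Geometric mean of per-model speedups, ignoring accuracy failures.

    results: iterable of (model_name, passed_accuracy, speedup) tuples.
    Hypothetical layout; the real benchmark output format may differ.
    """
    kept = [speedup for _, passed, speedup in results if passed]
    if not kept:
        # Assumption: mirror the 0.0x cells shown when no model passes.
        return 0.0
    return geometric_mean(kept)


# Made-up numbers, not taken from any run in this thread.
sample = [
    ("model_a", True, 2.0),
    ("model_b", True, 0.5),
    ("model_c", False, 10.0),  # fails accuracy, so it is excluded
]
print(f"{geomean_speedup(sample):.2f}x")
```

The geometric mean is the usual choice for averaging ratios, since a 2x speedup and a 0.5x slowdown cancel out instead of averaging to 1.25x.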
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 2.29 | 2.65 | 2.11 |
| aot_eager | 8.47 | 12.63 | 11.01 |
| aot_cudagraphs | 10.99 | 21.63 | 20.31 |
| aot_nvfuser | 26.97 | 0.0 | 68.40 |
| inductor | 57.44 | 62.79 | 89.06 |
| inductor_no_cudagraphs | 60.44 | 57.49 | 87.16 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.96x | 0.99x | 0.99x |
| aot_eager | 0.85x | 0.89x | 0.87x |
| aot_cudagraphs | 0.42x | 0.38x | 0.32x |
| aot_nvfuser | 0.83x | 0.0x | 0.84x |
| inductor | 0.83x | 0.91x | 0.95x |
| inductor_no_cudagraphs | 0.92x | 1.08x | 1.01x |
+------------------------+------------+-------------+-------------+
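The thread does not spell out the formula behind the compression ratio. A sketch under the assumption that it is the eager peak memory divided by the compiled backend's peak memory, which is consistent with "higher is better" and with values below 1.00x meaning the backend uses more memory than eager. The function name and byte counts are hypothetical.

```python
def compression_ratio(eager_peak_bytes: float, compiled_peak_bytes: float) -> float:
    """Peak-memory compression ratio relative to eager mode.

    Assumed definition (not confirmed by the thread): eager peak / compiled peak.
    A value above 1.0 means the compiled run used less memory than eager.
    """
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled_peak_bytes must be positive")
    return eager_peak_bytes / compiled_peak_bytes


GIB = 1024 ** 3
# Made-up measurements: eager peaks at 8 GiB, the compiled run at 10 GiB.
ratio = compression_ratio(8 * GIB, 10 * GIB)
print(f"{ratio:.2f}x")
```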
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 82%, 46/56 | 100%, 43/43 | 59%, 36/61 |
| aot_eager | 79%, 44/56 | 100%, 43/43 | 56%, 34/61 |
| aot_cudagraphs | 64%, 36/56 | 49%, 21/43 | 11%, 7/61 |
| nvprims_nvfuser | 48%, 27/56 | 0%, 0/43 | 15%, 9/61 |
| inductor | 71%, 40/56 | 93%, 40/43 | 56%, 34/61 |
| inductor_no_cudagraphs | 79%, 44/56 | 93%, 40/43 | 56%, 34/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.05x | 1.02x | 1.00x |
| nvprims_nvfuser | 1.04x | 0.0x | 1.16x |
| inductor | 1.39x | 1.29x | 1.23x |
| inductor_no_cudagraphs | 1.22x | 1.21x | 1.23x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.84 | 2.26 | 1.68 |
| aot_eager | 7.30 | 10.27 | 10.56 |
| aot_cudagraphs | 9.57 | 20.71 | 12.53 |
| nvprims_nvfuser | 48.11 | 0.0 | 163.13 |
| inductor | 25.45 | 35.22 | 45.24 |
| inductor_no_cudagraphs | 25.56 | 30.09 | 43.82 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.95x | 1.00x | 0.99x |
| aot_eager | 0.87x | 0.91x | 0.90x |
| aot_cudagraphs | 0.39x | 0.36x | 0.31x |
| nvprims_nvfuser | 0.81x | 0.0x | 0.85x |
| inductor | 0.81x | 0.71x | 0.95x |
| inductor_no_cudagraphs | 0.93x | 0.96x | 1.01x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 83%, 45/54 | 100%, 42/42 | 59%, 36/61 |
| aot_eager | 83%, 45/54 | 100%, 42/42 | 54%, 33/61 |
| aot_cudagraphs | 67%, 36/54 | 57%, 24/42 | 38%, 23/61 |
| nvprims_nvfuser | 20%, 11/54 | 5%, 2/42 | 3%, 2/61 |
| inductor | 72%, 39/54 | 93%, 39/42 | 57%, 35/61 |
| inductor_no_cudagraphs | 76%, 41/54 | 93%, 39/42 | 57%, 35/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.00x | 1.00x | 1.00x |
| aot_cudagraphs | 1.12x | 1.06x | 1.00x |
| nvprims_nvfuser | 1.01x | 1.00x | 1.06x |
| inductor | 1.70x | 1.83x | 1.44x |
| inductor_no_cudagraphs | 1.39x | 1.54x | 1.39x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 2.03 | 2.71 | 1.98 |
| aot_eager | 8.70 | 14.17 | 13.21 |
| aot_cudagraphs | 13.13 | 24.78 | 20.89 |
| nvprims_nvfuser | 27.19 | 67.81 | 156.64 |
| inductor | 28.85 | 42.06 | 54.36 |
| inductor_no_cudagraphs | 29.20 | 36.56 | 52.35 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.95x | 0.99x | 0.99x |
| aot_eager | 0.85x | 0.89x | 0.90x |
| aot_cudagraphs | 0.42x | 0.39x | 0.33x |
| nvprims_nvfuser | 0.75x | 0.81x | 0.66x |
| inductor | 0.82x | 0.91x | 0.95x |
| inductor_no_cudagraphs | 0.92x | 1.08x | 1.01x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 85%, 47/55 | 100%, 43/43 | 59%, 36/61 |
| aot_eager | 82%, 45/55 | 100%, 43/43 | 56%, 34/61 |
| aot_cudagraphs | 67%, 37/55 | 49%, 21/43 | 11%, 7/61 |
| nvprims_nvfuser | 49%, 27/55 | 5%, 2/43 | 16%, 10/61 |
| inductor | 75%, 41/55 | 93%, 40/43 | 56%, 34/61 |
| inductor_no_cudagraphs | 80%, 44/55 | 93%, 40/43 | 56%, 34/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.06x | 1.02x | 1.00x |
| nvprims_nvfuser | 1.03x | 1.00x | 1.15x |
| inductor | 1.40x | 1.30x | 1.23x |
| inductor_no_cudagraphs | 1.22x | 1.21x | 1.22x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.81 | 2.30 | 1.67 |
| aot_eager | 6.93 | 9.94 | 10.22 |
| aot_cudagraphs | 9.22 | 20.51 | 12.16 |
| nvprims_nvfuser | 58.42 | 59.86 | 153.93 |
| inductor | 25.09 | 34.84 | 45.13 |
| inductor_no_cudagraphs | 25.58 | 29.60 | 43.93 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.96x | 0.99x | 0.99x |
| aot_eager | 0.87x | 0.91x | 0.90x |
| aot_cudagraphs | 0.39x | 0.36x | 0.31x |
| nvprims_nvfuser | 0.82x | 0.82x | 0.86x |
| inductor | 0.81x | 0.71x | 0.96x |
| inductor_no_cudagraphs | 0.97x | 0.96x | 1.05x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 87%, 46/53 | 100%, 42/42 | 59%, 36/61 |
| aot_eager | 85%, 45/53 | 100%, 42/42 | 54%, 33/61 |
| aot_cudagraphs | 70%, 37/53 | 57%, 24/42 | 38%, 23/61 |
| nvprims_nvfuser | 17%, 9/53 | 5%, 2/42 | 2%, 1/61 |
| inductor | 75%, 40/53 | 93%, 39/42 | 57%, 35/61 |
| inductor_no_cudagraphs | 79%, 42/53 | 93%, 39/42 | 57%, 35/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.00x | 1.00x | 1.00x |
| aot_cudagraphs | 1.13x | 1.05x | 1.00x |
| nvprims_nvfuser | 1.00x | 1.00x | 1.00x |
| inductor | 1.74x | 1.79x | 1.43x |
| inductor_no_cudagraphs | 1.38x | 1.54x | 1.39x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.98 | 2.70 | 1.94 |
| aot_eager | 8.38 | 13.52 | 12.71 |
| aot_cudagraphs | 12.66 | 24.30 | 20.37 |
| nvprims_nvfuser | 14.36 | 67.23 | 150.44 |
| inductor | 27.66 | 41.18 | 53.94 |
| inductor_no_cudagraphs | 28.45 | 35.92 | 51.63 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.95x | 0.99x | 0.99x |
| aot_eager | 0.85x | 0.89x | 0.90x |
| aot_cudagraphs | 0.42x | 0.39x | 0.33x |
| nvprims_nvfuser | 0.75x | 0.81x | 0.52x |
| inductor | 0.82x | 0.91x | 0.95x |
| inductor_no_cudagraphs | 0.96x | 1.08x | 1.04x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 96%, 53/55 | 100%, 43/43 | 100%, 61/61 |
| aot_eager | 93%, 51/55 | 100%, 43/43 | 97%, 59/61 |
| aot_cudagraphs | 75%, 41/55 | 49%, 21/43 | 38%, 23/61 |
| nvprims_nvfuser | 73%, 40/55 | 16%, 7/43 | 48%, 29/61 |
| inductor | 85%, 47/55 | 93%, 40/43 | 95%, 58/61 |
| inductor_no_cudagraphs | 93%, 51/55 | 93%, 40/43 | 95%, 58/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.06x | 1.02x | 1.00x |
| nvprims_nvfuser | 1.03x | 1.00x | 1.15x |
| inductor | 1.42x | 1.31x | 1.25x |
| inductor_no_cudagraphs | 1.25x | 1.23x | 1.24x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 2.18 | 2.29 | 1.91 |
| aot_eager | 5.85 | 7.55 | 7.04 |
| aot_cudagraphs | 7.64 | 16.05 | 13.24 |
| nvprims_nvfuser | 77.58 | 133.12 | 149.39 |
| inductor | 33.95 | 32.13 | 38.53 |
| inductor_no_cudagraphs | 33.13 | 27.28 | 37.01 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.96x | 0.99x | 0.99x |
| aot_eager | 0.87x | 0.91x | 0.87x |
| aot_cudagraphs | 0.39x | 0.36x | 0.32x |
| nvprims_nvfuser | 0.81x | 0.83x | 0.81x |
| inductor | 0.84x | 0.72x | 0.98x |
| inductor_no_cudagraphs | 0.99x | 0.97x | 1.09x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 98%, 52/53 | 100%, 42/42 | 100%, 61/61 |
| aot_eager | 98%, 52/53 | 100%, 42/42 | 95%, 58/61 |
| aot_cudagraphs | 77%, 41/53 | 60%, 25/42 | 79%, 48/61 |
| nvprims_nvfuser | 49%, 26/53 | 12%, 5/42 | 33%, 20/61 |
| inductor | 87%, 46/53 | 93%, 39/42 | 93%, 57/61 |
| inductor_no_cudagraphs | 91%, 48/53 | 93%, 39/42 | 93%, 57/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.12x | 1.05x | 1.00x |
| nvprims_nvfuser | 1.03x | 1.00x | 1.10x |
| inductor | 1.69x | 1.76x | 1.40x |
| inductor_no_cudagraphs | 1.39x | 1.54x | 1.37x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 2.35 | 2.71 | 2.12 |
| aot_eager | 6.89 | 10.10 | 8.51 |
| aot_cudagraphs | 10.32 | 17.98 | 16.82 |
| nvprims_nvfuser | 65.64 | 126.51 | 163.37 |
| inductor | 34.12 | 36.94 | 44.51 |
| inductor_no_cudagraphs | 33.99 | 32.16 | 42.70 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.96x | 0.99x | 0.99x |
| aot_eager | 0.85x | 0.89x | 0.87x |
| aot_cudagraphs | 0.41x | 0.39x | 0.32x |
| nvprims_nvfuser | 0.75x | 0.79x | 0.69x |
| inductor | 0.84x | 0.88x | 0.95x |
| inductor_no_cudagraphs | 0.97x | 1.05x | 1.06x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 96%, 53/55 | 100%, 43/43 | 100%, 61/61 |
| aot_eager | 93%, 51/55 | 100%, 43/43 | 97%, 59/61 |
| aot_cudagraphs | 75%, 41/55 | 49%, 21/43 | 38%, 23/61 |
| nvprims_nvfuser | 71%, 39/55 | 16%, 7/43 | 49%, 30/61 |
| inductor | 87%, 48/55 | 93%, 40/43 | 95%, 58/61 |
| inductor_no_cudagraphs | 93%, 51/55 | 93%, 40/43 | 95%, 58/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.05x | 1.02x | 1.00x |
| nvprims_nvfuser | 1.03x | 1.00x | 1.15x |
| inductor | 1.42x | 1.30x | 1.25x |
| inductor_no_cudagraphs | 1.24x | 1.22x | 1.24x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 2.16 | 2.29 | 1.92 |
| aot_eager | 5.89 | 7.59 | 7.04 |
| aot_cudagraphs | 7.58 | 16.01 | 13.16 |
| nvprims_nvfuser | 75.24 | 135.18 | 186.40 |
| inductor | 33.55 | 31.52 | 38.36 |
| inductor_no_cudagraphs | 33.35 | 27.46 | 37.08 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.96x | 0.99x | 0.99x |
| aot_eager | 0.87x | 0.91x | 0.87x |
| aot_cudagraphs | 0.39x | 0.36x | 0.32x |
| nvprims_nvfuser | 0.81x | 0.83x | 0.82x |
| inductor | 0.83x | 0.72x | 0.98x |
| inductor_no_cudagraphs | 0.99x | 0.97x | 1.09x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 98%, 52/53 | 100%, 42/42 | 100%, 61/61 |
| aot_eager | 98%, 52/53 | 100%, 42/42 | 95%, 58/61 |
| aot_cudagraphs | 77%, 41/53 | 60%, 25/42 | 79%, 48/61 |
| nvprims_nvfuser | 51%, 27/53 | 12%, 5/42 | 34%, 21/61 |
| inductor | 85%, 45/53 | 93%, 39/42 | 93%, 57/61 |
| inductor_no_cudagraphs | 91%, 48/53 | 93%, 39/42 | 93%, 57/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.12x | 1.05x | 1.00x |
| nvprims_nvfuser | 1.02x | 1.00x | 1.11x |
| inductor | 1.70x | 1.81x | 1.40x |
| inductor_no_cudagraphs | 1.39x | 1.54x | 1.37x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 2.36 | 2.73 | 2.13 |
| aot_eager | 6.94 | 10.08 | 8.49 |
| aot_cudagraphs | 10.31 | 18.21 | 16.74 |
| nvprims_nvfuser | 68.70 | 131.09 | 160.24 |
| inductor | 34.52 | 37.14 | 44.56 |
| inductor_no_cudagraphs | 34.02 | 32.54 | 42.93 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.96x | 0.99x | 0.99x |
| aot_eager | 0.85x | 0.89x | 0.87x |
| aot_cudagraphs | 0.41x | 0.39x | 0.32x |
| nvprims_nvfuser | 0.75x | 0.79x | 0.69x |
| inductor | 0.84x | 0.88x | 0.95x |
| inductor_no_cudagraphs | 0.97x | 1.05x | 1.06x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 98%, 54/55 | 100%, 43/43 | 100%, 61/61 |
| aot_eager | 95%, 52/55 | 100%, 43/43 | 97%, 59/61 |
| aot_cudagraphs | 75%, 41/55 | 49%, 21/43 | 38%, 23/61 |
| nvprims_nvfuser | 71%, 39/55 | 16%, 7/43 | 48%, 29/61 |
| inductor | 87%, 48/55 | 93%, 40/43 | 95%, 58/61 |
| inductor_no_cudagraphs | 93%, 51/55 | 93%, 40/43 | 95%, 58/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.05x | 1.02x | 1.00x |
| nvprims_nvfuser | 1.03x | 1.00x | 1.14x |
| inductor | 1.41x | 1.30x | 1.25x |
| inductor_no_cudagraphs | 1.25x | 1.23x | 1.24x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 2.10 | 2.28 | 1.90 |
| aot_eager | 5.80 | 7.42 | 6.95 |
| aot_cudagraphs | 7.56 | 15.92 | 13.13 |
| nvprims_nvfuser | 75.23 | 133.60 | 150.37 |
| inductor | 28.79 | 29.22 | 34.02 |
| inductor_no_cudagraphs | 28.59 | 24.81 | 32.58 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.97x | 1.00x | 0.99x |
| aot_eager | 0.87x | 0.91x | 0.87x |
| aot_cudagraphs | 0.39x | 0.36x | 0.32x |
| nvprims_nvfuser | 0.81x | 0.83x | 0.81x |
| inductor | 0.84x | 0.72x | 0.98x |
| inductor_no_cudagraphs | 0.99x | 0.96x | 1.09x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 98%, 52/53 | 100%, 42/42 | 100%, 61/61 |
| aot_eager | 98%, 52/53 | 100%, 42/42 | 95%, 58/61 |
| aot_cudagraphs | 77%, 41/53 | 60%, 25/42 | 79%, 48/61 |
| nvprims_nvfuser | 51%, 27/53 | 12%, 5/42 | 33%, 20/61 |
| inductor | 87%, 46/53 | 93%, 39/42 | 93%, 57/61 |
| inductor_no_cudagraphs | 91%, 48/53 | 93%, 39/42 | 93%, 57/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.11x | 1.07x | 1.00x |
| nvprims_nvfuser | 1.02x | 1.00x | 1.11x |
| inductor | 1.69x | 1.80x | 1.40x |
| inductor_no_cudagraphs | 1.39x | 1.55x | 1.37x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 2.32 | 2.74 | 2.15 |
| aot_eager | 6.85 | 10.15 | 8.44 |
| aot_cudagraphs | 10.30 | 18.47 | 16.72 |
| nvprims_nvfuser | 67.21 | 130.37 | 159.28 |
| inductor | 30.35 | 34.63 | 40.31 |
| inductor_no_cudagraphs | 30.84 | 29.62 | 38.38 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.97x | 0.99x | 0.99x |
| aot_eager | 0.85x | 0.89x | 0.87x |
| aot_cudagraphs | 0.42x | 0.39x | 0.32x |
| nvprims_nvfuser | 0.75x | 0.79x | 0.70x |
| inductor | 0.84x | 0.88x | 0.95x |
| inductor_no_cudagraphs | 0.97x | 1.05x | 1.06x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 96%, 53/55 | 100%, 43/43 | 100%, 61/61 |
| aot_eager | 96%, 53/55 | 100%, 43/43 | 97%, 59/61 |
| aot_cudagraphs | 82%, 45/55 | 77%, 33/43 | 44%, 27/61 |
| nvprims_nvfuser | 55%, 30/55 | 93%, 40/43 | 31%, 19/61 |
| inductor | 85%, 47/55 | 93%, 40/43 | 95%, 58/61 |
| inductor_no_cudagraphs | 93%, 51/55 | 93%, 40/43 | 95%, 58/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.11x | 1.04x | 1.00x |
| nvprims_nvfuser | 1.04x | 1.03x | 1.11x |
| inductor | 1.50x | 1.29x | 1.25x |
| inductor_no_cudagraphs | 1.24x | 1.22x | 1.23x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 2.16 | 2.43 | 1.91 |
| aot_eager | 5.77 | 7.84 | 7.05 |
| aot_cudagraphs | 8.60 | 16.10 | 13.16 |
| nvprims_nvfuser | 73.63 | 109.11 | 124.35 |
| inductor | 29.31 | 29.54 | 34.71 |
| inductor_no_cudagraphs | 28.61 | 25.45 | 33.28 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.97x | 1.00x | 0.99x |
| aot_eager | 0.87x | 0.91x | 0.87x |
| aot_cudagraphs | 0.39x | 0.36x | 0.31x |
| nvprims_nvfuser | 0.85x | 0.87x | 0.84x |
| inductor | 0.87x | 0.72x | 0.98x |
| inductor_no_cudagraphs | 1.01x | 0.96x | 1.09x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 96%, 51/53 | 100%, 42/42 | 100%, 61/61 |
| aot_eager | 98%, 52/53 | 100%, 42/42 | 95%, 58/61 |
| aot_cudagraphs | 89%, 47/53 | 90%, 38/42 | 90%, 55/61 |
| nvprims_nvfuser | 55%, 29/53 | 93%, 39/42 | 28%, 17/61 |
| inductor | 85%, 45/53 | 93%, 39/42 | 93%, 57/61 |
| inductor_no_cudagraphs | 89%, 47/53 | 93%, 39/42 | 93%, 57/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.22x | 1.10x | 1.00x |
| nvprims_nvfuser | 1.02x | 1.04x | 1.07x |
| inductor | 1.91x | 1.79x | 1.41x |
| inductor_no_cudagraphs | 1.37x | 1.55x | 1.36x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 2.40 | 2.86 | 2.15 |
| aot_eager | 6.95 | 10.36 | 8.66 |
| aot_cudagraphs | 11.15 | 18.97 | 15.99 |
| nvprims_nvfuser | 93.96 | 144.92 | 150.66 |
| inductor | 32.77 | 34.87 | 40.37 |
| inductor_no_cudagraphs | 32.11 | 30.29 | 38.68 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.97x | 0.99x | 0.99x |
| aot_eager | 0.85x | 0.90x | 0.87x |
| aot_cudagraphs | 0.41x | 0.38x | 0.33x |
| nvprims_nvfuser | 0.80x | 0.78x | 0.76x |
| inductor | 0.85x | 0.88x | 0.95x |
| inductor_no_cudagraphs | 0.96x | 1.05x | 1.06x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 96%, 54/56 | 100%, 43/43 | 100%, 61/61 |
| aot_eager | 96%, 54/56 | 100%, 43/43 | 97%, 59/61 |
| aot_cudagraphs | 82%, 46/56 | 77%, 33/43 | 44%, 27/61 |
| nvprims_nvfuser | 82%, 46/56 | 60%, 26/43 | 67%, 41/61 |
| inductor | 86%, 48/56 | 93%, 40/43 | 95%, 58/61 |
| inductor_no_cudagraphs | 93%, 52/56 | 93%, 40/43 | 95%, 58/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.02x | 1.00x | 1.00x |
| aot_cudagraphs | 1.11x | 1.05x | 1.00x |
| nvprims_nvfuser | 1.05x | 1.03x | 1.14x |
| inductor | 1.50x | 1.29x | 1.24x |
| inductor_no_cudagraphs | 1.24x | 1.22x | 1.23x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 2.15 | 2.42 | 1.92 |
| aot_eager | 5.82 | 7.70 | 7.04 |
| aot_cudagraphs | 8.66 | 15.96 | 13.09 |
| nvprims_nvfuser | 79.49 | 131.82 | 152.15 |
| inductor | 29.49 | 29.75 | 34.69 |
| inductor_no_cudagraphs | 28.94 | 25.44 | 33.11 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.97x | 0.99x | 0.99x |
| aot_eager | 0.87x | 0.91x | 0.87x |
| aot_cudagraphs | 0.39x | 0.36x | 0.31x |
| nvprims_nvfuser | 0.92x | 1.00x | 0.93x |
| inductor | 0.87x | 0.72x | 0.98x |
| inductor_no_cudagraphs | 1.01x | 0.96x | 1.09x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 96%, 52/54 | 100%, 42/42 | 100%, 61/61 |
| aot_eager | 98%, 53/54 | 100%, 42/42 | 95%, 58/61 |
| aot_cudagraphs | 89%, 48/54 | 90%, 38/42 | 90%, 55/61 |
| nvprims_nvfuser | 59%, 32/54 | 12%, 5/42 | 54%, 33/61 |
| inductor | 85%, 46/54 | 93%, 39/42 | 93%, 57/61 |
| inductor_no_cudagraphs | 91%, 49/54 | 93%, 39/42 | 93%, 57/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.00x | 1.00x | 1.00x |
| aot_cudagraphs | 1.21x | 1.11x | 1.00x |
| nvprims_nvfuser | 1.01x | 1.01x | 1.08x |
| inductor | 1.89x | 1.77x | 1.41x |
| inductor_no_cudagraphs | 1.36x | 1.54x | 1.36x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 2.41 | 2.86 | 2.14 |
| aot_eager | 7.01 | 10.27 | 8.57 |
| aot_cudagraphs | 11.24 | 19.07 | 15.95 |
| nvprims_nvfuser | 70.53 | 158.06 | 147.88 |
| inductor | 32.72 | 34.86 | 39.84 |
| inductor_no_cudagraphs | 32.13 | 30.21 | 38.38 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.97x | 0.99x | 0.99x |
| aot_eager | 0.85x | 0.90x | 0.87x |
| aot_cudagraphs | 0.41x | 0.38x | 0.33x |
| nvprims_nvfuser | 0.84x | 1.06x | 0.85x |
| inductor | 0.85x | 0.88x | 0.95x |
| inductor_no_cudagraphs | 0.96x | 1.05x | 1.06x |
+------------------------+------------+-------------+-------------+
@williamwen42 do all these results mean that torchdynamo (with almost all of its backends) does not give any significant speedup? From the benchmarks, it looks like even nvFuser does not help much in terms of speedup.
We mainly focus on speedups from the inductor/inductor_no_cudagraphs backends. The geometric mean speedup summary tables show significant speedups for these backends.
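For readers unfamiliar with the summary columns, here is a minimal sketch of how a pass rate string and a geometric mean speedup could be computed for one compiler/suite cell. It is not the dashboard's actual code: the `summarize` function and the `passed`/`speedup` field names are hypothetical, and reporting 0.0 when no model passes is an assumption.

```python
import math


def summarize(results):
    """Summarize one compiler/suite cell.

    `results` is a list of dicts with hypothetical keys:
      'passed'  - True if the model passed the accuracy check
      'speedup' - eager latency divided by compiled latency
    """
    total = len(results)
    # Models that fail accuracy checks are removed before averaging.
    passing = [r for r in results if r["passed"]]
    if total == 0:
        return "0%, 0/0", 0.0
    passrate = f"{round(100 * len(passing) / total)}%, {len(passing)}/{total}"
    if not passing:
        # Assumption: report 0.0 when nothing passes.
        return passrate, 0.0
    # Geometric mean = exp(mean(log(speedup))).
    log_sum = sum(math.log(r["speedup"]) for r in passing)
    return passrate, math.exp(log_sum / len(passing))


if __name__ == "__main__":
    demo = [
        {"passed": True, "speedup": 2.0},
        {"passed": True, "speedup": 8.0},
        {"passed": False, "speedup": 100.0},  # excluded from the mean
    ]
    rate, gmean = summarize(demo)
    print(rate, f"{gmean:.2f}x")
```

The geometric mean is used rather than the arithmetic mean because speedups are ratios, so a single large outlier does not dominate the summary.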
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 95%, 53/56 | 98%, 42/43 | 98%, 60/61 |
| aot_eager | 91%, 51/56 | 95%, 41/43 | 97%, 59/61 |
| aot_cudagraphs | 79%, 44/56 | 72%, 31/43 | 46%, 28/61 |
| nvprims_nvfuser | 80%, 45/56 | 60%, 26/43 | 67%, 41/61 |
| inductor | 86%, 48/56 | 77%, 33/43 | 93%, 57/61 |
| inductor_no_cudagraphs | 91%, 51/56 | 91%, 39/43 | 93%, 57/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.02x | 1.00x | 1.00x |
| aot_cudagraphs | 1.11x | 1.04x | 1.00x |
| nvprims_nvfuser | 1.05x | 1.03x | 1.14x |
| inductor | 1.49x | 1.29x | 1.23x |
| inductor_no_cudagraphs | 1.22x | 1.20x | 1.23x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.85 | 2.33 | 2.06 |
| aot_eager | 5.34 | 7.45 | 7.02 |
| aot_cudagraphs | 7.29 | 14.25 | 13.27 |
| nvprims_nvfuser | 65.48 | 106.37 | 149.42 |
| inductor | 30.42 | 34.17 | 37.33 |
| inductor_no_cudagraphs | 30.20 | 27.71 | 35.70 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.98x | 0.99x | 0.99x |
| aot_eager | 0.87x | 0.91x | 0.88x |
| aot_cudagraphs | 0.39x | 0.36x | 0.31x |
| nvprims_nvfuser | 0.90x | 1.00x | 0.95x |
| inductor | 0.81x | 0.66x | 0.97x |
| inductor_no_cudagraphs | 0.96x | 0.88x | 1.09x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 96%, 52/54 | 98%, 41/42 | 98%, 60/61 |
| aot_eager | 94%, 51/54 | 95%, 40/42 | 95%, 58/61 |
| aot_cudagraphs | 85%, 46/54 | 86%, 36/42 | 90%, 55/61 |
| nvprims_nvfuser | 59%, 32/54 | 10%, 4/42 | 52%, 32/61 |
| inductor | 83%, 45/54 | 90%, 38/42 | 92%, 56/61 |
| inductor_no_cudagraphs | 89%, 48/54 | 90%, 38/42 | 92%, 56/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.22x | 1.13x | 1.00x |
| nvprims_nvfuser | 1.02x | 1.04x | 1.09x |
| inductor | 1.87x | 1.73x | 1.40x |
| inductor_no_cudagraphs | 1.37x | 1.52x | 1.35x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 2.02 | 2.77 | 2.27 |
| aot_eager | 6.41 | 9.89 | 8.53 |
| aot_cudagraphs | 9.49 | 17.59 | 16.11 |
| nvprims_nvfuser | 66.76 | 133.58 | 148.52 |
| inductor | 32.71 | 37.94 | 43.00 |
| inductor_no_cudagraphs | 32.54 | 33.20 | 41.01 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.97x | 0.99x | 0.99x |
| aot_eager | 0.84x | 0.89x | 0.87x |
| aot_cudagraphs | 0.41x | 0.38x | 0.33x |
| nvprims_nvfuser | 0.83x | 1.01x | 0.86x |
| inductor | 0.82x | 0.85x | 0.94x |
| inductor_no_cudagraphs | 0.94x | 1.01x | 1.05x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 95%, 53/56 | 98%, 42/43 | 98%, 60/61 |
| aot_eager | 93%, 52/56 | 95%, 41/43 | 98%, 60/61 |
| aot_cudagraphs | 73%, 41/56 | 72%, 31/43 | 46%, 28/61 |
| nvprims_nvfuser | 77%, 43/56 | 60%, 26/43 | 67%, 41/61 |
| inductor | 84%, 47/56 | 91%, 39/43 | 93%, 57/61 |
| inductor_no_cudagraphs | 91%, 51/56 | 91%, 39/43 | 93%, 57/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.02x | 1.00x | 1.00x |
| aot_cudagraphs | 1.12x | 1.05x | 1.00x |
| nvprims_nvfuser | 1.04x | 1.02x | 1.14x |
| inductor | 1.53x | 1.30x | 1.25x |
| inductor_no_cudagraphs | 1.23x | 1.23x | 1.24x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.86 | 2.34 | 2.06 |
| aot_eager | 5.45 | 7.34 | 7.23 |
| aot_cudagraphs | 7.54 | 14.38 | 13.21 |
| nvprims_nvfuser | 63.74 | 98.63 | 148.48 |
| inductor | 30.44 | 30.55 | 37.13 |
| inductor_no_cudagraphs | 29.67 | 26.28 | 35.39 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.98x | 0.99x | 0.99x |
| aot_eager | 0.86x | 0.91x | 0.88x |
| aot_cudagraphs | 0.39x | 0.36x | 0.31x |
| nvprims_nvfuser | 0.90x | 1.00x | 0.95x |
| inductor | 0.83x | 0.71x | 0.98x |
| inductor_no_cudagraphs | 0.97x | 0.97x | 1.09x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 96%, 52/54 | 98%, 41/42 | 98%, 60/61 |
| aot_eager | 94%, 51/54 | 95%, 40/42 | 97%, 59/61 |
| aot_cudagraphs | 80%, 43/54 | 86%, 36/42 | 90%, 55/61 |
| nvprims_nvfuser | 56%, 30/54 | 10%, 4/42 | 52%, 32/61 |
| inductor | 83%, 45/54 | 90%, 38/42 | 92%, 56/61 |
| inductor_no_cudagraphs | 87%, 47/54 | 90%, 38/42 | 92%, 56/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.24x | 1.11x | 1.00x |
| nvprims_nvfuser | 1.01x | 1.04x | 1.09x |
| inductor | 1.90x | 1.82x | 1.42x |
| inductor_no_cudagraphs | 1.38x | 1.57x | 1.37x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 2.04 | 2.79 | 2.31 |
| aot_eager | 6.62 | 10.06 | 8.74 |
| aot_cudagraphs | 9.85 | 17.73 | 16.18 |
| nvprims_nvfuser | 63.52 | 113.44 | 148.51 |
| inductor | 32.87 | 35.84 | 43.12 |
| inductor_no_cudagraphs | 32.63 | 31.50 | 41.23 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.97x | 0.99x | 0.99x |
| aot_eager | 0.83x | 0.89x | 0.88x |
| aot_cudagraphs | 0.41x | 0.38x | 0.33x |
| nvprims_nvfuser | 0.83x | 1.01x | 0.86x |
| inductor | 0.83x | 0.88x | 0.95x |
| inductor_no_cudagraphs | 0.95x | 1.05x | 1.06x |
+------------------------+------------+-------------+-------------+
We changed the batch sizes and sequence lengths of HF models to more accurately represent these models. This dashboard run is a one-off experiment to get the new speedups.
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+-------------+
| Compiler | huggingface |
+------------------------+-------------+
| eager | 93%, 43/46 |
| inductor | 83%, 38/46 |
| inductor_no_cudagraphs | 85%, 39/46 |
+------------------------+-------------+
Geometric mean speedup
+------------------------+-------------+
| Compiler | huggingface |
+------------------------+-------------+
| eager | 1.00x |
| inductor | 1.56x |
| inductor_no_cudagraphs | 1.51x |
+------------------------+-------------+
Mean compilation time (seconds)
+------------------------+-------------+
| Compiler | huggingface |
+------------------------+-------------+
| eager | 2.98 |
| inductor | 38.38 |
| inductor_no_cudagraphs | 33.29 |
+------------------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+-------------+
| Compiler | huggingface |
+------------------------+-------------+
| eager | 1.00x |
| inductor | 0.92x |
| inductor_no_cudagraphs | 1.07x |
+------------------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 95%, 53/56 | 98%, 42/43 | 98%, 60/61 |
| aot_eager | 91%, 51/56 | 95%, 41/43 | 98%, 60/61 |
| aot_cudagraphs | 73%, 41/56 | 72%, 31/43 | 46%, 28/61 |
| nvprims_nvfuser | 75%, 42/56 | 60%, 26/43 | 67%, 41/61 |
| inductor | 84%, 47/56 | 91%, 39/43 | 93%, 57/61 |
| inductor_no_cudagraphs | 91%, 51/56 | 91%, 39/43 | 93%, 57/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.02x | 1.00x | 1.00x |
| aot_cudagraphs | 1.12x | 1.04x | 1.00x |
| nvprims_nvfuser | 1.04x | 1.02x | 1.14x |
| inductor | 1.52x | 1.30x | 1.24x |
| inductor_no_cudagraphs | 1.23x | 1.23x | 1.24x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.85 | 2.37 | 2.02 |
| aot_eager | 5.54 | 7.49 | 7.17 |
| aot_cudagraphs | 7.51 | 14.10 | 13.21 |
| nvprims_nvfuser | 65.08 | 98.89 | 147.85 |
| inductor | 30.45 | 30.45 | 37.25 |
| inductor_no_cudagraphs | 29.58 | 26.36 | 35.77 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.98x | 0.99x | 0.99x |
| aot_eager | 0.87x | 0.91x | 0.88x |
| aot_cudagraphs | 0.39x | 0.36x | 0.31x |
| nvprims_nvfuser | 0.91x | 1.00x | 0.95x |
| inductor | 0.83x | 0.71x | 0.98x |
| inductor_no_cudagraphs | 0.97x | 0.97x | 1.09x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 96%, 52/54 | 98%, 41/42 | 98%, 60/61 |
| aot_eager | 94%, 51/54 | 95%, 40/42 | 97%, 59/61 |
| aot_cudagraphs | 80%, 43/54 | 86%, 36/42 | 90%, 55/61 |
| nvprims_nvfuser | 56%, 30/54 | 10%, 4/42 | 52%, 32/61 |
| inductor | 83%, 45/54 | 90%, 38/42 | 92%, 56/61 |
| inductor_no_cudagraphs | 87%, 47/54 | 90%, 38/42 | 92%, 56/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.24x | 1.10x | 1.00x |
| nvprims_nvfuser | 1.01x | 1.04x | 1.08x |
| inductor | 1.89x | 1.81x | 1.43x |
| inductor_no_cudagraphs | 1.38x | 1.57x | 1.37x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 2.02 | 2.81 | 2.27 |
| aot_eager | 6.63 | 9.97 | 8.72 |
| aot_cudagraphs | 9.77 | 17.33 | 16.24 |
| nvprims_nvfuser | 63.87 | 113.26 | 148.67 |
| inductor | 32.66 | 35.79 | 43.33 |
| inductor_no_cudagraphs | 32.43 | 31.42 | 41.22 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.97x | 0.99x | 0.99x |
| aot_eager | 0.83x | 0.89x | 0.88x |
| aot_cudagraphs | 0.41x | 0.38x | 0.33x |
| nvprims_nvfuser | 0.83x | 1.01x | 0.86x |
| inductor | 0.83x | 0.88x | 0.95x |
| inductor_no_cudagraphs | 0.95x | 1.05x | 1.06x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 95%, 53/56 | 91%, 43/47 | 98%, 60/61 |
| aot_eager | 91%, 51/56 | 91%, 43/47 | 98%, 60/61 |
| aot_cudagraphs | 73%, 41/56 | 34%, 16/47 | 46%, 28/61 |
| nvprims_nvfuser | 75%, 42/56 | 57%, 27/47 | 67%, 41/61 |
| inductor | 84%, 47/56 | 83%, 39/47 | 93%, 57/61 |
| inductor_no_cudagraphs | 89%, 50/56 | 87%, 41/47 | 93%, 57/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.00x | 1.00x |
| aot_eager | 1.02x | 1.00x | 1.00x |
| aot_cudagraphs | 1.12x | 1.00x | 1.00x |
| nvprims_nvfuser | 1.04x | 1.04x | 1.14x |
| inductor | 1.47x | 1.25x | 1.23x |
| inductor_no_cudagraphs | 1.23x | 1.22x | 1.23x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.84 | 2.48 | 2.00 |
| aot_eager | 5.51 | 7.97 | 7.17 |
| aot_cudagraphs | 7.42 | 15.41 | 13.11 |
| nvprims_nvfuser | 64.72 | 100.64 | 148.57 |
| inductor | 30.09 | 33.07 | 35.96 |
| inductor_no_cudagraphs | 29.79 | 29.18 | 34.75 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.98x | 0.99x | 0.99x |
| aot_eager | 0.86x | 0.93x | 0.88x |
| aot_cudagraphs | 0.39x | 0.36x | 0.31x |
| nvprims_nvfuser | 0.91x | 1.02x | 0.95x |
| inductor | 0.83x | 0.75x | 0.98x |
| inductor_no_cudagraphs | 0.98x | 1.00x | 1.09x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 96%, 54/56 | 91%, 43/47 | 98%, 60/61 |
| aot_eager | 91%, 51/56 | 91%, 43/47 | 98%, 60/61 |
| aot_cudagraphs | 73%, 41/56 | 34%, 16/47 | 46%, 28/61 |
| nvprims_nvfuser | 75%, 42/56 | 57%, 27/47 | 67%, 41/61 |
| inductor | 82%, 46/56 | 83%, 39/47 | 93%, 57/61 |
| inductor_no_cudagraphs | 89%, 50/56 | 87%, 41/47 | 93%, 57/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.01x | 1.00x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.11x | 1.00x | 1.00x |
| nvprims_nvfuser | 1.04x | 1.04x | 1.14x |
| inductor | 1.47x | 1.23x | 1.23x |
| inductor_no_cudagraphs | 1.23x | 1.22x | 1.23x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.80 | 2.47 | 2.00 |
| aot_eager | 5.51 | 7.78 | 7.13 |
| aot_cudagraphs | 7.44 | 15.39 | 13.04 |
| nvprims_nvfuser | 65.48 | 100.68 | 147.36 |
| inductor | 30.47 | 33.00 | 35.82 |
| inductor_no_cudagraphs | 29.78 | 28.80 | 34.59 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.98x | 0.99x | 0.99x |
| aot_eager | 0.86x | 0.92x | 0.88x |
| aot_cudagraphs | 0.39x | 0.36x | 0.31x |
| nvprims_nvfuser | 0.91x | 1.02x | 0.95x |
| inductor | 0.84x | 0.75x | 0.98x |
| inductor_no_cudagraphs | 0.98x | 1.00x | 1.09x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 96%, 52/54 | 91%, 42/46 | 98%, 60/61 |
| aot_eager | 94%, 51/54 | 91%, 42/46 | 97%, 59/61 |
| aot_cudagraphs | 80%, 43/54 | 70%, 32/46 | 90%, 55/61 |
| nvprims_nvfuser | 56%, 30/54 | 7%, 3/46 | 52%, 32/61 |
| inductor | 81%, 44/54 | 83%, 38/46 | 87%, 53/61 |
| inductor_no_cudagraphs | 87%, 47/54 | 85%, 39/46 | 89%, 54/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.00x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.23x | 1.01x | 1.00x |
| nvprims_nvfuser | 1.01x | 1.11x | 1.09x |
| inductor | 1.65x | 1.64x | 1.18x |
| inductor_no_cudagraphs | 1.28x | 1.57x | 1.15x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 2.01 | 2.92 | 2.21 |
| aot_eager | 6.54 | 10.47 | 8.63 |
| aot_cudagraphs | 9.63 | 17.35 | 15.90 |
| nvprims_nvfuser | 63.02 | 113.69 | 147.66 |
| inductor | 72.95 | 36.82 | 76.16 |
| inductor_no_cudagraphs | 68.94 | 33.11 | 74.29 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.97x | 1.00x | 0.99x |
| aot_eager | 0.83x | 0.91x | 0.88x |
| aot_cudagraphs | 0.41x | 0.37x | 0.33x |
| nvprims_nvfuser | 0.83x | 1.07x | 0.86x |
| inductor | 0.78x | 0.92x | 0.88x |
| inductor_no_cudagraphs | 0.92x | 1.07x | 1.03x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 96%, 52/54 | 91%, 42/46 | 98%, 60/61 |
| aot_eager | 94%, 51/54 | 91%, 42/46 | 97%, 59/61 |
| aot_cudagraphs | 80%, 43/54 | 70%, 32/46 | 90%, 55/61 |
| nvprims_nvfuser | 56%, 30/54 | 7%, 3/46 | 52%, 32/61 |
| inductor | 81%, 44/54 | 83%, 38/46 | 89%, 54/61 |
| inductor_no_cudagraphs | 87%, 47/54 | 85%, 39/46 | 89%, 54/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.00x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.24x | 1.01x | 1.00x |
| nvprims_nvfuser | 1.01x | 1.10x | 1.09x |
| inductor | 1.64x | 1.64x | 1.17x |
| inductor_no_cudagraphs | 1.28x | 1.56x | 1.15x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 2.02 | 2.94 | 2.27 |
| aot_eager | 6.61 | 10.53 | 8.70 |
| aot_cudagraphs | 9.82 | 17.39 | 16.17 |
| nvprims_nvfuser | 63.40 | 117.61 | 147.41 |
| inductor | 73.45 | 37.78 | 76.47 |
| inductor_no_cudagraphs | 70.11 | 34.36 | 74.63 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.97x | 1.00x | 0.99x |
| aot_eager | 0.83x | 0.91x | 0.88x |
| aot_cudagraphs | 0.41x | 0.37x | 0.33x |
| nvprims_nvfuser | 0.83x | 1.07x | 0.86x |
| inductor | 0.78x | 0.92x | 0.88x |
| inductor_no_cudagraphs | 0.92x | 1.07x | 1.03x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 96%, 54/56 | 98%, 45/46 | 98%, 60/61 |
| aot_eager | 93%, 52/56 | 98%, 45/46 | 98%, 60/61 |
| aot_cudagraphs | 73%, 41/56 | 35%, 16/46 | 46%, 28/61 |
| nvprims_nvfuser | 77%, 43/56 | 61%, 28/46 | 67%, 41/61 |
| inductor | 84%, 47/56 | 87%, 40/46 | 93%, 57/61 |
| inductor_no_cudagraphs | 89%, 50/56 | 93%, 43/46 | 93%, 57/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.01x | 1.00x | 1.00x |
| aot_eager | 1.02x | 1.00x | 1.00x |
| aot_cudagraphs | 1.12x | 1.00x | 1.00x |
| nvprims_nvfuser | 1.04x | 1.04x | 1.14x |
| inductor | 1.47x | 1.23x | 1.23x |
| inductor_no_cudagraphs | 1.23x | 1.21x | 1.23x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.81 | 2.58 | 2.03 |
| aot_eager | 6.03 | 8.93 | 8.01 |
| aot_cudagraphs | 7.83 | 16.30 | 13.76 |
| nvprims_nvfuser | 61.87 | 89.81 | 140.94 |
| inductor | 32.84 | 36.18 | 37.32 |
| inductor_no_cudagraphs | 32.77 | 30.82 | 36.22 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.98x | 0.99x | 0.99x |
| aot_eager | 0.86x | 0.92x | 0.88x |
| aot_cudagraphs | 0.39x | 0.36x | 0.31x |
| nvprims_nvfuser | 0.90x | 1.01x | 0.95x |
| inductor | 0.83x | 0.74x | 0.97x |
| inductor_no_cudagraphs | 0.99x | 1.00x | 1.09x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 96%, 52/54 | 98%, 44/45 | 98%, 60/61 |
| aot_eager | 94%, 51/54 | 96%, 43/45 | 97%, 59/61 |
| aot_cudagraphs | 80%, 43/54 | 73%, 33/45 | 90%, 55/61 |
| nvprims_nvfuser | 56%, 30/54 | 7%, 3/45 | 52%, 32/61 |
| inductor | 81%, 44/54 | 87%, 39/45 | 89%, 54/61 |
| inductor_no_cudagraphs | 85%, 46/54 | 91%, 41/45 | 89%, 54/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.00x | 1.00x |
| aot_eager | 1.00x | 1.00x | 1.00x |
| aot_cudagraphs | 1.24x | 1.02x | 1.00x |
| nvprims_nvfuser | 1.02x | 1.10x | 1.09x |
| inductor | 1.66x | 1.62x | 1.17x |
| inductor_no_cudagraphs | 1.29x | 1.54x | 1.15x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 2.03 | 3.09 | 2.28 |
| aot_eager | 7.33 | 11.78 | 9.69 |
| aot_cudagraphs | 10.29 | 20.14 | 16.96 |
| nvprims_nvfuser | 61.99 | 99.19 | 144.26 |
| inductor | 76.19 | 41.09 | 77.56 |
| inductor_no_cudagraphs | 74.25 | 36.23 | 75.97 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.98x | 1.00x | 0.99x |
| aot_eager | 0.83x | 0.91x | 0.88x |
| aot_cudagraphs | 0.41x | 0.37x | 0.33x |
| nvprims_nvfuser | 0.83x | 1.07x | 0.86x |
| inductor | 0.78x | 0.92x | 0.88x |
| inductor_no_cudagraphs | 0.93x | 1.07x | 1.03x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 98%, 55/56 | 100%, 46/46 | 100%, 61/61 |
| aot_eager | 95%, 53/56 | 100%, 46/46 | 100%, 61/61 |
| aot_cudagraphs | 75%, 42/56 | 35%, 16/46 | 46%, 28/61 |
| nvprims_nvfuser | 77%, 43/56 | 61%, 28/46 | 67%, 41/61 |
| inductor | 84%, 47/56 | 85%, 39/46 | 95%, 58/61 |
| inductor_no_cudagraphs | 89%, 50/56 | 93%, 43/46 | 95%, 58/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.01x | 1.00x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.12x | 1.00x | 1.00x |
| nvprims_nvfuser | 1.04x | 1.04x | 1.14x |
| inductor | 1.46x | 1.23x | 1.23x |
| inductor_no_cudagraphs | 1.23x | 1.22x | 1.23x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.71 | 2.81 | 2.02 |
| aot_eager | 5.86 | 8.96 | 7.76 |
| aot_cudagraphs | 8.49 | 16.03 | 13.40 |
| nvprims_nvfuser | 60.36 | 87.50 | 139.45 |
| inductor | 32.95 | 36.69 | 37.10 |
| inductor_no_cudagraphs | 32.72 | 31.66 | 36.09 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.98x | 0.99x | 0.99x |
| aot_eager | 0.87x | 0.92x | 0.88x |
| aot_cudagraphs | 0.39x | 0.36x | 0.31x |
| nvprims_nvfuser | 0.90x | 1.01x | 0.95x |
| inductor | 0.83x | 0.74x | 0.97x |
| inductor_no_cudagraphs | 0.99x | 1.00x | 1.09x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 98%, 53/54 | 100%, 45/45 | 100%, 61/61 |
| aot_eager | 96%, 52/54 | 98%, 44/45 | 98%, 60/61 |
| aot_cudagraphs | 81%, 44/54 | 76%, 34/45 | 92%, 56/61 |
| nvprims_nvfuser | 56%, 30/54 | 7%, 3/45 | 54%, 33/61 |
| inductor | 81%, 44/54 | 87%, 39/45 | 89%, 54/61 |
| inductor_no_cudagraphs | 87%, 47/54 | 91%, 41/45 | 89%, 54/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.00x | 1.00x | 1.00x |
| aot_cudagraphs | 1.24x | 1.01x | 1.00x |
| nvprims_nvfuser | 1.02x | 1.09x | 1.09x |
| inductor | 1.67x | 1.62x | 1.19x |
| inductor_no_cudagraphs | 1.28x | 1.53x | 1.16x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.94 | 3.32 | 2.25 |
| aot_eager | 7.13 | 11.72 | 9.33 |
| aot_cudagraphs | 10.71 | 20.92 | 16.35 |
| nvprims_nvfuser | 60.33 | 96.80 | 141.90 |
| inductor | 75.58 | 43.75 | 76.79 |
| inductor_no_cudagraphs | 72.08 | 36.82 | 74.83 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.98x | 1.00x | 0.99x |
| aot_eager | 0.84x | 0.91x | 0.88x |
| aot_cudagraphs | 0.41x | 0.37x | 0.33x |
| nvprims_nvfuser | 0.83x | 1.07x | 0.87x |
| inductor | 0.78x | 0.92x | 0.88x |
| inductor_no_cudagraphs | 0.92x | 1.07x | 1.03x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 98%, 55/56 | 100%, 46/46 | 100%, 61/61 |
| aot_eager | 95%, 53/56 | 100%, 46/46 | 100%, 61/61 |
| aot_cudagraphs | 75%, 42/56 | 37%, 17/46 | 46%, 28/61 |
| nvprims_nvfuser | 77%, 43/56 | 61%, 28/46 | 67%, 41/61 |
| inductor | 84%, 47/56 | 85%, 39/46 | 95%, 58/61 |
| inductor_no_cudagraphs | 89%, 50/56 | 93%, 43/46 | 95%, 58/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.01x | 1.00x | 1.00x |
| aot_eager | 1.02x | 1.00x | 1.00x |
| aot_cudagraphs | 1.12x | 1.00x | 1.00x |
| nvprims_nvfuser | 1.04x | 1.04x | 1.14x |
| inductor | 1.47x | 1.23x | 1.23x |
| inductor_no_cudagraphs | 1.24x | 1.21x | 1.23x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.74 | 2.79 | 2.02 |
| aot_eager | 5.90 | 8.93 | 7.71 |
| aot_cudagraphs | 8.52 | 16.14 | 13.43 |
| nvprims_nvfuser | 59.77 | 86.65 | 139.56 |
| inductor | 32.89 | 37.33 | 37.13 |
| inductor_no_cudagraphs | 32.84 | 31.43 | 36.02 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.98x | 0.99x | 0.99x |
| aot_eager | 0.87x | 0.92x | 0.88x |
| aot_cudagraphs | 0.39x | 0.36x | 0.31x |
| nvprims_nvfuser | 0.90x | 1.01x | 0.95x |
| inductor | 0.83x | 0.74x | 0.97x |
| inductor_no_cudagraphs | 0.99x | 1.00x | 1.09x |
+------------------------+------------+-------------+-------------+
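For readers who want to reproduce the summary rows, here is a minimal sketch of how a pass rate and a geometric mean speedup could be derived from per-model results, with models that fail accuracy checks dropped before averaging as described above. The record layout, model names, numbers, and the helper names `passrate` and `geomean_speedup` are all hypothetical and are not taken from the benchmark scripts, which may compute these values differently.

```python
import math

# Hypothetical per-model results for one compiler on one suite.
# Model names and numbers are invented purely for illustration.
results = [
    {"model": "model_a", "accuracy_ok": True, "speedup": 1.50},
    {"model": "model_b", "accuracy_ok": True, "speedup": 1.20},
    {"model": "model_c", "accuracy_ok": False, "speedup": 2.00},
    {"model": "model_d", "accuracy_ok": True, "speedup": 0.90},
]


def passrate(rows):
    """Format the pass rate like the tables do, e.g. '96%, 52/54'."""
    passed = sum(1 for r in rows if r["accuracy_ok"])
    total = len(rows)
    pct = round(100 * passed / total) if total else 0
    return f"{pct}%, {passed}/{total}"


def geomean_speedup(rows):
    """Geometric mean of speedups over models that pass accuracy checks."""
    kept = [r["speedup"] for r in rows if r["accuracy_ok"] and r["speedup"] > 0]
    if not kept:
        # Assumption: report 0.0 when no model passes.
        return 0.0
    # exp(mean(log(x))) avoids overflow from multiplying many values.
    return math.exp(sum(math.log(s) for s in kept) / len(kept))


print(passrate(results))
print(f"{geomean_speedup(results):.2f}x")
```

In this sketch `model_c` reports the largest speedup but is excluded because it fails the accuracy check, so it does not inflate the mean. Returning 0.0 for an empty set is a choice made here to resemble the `0.0x` entries that earlier tables show for suites with a 0% pass rate.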
Dashboard to track the performance of different backends.
cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @desertfire