Status: Closed (closed by williamwen42 2 years ago)
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 98%, 54/55 | 100%, 43/43 | 100%, 61/61 |
| aot_eager | 95%, 52/55 | 100%, 43/43 | 98%, 60/61 |
| aot_cudagraphs | 73%, 40/55 | 47%, 20/43 | 39%, 24/61 |
| aot_nvfuser | 58%, 32/55 | 2%, 1/43 | 89%, 54/61 |
| inductor | 87%, 48/55 | 93%, 40/43 | 95%, 58/61 |
| inductor_no_cudagraphs | 91%, 50/55 | 93%, 40/43 | 95%, 58/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.09x | 1.02x | 1.00x |
| aot_nvfuser | 1.13x | 1.12x | 1.11x |
| inductor | 1.48x | 1.28x | 1.25x |
| inductor_no_cudagraphs | 1.22x | 1.21x | 1.24x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 2.08 | 2.22 | 1.88 |
| aot_eager | 6.92 | 9.05 | 8.70 |
| aot_cudagraphs | 8.23 | 18.64 | 15.25 |
| aot_nvfuser | 20.32 | 9.60 | 50.01 |
| inductor | 62.17 | 52.98 | 73.89 |
| inductor_no_cudagraphs | 64.61 | 49.17 | 72.74 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.96x | 1.00x | 0.99x |
| aot_eager | 0.86x | 0.91x | 0.88x |
| aot_cudagraphs | 0.39x | 0.36x | 0.32x |
| aot_nvfuser | 0.83x | 1.08x | 0.84x |
| inductor | 0.82x | 0.72x | 0.97x |
| inductor_no_cudagraphs | 0.94x | 0.96x | 1.02x |
+------------------------+------------+-------------+-------------+
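For readers unfamiliar with how numbers like these are derived, here is a minimal sketch of the filtering step described above: models that fail the accuracy check are dropped, then a pass rate and a geometric mean speedup are computed over the rest. The record layout (`name`, `passed`, `speedup`) and the function name `summarize` are hypothetical and chosen for illustration; this is not the dashboard's actual script.

```python
import math


def summarize(results):
    """Return (passrate_string, geometric_mean_speedup) for one compiler/suite.

    `results` is a list of dicts with hypothetical keys:
      'name'    - model name
      'passed'  - True if the model passed the accuracy check
      'speedup' - measured speedup over the baseline (float > 0)
    """
    total = len(results)
    # Drop models that fail accuracy before aggregating performance numbers.
    passing = [r for r in results if r["passed"]]
    passrate = f"{len(passing) / total:.0%}, {len(passing)}/{total}"

    speedups = [r["speedup"] for r in passing]
    if not speedups:
        return passrate, float("nan")
    # Geometric mean computed as exp(mean(log(x))).
    geomean = math.exp(sum(math.log(s) for s in speedups) / len(speedups))
    return passrate, geomean


# Made-up example data, not taken from the tables.
example = [
    {"name": "model_a", "passed": True, "speedup": 2.0},
    {"name": "model_b", "passed": True, "speedup": 0.5},
    {"name": "model_c", "passed": False, "speedup": 10.0},  # excluded
]
passrate, geomean = summarize(example)
print(passrate, f"{geomean:.2f}x")
```

A geometric mean is the usual choice for averaging ratios, since a 2.0x speedup and a 0.5x slowdown cancel out to 1.0x instead of averaging to 1.25x.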
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 71%, 54/76 | 51%, 43/84 | 76%, 61/80 |
| aot_eager | 70%, 53/76 | 51%, 43/84 | 75%, 60/80 |
| aot_cudagraphs | 53%, 40/76 | 24%, 20/84 | 30%, 24/80 |
| aot_nvfuser | 43%, 33/76 | 1%, 1/84 | 71%, 57/80 |
| inductor | 66%, 50/76 | 50%, 42/84 | 75%, 60/80 |
| inductor_no_cudagraphs | 68%, 52/76 | 50%, 42/84 | 76%, 61/80 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.09x | 1.02x | 1.00x |
| aot_nvfuser | 1.13x | 1.12x | 1.11x |
| inductor | 1.47x | 1.28x | 1.25x |
| inductor_no_cudagraphs | 1.23x | 1.21x | 1.24x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 2.08 | 2.22 | 1.88 |
| aot_eager | 6.96 | 9.05 | 8.70 |
| aot_cudagraphs | 8.23 | 18.64 | 15.25 |
| aot_nvfuser | 21.02 | 9.60 | 49.80 |
| inductor | 61.02 | 52.88 | 73.59 |
| inductor_no_cudagraphs | 63.42 | 49.20 | 71.75 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.96x | 1.00x | 0.99x |
| aot_eager | 0.86x | 0.91x | 0.88x |
| aot_cudagraphs | 0.39x | 0.36x | 0.32x |
| aot_nvfuser | 0.83x | 1.08x | 0.84x |
| inductor | 0.85x | 0.72x | 0.97x |
| inductor_no_cudagraphs | 0.96x | 0.96x | 1.02x |
+------------------------+------------+-------------+-------------+
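As a reading aid for the memory tables, here is a small sketch of one way a "compression ratio" consistent with the "higher is better" label could be computed. It assumes the ratio is baseline (eager) peak memory divided by the compiled peak memory; that definition and the function name `compression_ratio` are assumptions for illustration, not confirmed by these reports.

```python
def compression_ratio(baseline_peak_bytes, compiled_peak_bytes):
    """Assumed definition: baseline peak memory / compiled peak memory.

    A value above 1.0 means the compiled model used less peak memory
    than the baseline; a value below 1.0 means it used more.
    """
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled_peak_bytes must be positive")
    return baseline_peak_bytes / compiled_peak_bytes


# Made-up numbers: the compiled run peaks at 1000 MB vs. 390 MB for the baseline.
ratio = compression_ratio(390 * 2**20, 1000 * 2**20)
print(f"{ratio:.2f}x")
```

Under this assumed definition, a value such as 0.39x would correspond to the compiled run using roughly 2.5 times the baseline's peak memory.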
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+----------+------------+
| Compiler | torchbench |
+----------+------------+
| inductor | 100%, 1/1 |
+----------+------------+
Geometric mean speedup
+----------+------------+
| Compiler | torchbench |
+----------+------------+
| inductor | 1.42x |
+----------+------------+
Mean compilation time (seconds)
+----------+------------+
| Compiler | torchbench |
+----------+------------+
| inductor | 18.23 |
+----------+------------+
Peak memory footprint compression ratio (higher is better)
+----------+------------+
| Compiler | torchbench |
+----------+------------+
| inductor | 1.17x |
+----------+------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+----------+------------+
| Compiler | torchbench |
+----------+------------+
| inductor | 100%, 1/1 |
+----------+------------+
Geometric mean speedup
+----------+------------+
| Compiler | torchbench |
+----------+------------+
| inductor | 1.42x |
+----------+------------+
Mean compilation time (seconds)
+----------+------------+
| Compiler | torchbench |
+----------+------------+
| inductor | 17.48 |
+----------+------------+
Peak memory footprint compression ratio (higher is better)
+----------+------------+
| Compiler | torchbench |
+----------+------------+
| inductor | 1.17x |
+----------+------------+
Mean absolute latency (seconds)
+----------+------------+
| Compiler | torchbench |
+----------+------------+
| inductor | 0.05 |
+----------+------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+----------+------------+
| Compiler | torchbench |
+----------+------------+
| inductor | 100%, 1/1 |
+----------+------------+
Geometric mean speedup
+----------+------------+
| Compiler | torchbench |
+----------+------------+
| inductor | 1.42x |
+----------+------------+
Mean compilation time (seconds)
+----------+------------+
| Compiler | torchbench |
+----------+------------+
| inductor | 17.28 |
+----------+------------+
Peak memory footprint compression ratio (higher is better)
+----------+------------+
| Compiler | torchbench |
+----------+------------+
| inductor | 1.17x |
+----------+------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 82%, 53/65 | 84%, 43/51 | 82%, 61/74 |
| aot_eager | 83%, 54/65 | 84%, 43/51 | 82%, 61/74 |
| aot_cudagraphs | 69%, 45/65 | 65%, 33/51 | 38%, 28/74 |
| nvprims_nvfuser | 48%, 31/65 | 78%, 40/51 | 26%, 19/74 |
| inductor | 75%, 49/65 | 82%, 42/51 | 81%, 60/74 |
| inductor_no_cudagraphs | 82%, 53/65 | 82%, 42/51 | 82%, 61/74 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.11x | 1.04x | 1.00x |
| nvprims_nvfuser | 1.04x | 1.03x | 1.11x |
| inductor | 1.50x | 1.29x | 1.25x |
| inductor_no_cudagraphs | 1.24x | 1.22x | 1.23x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 2.16 | 2.43 | 1.91 |
| aot_eager | 5.77 | 7.84 | 7.05 |
| aot_cudagraphs | 8.60 | 16.10 | 13.16 |
| nvprims_nvfuser | 73.63 | 109.11 | 124.35 |
| inductor | 29.31 | 29.54 | 34.71 |
| inductor_no_cudagraphs | 28.61 | 25.45 | 33.28 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.97x | 1.00x | 0.99x |
| aot_eager | 0.87x | 0.91x | 0.87x |
| aot_cudagraphs | 0.39x | 0.36x | 0.31x |
| nvprims_nvfuser | 0.85x | 0.87x | 0.84x |
| inductor | 0.87x | 0.72x | 0.98x |
| inductor_no_cudagraphs | 1.01x | 0.96x | 1.09x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 96%, 54/56 | 100%, 43/43 | 100%, 61/61 |
| aot_eager | 96%, 54/56 | 100%, 43/43 | 97%, 59/61 |
| aot_cudagraphs | 82%, 46/56 | 77%, 33/43 | 44%, 27/61 |
| nvprims_nvfuser | 80%, 45/56 | 60%, 26/43 | 67%, 41/61 |
| inductor | 84%, 47/56 | 79%, 34/43 | 95%, 58/61 |
| inductor_no_cudagraphs | 91%, 51/56 | 93%, 40/43 | 95%, 58/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.02x | 1.00x | 1.00x |
| aot_cudagraphs | 1.11x | 1.04x | 1.00x |
| nvprims_nvfuser | 1.04x | 1.03x | 1.13x |
| inductor | 1.45x | 1.29x | 1.21x |
| inductor_no_cudagraphs | 1.21x | 1.18x | 1.20x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 2.18 | 2.43 | 1.90 |
| aot_eager | 5.79 | 7.70 | 7.05 |
| aot_cudagraphs | 8.75 | 15.76 | 13.49 |
| nvprims_nvfuser | 68.19 | 105.96 | 149.34 |
| inductor | 42.33 | 33.17 | 46.14 |
| inductor_no_cudagraphs | 40.81 | 26.47 | 44.67 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.97x | 1.00x | 0.99x |
| aot_eager | 0.87x | 0.91x | 0.87x |
| aot_cudagraphs | 0.39x | 0.36x | 0.31x |
| nvprims_nvfuser | 0.91x | 1.00x | 0.94x |
| inductor | 0.83x | 0.66x | 0.97x |
| inductor_no_cudagraphs | 0.96x | 0.88x | 1.08x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 98%, 53/54 | 100%, 42/42 | 100%, 61/61 |
| aot_eager | 98%, 53/54 | 100%, 42/42 | 95%, 58/61 |
| aot_cudagraphs | 89%, 48/54 | 86%, 36/42 | 90%, 55/61 |
| nvprims_nvfuser | 61%, 33/54 | 12%, 5/42 | 54%, 33/61 |
| inductor | 83%, 45/54 | 93%, 39/42 | 92%, 56/61 |
| inductor_no_cudagraphs | 87%, 47/54 | 93%, 39/42 | 92%, 56/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.22x | 1.12x | 1.00x |
| nvprims_nvfuser | 1.02x | 1.03x | 1.09x |
| inductor | 1.80x | 1.73x | 1.40x |
| inductor_no_cudagraphs | 1.37x | 1.51x | 1.35x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 2.37 | 2.91 | 2.14 |
| aot_eager | 6.99 | 10.37 | 8.51 |
| aot_cudagraphs | 11.25 | 17.92 | 16.08 |
| nvprims_nvfuser | 67.63 | 131.40 | 148.18 |
| inductor | 34.25 | 38.38 | 43.61 |
| inductor_no_cudagraphs | 34.42 | 33.66 | 41.60 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.97x | 0.99x | 0.99x |
| aot_eager | 0.85x | 0.90x | 0.87x |
| aot_cudagraphs | 0.41x | 0.39x | 0.33x |
| nvprims_nvfuser | 0.85x | 1.04x | 0.87x |
| inductor | 0.83x | 0.85x | 0.94x |
| inductor_no_cudagraphs | 0.96x | 1.01x | 1.05x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 96%, 52/54 | 98%, 41/42 | 98%, 60/61 |
| aot_eager | 94%, 51/54 | 95%, 40/42 | 93%, 57/61 |
| aot_cudagraphs | 85%, 46/54 | 81%, 34/42 | 89%, 54/61 |
| nvprims_nvfuser | 59%, 32/54 | 10%, 4/42 | 52%, 32/61 |
| inductor | 81%, 44/54 | 90%, 38/42 | 90%, 55/61 |
| inductor_no_cudagraphs | 85%, 46/54 | 90%, 38/42 | 90%, 55/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| aot_cudagraphs | 1.22x | 1.12x | 1.00x |
| nvprims_nvfuser | 1.02x | 1.04x | 1.08x |
| inductor | 1.84x | 1.74x | 1.41x |
| inductor_no_cudagraphs | 1.38x | 1.53x | 1.36x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 2.06 | 2.84 | 2.33 |
| aot_eager | 6.61 | 10.24 | 8.69 |
| aot_cudagraphs | 9.51 | 16.50 | 16.36 |
| nvprims_nvfuser | 66.11 | 133.86 | 151.35 |
| inductor | 33.97 | 38.49 | 44.16 |
| inductor_no_cudagraphs | 34.21 | 33.58 | 41.73 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.97x | 0.99x | 0.99x |
| aot_eager | 0.84x | 0.89x | 0.87x |
| aot_cudagraphs | 0.41x | 0.38x | 0.33x |
| nvprims_nvfuser | 0.83x | 1.01x | 0.86x |
| inductor | 0.83x | 0.85x | 0.94x |
| inductor_no_cudagraphs | 0.96x | 1.01x | 1.05x |
+------------------------+------------+-------------+-------------+
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
Passrate
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 96%, 52/54 | 98%, 41/42 | 98%, 60/61 |
| aot_eager | 94%, 51/54 | 95%, 40/42 | 93%, 57/61 |
| inductor | 81%, 44/54 | 90%, 38/42 | 90%, 55/61 |
| inductor_no_cudagraphs | 85%, 46/54 | 90%, 38/42 | 90%, 55/61 |
+------------------------+------------+-------------+-------------+
Geometric mean speedup
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 1.00x | 1.01x | 1.00x |
| aot_eager | 1.01x | 1.00x | 1.00x |
| inductor | 1.84x | 1.74x | 1.41x |
| inductor_no_cudagraphs | 1.38x | 1.53x | 1.36x |
+------------------------+------------+-------------+-------------+
Mean compilation time (seconds)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 2.06 | 2.84 | 2.33 |
| aot_eager | 6.61 | 10.24 | 8.69 |
| inductor | 33.97 | 38.49 | 44.16 |
| inductor_no_cudagraphs | 34.21 | 33.58 | 41.73 |
+------------------------+------------+-------------+-------------+
Peak memory footprint compression ratio (higher is better)
+------------------------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| eager | 0.97x | 0.99x | 0.99x |
| aot_eager | 0.84x | 0.89x | 0.87x |
| inductor | 0.83x | 0.85x | 0.94x |
| inductor_no_cudagraphs | 0.96x | 1.01x | 1.05x |
+------------------------+------------+-------------+-------------+