ysiraichi opened 10 months ago
State after 7 weeks of work:
Weekly update (Dec 1 ~ Dec 10):
Weekly update (Dec 11 ~ Dec 15):
Can we please add a pass rate table in the weekly report that includes:
- Inference
- Training
Weekly update (Jan 8 ~ Jan 12):
| | Inference | Training |
|---|---|---|
| Inductor | 91 | 64 |
| Non-Dynamo | 87 | 67 |
| Dynamo | 86 | 57 |
Weekly update (Jan 15 ~ Jan 19):
| | Inference | Training |
|---|---|---|
| Inductor | 85 | 62 |
| Non-Dynamo | 70 | 57 |
| Dynamo | 71 | 55 |
Can we track separate pass-rate tables for L4 and A100 GPUs going forward @ysiraichi?
cc @frgossen @golechwierowicz @cota
Weekly update (Jan 22 ~ Jan 26):
| | Inference | Training |
|---|---|---|
| Inductor | 88 | 63 |
| Non-Dynamo | 69 | 57 |
| Dynamo | 72 | 55 |
(…`--filter` argument)

Weekly update (Jan 29 ~ Feb 2):
| | Inference | Training |
|---|---|---|
| Inductor | 87 (last: 88) | 63 |
| Non-Dynamo | 82 (last: 69) | 56 (last: 57) |
| Dynamo | 82 (last: 72) | 53 (last: 55) |

| | Inference | Training |
|---|---|---|
| Inductor | 86 | 60 |
| Non-Dynamo | 81 | 53 |
| Dynamo | 82 | 49 |
(…with `fp32` precision, while setting `XLA_USE_FP16`)
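For reference, a minimal sketch of how `XLA_USE_FP16` is typically set; the wiring below is an assumption. The variable must be set before `torch_xla` is imported, and under it tensors keep reporting `float32` on the Python side while XLA stores them as `fp16`:

```python
import os
os.environ["XLA_USE_FP16"] = "1"  # must happen before importing torch_xla

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
x = torch.randn(4, 4, device=device)  # declared fp32, backed by fp16 on XLA
print(x.dtype)  # still reports torch.float32 on the Python side
```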
Weekly update (Feb 5 ~ Feb 9):
| | Inference | Training |
|---|---|---|
| Inductor | 87 (last: 87) | 63 |
| Non-Dynamo | 82 (last: 82) | 57 (last: 56) |
| Dynamo | 84 (last: 82) | 53 (last: 53) |

| | Inference | Training |
|---|---|---|
| Inductor | 86 | 60 |
| Non-Dynamo | 81 | 53 |
| Dynamo | 84 | 49 |
Weekly update (Feb 12 ~ Feb 16):
Could not run the benchmarks this time due to a compilation issue: #6564
Weekly update (Feb 19 ~ Feb 23):
An error in the benchmarking scripts prevented us from running with XLA: https://github.com/pytorch/xla/pull/6612
| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 87) | 65 (last: 63) |
| Non-Dynamo | 72 (last: 82) | 61 (last: 57) |
| Dynamo | 73 (last: 84) | 54 (last: 53) |

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 86) | 62 (last: 60) |
| Non-Dynamo | 71 (last: 81) | 57 (last: 53) |
| Dynamo | 73 (last: 84) | 52 (last: 49) |
Inductor: Inference (-10, +4)
Inductor: Training (-3, +5)
XLA:GPU (non-dynamo): Inference (-15, +5)
- `aten::upsample_bilinear2d` (after: #6518) (issue: #6520) (see the sketch after this list)
XLA:GPU (non-dynamo): Training (0, +4)
XLA:GPU (dynamo): Inference (-16, +5)
XLA:GPU (dynamo): Training (-4, +5)
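For context on the `aten::upsample_bilinear2d` entry above: it is the op behind bilinear `interpolate` calls, so any benchmark with such a call exercises this lowering. A minimal sketch using stock PyTorch:

```python
import torch
import torch.nn.functional as F

# interpolate(..., mode="bilinear") on a 4D tensor dispatches to
# aten::upsample_bilinear2d, the lowering referenced above.
x = torch.randn(1, 3, 32, 32)
y = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
print(y.shape)  # torch.Size([1, 3, 64, 64])
```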
Weekly update (Feb 26 ~ Mar 01):
Inference | Training | |
---|---|---|
Inductor | 81 (last: 81) | 65 (last: 65) |
Non-Dynamo | 72 (last: 72) | 61 (last: 61) |
Dynamo | 73 (last: 73) | 56 (last: 54) |
Inference | Training | |
---|---|---|
Inductor | 81 (last: 81) | 63 (last: 62) |
Non-Dynamo | 72 (last: 71) | 58 (last: 57) |
Dynamo | 71 (last: 73) | 54 (last: 52) |
XLA:GPU (non-dynamo): Training (-1, +1)
XLA:GPU (dynamo): Inference (-2, 0)
XLA:GPU (dynamo): Training (0, +2)
Weekly update (Mar 04 ~ Mar 08):
Inference | Training | |
---|---|---|
Inductor | 81 (last: 81) | 66 (last: 65) |
Non-Dynamo | 72 (last: 72) | 61 (last: 61) |
Dynamo | 71 (last: 71) | 57 (last: 56) |
Inference | Training | |
---|---|---|
Inductor | 81 (last: 81) | 64 (last: 63) |
Non-Dynamo | 72 (last: 72) | 58 (last: 58) |
Dynamo | 71 (last: 71) | 55 (last: 54) |
Inductor: Training (0, +1)
XLA:GPU (dynamo): Training (0, +1)
- `Tensor.new` dynamo support (see the sketch below)
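A minimal sketch of the kind of pattern that needed `Tensor.new` dynamo support; the `eager` backend here only exercises dynamo tracing, not XLA:

```python
import torch

def fn(x):
    # Tensor.new builds a tensor with the receiver's dtype/device;
    # dynamo previously could not trace through this call.
    return x.new([1.0, 2.0]).sum() + x.sum()

compiled = torch.compile(fn, backend="eager")
print(compiled(torch.randn(3)))
```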
Weekly update (Mar 11 ~ Mar 15):
| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 37 (last: 72) | 28 (last: 61) |
| Dynamo | 31 (last: 71) | 18 (last: 57) |

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 64 (last: 63) |
| Non-Dynamo | 45 (last: 72) | 38 (last: 58) |
| Dynamo | 44 (last: 71) | 22 (last: 55) |
No summary this week.
@ysiraichi The regression you saw might be due to https://github.com/pytorch/xla/pull/6677 (open xla pin update). Our team is looking into this issue.
Weekly update (Mar 18 ~ Mar 21):
| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 76 (last: 72) | 64 (last: 61) |
| Dynamo | 73 (last: 71) | 58 (last: 57) |

| | Inference | Training |
|---|---|---|
| Inductor | 80 (last: 81) | 64 (last: 64) |
| Non-Dynamo | 76 (last: 72) | 61 (last: 58) |
| Dynamo | 74 (last: 71) | 56 (last: 55) |
XLA:GPU (non-dynamo): Inference (0, +4)
- `as_strided_copy`: new implementation
- `pow`: data-type promotion fixed (see the sketch after this list)
- `Embedding`: index type requirement
XLA:GPU (non-dynamo): Training (0, +3)
- `as_strided_copy`: new implementation
XLA:GPU (dynamo): Inference (-2, +4)
- `as_strided_copy`: new implementation
- `pow`: data-type promotion fixed
- `Embedding`: index type requirement
XLA:GPU (dynamo): Training (-2, +3)
- `as_strided_copy`: new implementation
- `pow`: data-type promotion fixed
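The `pow` items refer to a data-type promotion rule, shown below with stock PyTorch on CPU/CUDA; the fix makes XLA:GPU match this behavior:

```python
import torch

# Raising an integer tensor to a float exponent must promote the result
# to a floating dtype; this is the promotion rule the fix aligns XLA with.
ints = torch.arange(4)       # dtype: torch.int64
res = torch.pow(ints, 0.5)   # promotes to the default float dtype
print(res.dtype)             # torch.float32
```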
Last week, the results were unchanged. We are preparing for performance optimizations. cc @ysiraichi
Weekly update (Apr 1 ~ Apr 5):
| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 75 (last: 76) | 63 (last: 64) |
| Dynamo | 73 (last: 73) | 53 (last: 58) |

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 80) | 65 (last: 64) |
| Non-Dynamo | 75 (last: 76) | 61 (last: 61) |
| Dynamo | 74 (last: 74) | 51 (last: 56) |
Inductor: Inference (-1, +1)
XLA:GPU (non-dynamo): Inference (-1, 0)
XLA:GPU (non-dynamo): Training (-1, 0)
XLA:GPU (dynamo): Inference (-1, +1)
XLA:GPU (dynamo): Training (-7, +2)
Weekly update (Apr 8 ~ Apr 12):
| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 74 (last: 75) | 64 (last: 63) |
| Dynamo | 74 (last: 73) | 53 (last: 53) |

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 75 (last: 75) | 61 (last: 61) |
| Dynamo | 75 (last: 74) | 51 (last: 51) |
XLA:GPU (non-dynamo): Inference (-1, 0)
XLA:GPU (non-dynamo): Training (0, +1)
XLA:GPU (dynamo): Inference (0, +1)
Weekly update (Apr 15 ~ Apr 19):
| | Inference | Training |
|---|---|---|
| Inductor | ? (last: 81) | ? (last: 66) |
| Non-Dynamo | ? (last: 74) | ? (last: 64) |
| Dynamo | ? (last: 74) | ? (last: 53) |

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 76 (last: 75) | 61 (last: 61) |
| Dynamo | 76 (last: 75) | 51 (last: 51) |
Weekly update (Apr 22 ~ Apr 26):
| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 75 (last: 74) | 64 (last: 64) |
| Dynamo | 75 (last: 74) | 53 (last: 53) |

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 76 (last: 76) | 61 (last: 61) |
| Dynamo | 76 (last: 76) | 51 (last: 51) |
XLA:GPU (non-dynamo): Inference (0, +1)
XLA:GPU (dynamo): Inference (0, +1)
Weekly update (Apr 29 ~ May 3):
| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 76 (last: 75) | 64 (last: 64) |
| Dynamo | 75 (last: 75) | 53 (last: 53) |

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 81) | 65 (last: 65) |
| Non-Dynamo | 76 (last: 76) | 61 (last: 61) |
| Dynamo | 76 (last: 76) | 51 (last: 51) |
Weekly update (May 6 ~ May 10):
`networkx` had removed support for Python 3.9 (see issue update).

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 76 (last: 75) | 64 (last: 64) |
| Dynamo | 75 (last: 75) | 53 (last: 53) |

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 76 (last: 76) | 61 (last: 61) |
| Dynamo | 76 (last: 76) | 51 (last: 51) |
`SyntaxError: unterminated string literal`
Weekly update (May 13 ~ May 17):
| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 66 (last: 66) |
| Non-Dynamo | 77 (last: 76) | 61 (last: 64) |
| Dynamo | 78 (last: 75) | 55 (last: 53) |

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 77 (last: 76) | 59 (last: 61) |
| Dynamo | 78 (last: 76) | 52 (last: 51) |
All the differences shown below are likely the result of #7067, which fixes AMP. Reason: (i) training benchmarks use AMP by default; and (ii) some inference benchmarks use AMP instead of `bfloat16`.
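For reference, a minimal sketch of the AMP training pattern these benchmarks rely on by default, using stock `torch.autocast` on CUDA; exactly how the harness wires this up is an assumption here:

```python
import torch

model = torch.nn.Linear(16, 16).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(8, 16, device="cuda")
with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = model(x).square().mean()  # matmuls run in fp16 under autocast
scaler.scale(loss).backward()        # loss scaling avoids fp16 underflow
scaler.step(opt)
scaler.update()
```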
XLA:GPU (non-dynamo): Inference (0, +1)
XLA:GPU (non-dynamo): Training (-5, +2)
XLA:GPU (dynamo): Inference (0, +3)
XLA:GPU (dynamo): Training (0, +2)
Weekly update (May 20 ~ May 24):
| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 66 (last: 66) |
| Non-Dynamo | 77 (last: 77) | 63 (last: 61) |
| Dynamo | 78 (last: 78) | 55 (last: 55) |

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 77 (last: 77) | 61 (last: 59) |
| Dynamo | 78 (last: 78) | 52 (last: 52) |
Weekly update (May 27 ~ May 29):
Weekly update (June 3 ~ June 6):
| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 65 (last: 66) |
| Non-Dynamo | 79 (last: 77) | 61 (last: 63) |
| Dynamo | 79 (last: 78) | 55 (last: 55) |

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 64 (last: 65) |
| Non-Dynamo | 79 (last: 77) | 60 (last: 61) |
| Dynamo | 79 (last: 78) | 52 (last: 52) |
Inductor: Training (-1, +0)
XLA:GPU (non-dynamo): Inference (-0, +2)
XLA:GPU (non-dynamo): Training (-3, +1)
XLA:GPU (dynamo): Inference (-0, +1)
Weekly update (June 10 ~ June 14):
| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 79 (last: 79) | 63 (last: 61) |
| Dynamo | 79 (last: 79) | 55 (last: 55) |

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 64 (last: 64) |
| Non-Dynamo | 79 (last: 79) | 61 (last: 60) |
| Dynamo | 79 (last: 79) | 52 (last: 52) |
Weekly update (June 17 ~ June 21):
| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 79) | 63 (last: 63) |
| Dynamo | 78 (last: 79) | 55 (last: 55) |

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 82) | 64 (last: 64) |
| Non-Dynamo | 78 (last: 79) | 61 (last: 61) |
| Dynamo | 78 (last: 79) | 52 (last: 52) |
XLA:GPU (non-dynamo): Inference (-1, +0)
XLA:GPU (dynamo): Inference (-1, +0)
Weekly update (June 24 ~ June 28):
| | Inference | Training |
|---|---|---|
| Inductor | 74 (last: 81) | 60 (last: 65) |
| Non-Dynamo | 73 (last: 78) | 60 (last: 63) |
| Dynamo | 72 (last: 78) | 54 (last: 55) |

| | Inference | Training |
|---|---|---|
| Inductor | 74 (last: 81) | 59 (last: 64) |
| Non-Dynamo | 73 (last: 78) | 58 (last: 61) |
| Dynamo | 72 (last: 78) | 51 (last: 52) |
Inductor: Inference (-7, +0)
Inductor: Training (-5, +0)
XLA:GPU (non-dynamo): Inference (-6, +1)
XLA:GPU (non-dynamo): Training (-4, +1)
XLA:GPU (dynamo): Inference (-6, +0)
XLA:GPU (dynamo): Training (-1, +0)
Weekly update (July 1 ~ July 5):
| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 74) | 66 (last: 60) |
| Non-Dynamo | 78 (last: 73) | 64 (last: 60) |
| Dynamo | 78 (last: 72) | 55 (last: 54) |

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 74) | 65 (last: 59) |
| Non-Dynamo | 78 (last: 73) | 62 (last: 58) |
| Dynamo | 78 (last: 72) | 52 (last: 51) |
Inductor: Inference (-0, +7)
Inductor: Training (-0, +6)
XLA:GPU (non-dynamo): Inference (-1, +6)
XLA:GPU (non-dynamo): Training (-1, +5)
XLA:GPU (dynamo): Inference (-0, +6)
XLA:GPU (dynamo): Training (-1, +0)
Weekly update (July 8 ~ July 12):
| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 75 (last: 78) | 61 (last: 64) |
| Dynamo | 75 (last: 78) | 52 (last: 55) |

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 65 (last: 65) |
| Non-Dynamo | 75 (last: 78) | 59 (last: 62) |
| Dynamo | 75 (last: 78) | 49 (last: 52) |
XLA:GPU (non-dynamo): Inference (-3, +0)
XLA:GPU (non-dynamo): Training (-3, +0)
XLA:GPU (dynamo): Inference (-3, +0)
XLA:GPU (dynamo): Training (-3, +0)
Weekly update (July 15 ~ July 19):
| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 78 (last: 75) | 64 (last: 61) |
| Dynamo | 78 (last: 75) | 55 (last: 52) |

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 75) | 62 (last: 59) |
| Dynamo | 78 (last: 75) | 52 (last: 49) |
XLA:GPU (non-dynamo): Inference (-0, +3)
XLA:GPU (non-dynamo): Training (-0, +3)
XLA:GPU (dynamo): Inference (-0, +3)
XLA:GPU (dynamo): Training (-0, +3)
Weekly update (July 22 ~ July 26):
| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 77 (last: 78) | 64 (last: 64) |
| Dynamo | 78 (last: 78) | 55 (last: 55) |

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 78) | 62 (last: 62) |
| Dynamo | 78 (last: 78) | 52 (last: 52) |
Weekly update (July 29 ~ Aug 9):
| | Inference | Training |
|---|---|---|
| Inductor | 77 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 78 (last: 77) | 63 (last: 64) |
| Dynamo | 77 (last: 78) | 52 (last: 55) |

| | Inference | Training |
|---|---|---|
| Inductor | 77 (last: 81) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 78) | 62 (last: 62) |
| Dynamo | 77 (last: 78) | 45 (last: 52) |
Inductor: Inference (-4, +0)
XLA:GPU (dynamo): Inference (-1, +0)
XLA:GPU (dynamo): Training (-4, +0)
Weekly update (Aug 12 ~ Aug 16):
| | Inference | Training |
|---|---|---|
| Inductor | 77 (last: 77) | 66 (last: 66) |
| Non-Dynamo | 78 (last: 78) | 63 (last: 63) |
| Dynamo | 77 (last: 77) | 52 (last: 52) |

| | Inference | Training |
|---|---|---|
| Inductor | 77 (last: 77) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 78) | 62 (last: 62) |
| Dynamo | 77 (last: 77) | 44 (last: 45) |
Weekly update (Aug 19 ~ Aug 23):
| | Inference | Training |
|---|---|---|
| Inductor | 77 (last: 77) | 66 (last: 66) |
| Non-Dynamo | 78 (last: 78) | 63 (last: 63) |
| Dynamo | 77 (last: 77) | 49 (last: 52) |

| | Inference | Training |
|---|---|---|
| Inductor | 77 (last: 77) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 78) | 62 (last: 62) |
| Dynamo | 77 (last: 77) | 41 (last: 44) |
Weekly update (Aug 26 ~ Aug 30):
| | Inference | Training |
|---|---|---|
| Inductor | 77 (last: 77) | 66 (last: 66) |
| Non-Dynamo | 78 (last: 78) | 64 (last: 63) |
| Dynamo | 77 (last: 77) | 51 (last: 49) |

| | Inference | Training |
|---|---|---|
| Inductor | 77 (last: 77) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 78) | 63 (last: 62) |
| Dynamo | 77 (last: 77) | 48 (last: 41) |
XLA:GPU (non-dynamo): Training (-0, +1)
XLA:GPU (dynamo): Training (-0, +2)
Weekly update (Sep 2 ~ Sep 6):
| | Inference | Training |
|---|---|---|
| Inductor | 77 (last: 77) | 66 (last: 66) |
| Non-Dynamo | 78 (last: 78) | 64 (last: 64) |
| Dynamo | 77 (last: 77) | 52 (last: 51) |

| | Inference | Training |
|---|---|---|
| Inductor | 77 (last: 77) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 78) | 63 (last: 63) |
| Dynamo | 77 (last: 77) | 49 (last: 48) |
Weekly update (Sep 9 ~ Sep 13):
| | Inference | Training |
|---|---|---|
| Inductor | 79 (last: 77) | 66 (last: 66) |
| Non-Dynamo | 78 (last: 78) | 64 (last: 64) |
| Dynamo | 77 (last: 77) | 52 (last: 52) |

| | Inference | Training |
|---|---|---|
| Inductor | 79 (last: 77) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 78) | 63 (last: 63) |
| Dynamo | 77 (last: 77) | 49 (last: 49) |
Weekly update (Sep 16 ~ Sep 20):
| | Inference | Training |
|---|---|---|
| Inductor | 79 (last: 79) | 66 (last: 66) |
| Non-Dynamo | 78 (last: 78) | 64 (last: 64) |
| Dynamo | 77 (last: 77) | 52 (last: 52) |

| | Inference | Training |
|---|---|---|
| Inductor | 79 (last: 79) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 78) | 63 (last: 63) |
| Dynamo | 77 (last: 77) | 49 (last: 49) |
Summary of Contributions (9th Feb)
1) Improve the number of models in TorchBench that work with Dynamo as a tracer: these pass rates are now comparable to those of torch.compile using Inductor. Some of the fixes also improved the previous tracer that PyTorch/XLA used.
2) Improve the benchmarking tools used by Google: the initial Google runs benchmarking these models showed a discrepancy of about 15 models relative to the results reported here. We identified and fixed 10+ issues, which helped reconcile Google's benchmarks with those reported here and, in turn, with the PyTorch HUD.
Current State
This post has two lists: one for inference and one for training.
Each of them shows the failing models, both without dynamo and with dynamo (using `openxla`).
These lists were created using the benchmarking scripts that currently live in the upstream repository. The following command was executed:
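The exact command was not preserved in this thread. Below is a hypothetical reconstruction, driven from Python, using the upstream `xla/benchmarks/experiment_runner.py` entry point; every flag is an assumption based on the upstream benchmarking scripts:

```python
import subprocess

# Hypothetical invocation; the actual flag set used for the lists above
# was elided from this post.
subprocess.run(
    [
        "python", "xla/benchmarks/experiment_runner.py",
        "--suite-name=torchbench",
        "--accelerator=cuda",
        "--xla=PJRT", "--xla=None",           # XLA and non-XLA runs
        "--dynamo=openxla", "--dynamo=None",  # dynamo and non-dynamo tracers
        "--test=eval", "--test=train",        # inference and training
        "--repeat=5",
    ],
    check=True,
)
```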
Environment
Inference
Non-Dynamo. Pass rate: 78/81 - 96% (against inductor)
Dynamo+`openxla`. Pass rate: 78/81 - 96% (against inductor)

Models also Failing on Inductor
Inference Failing on Inductor CUDA with the Same Error
Benchmarks that raise the same error on inductor:
Inference Failing on Inductor CUDA with Different Errors
Training
Non-Dynamo. Pass rate: 64/66 - 96% (against inductor)
Dynamo+`openxla`. Pass rate: 55/66 - 83% (against inductor)

Models also Failing on Inductor
No Training Support on Inductor CUDA
Benchmarks that raise the error:
`Model's DEFAULT_TRAIN_BSIZE is not implemented`.
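For context, an illustrative sketch (not the actual TorchBench source) of the check behind this error; the class and attribute names mirror TorchBench's conventions:

```python
class BenchmarkModel:
    DEFAULT_TRAIN_BSIZE = None  # concrete models override with a batch size

    def __init__(self, test: str):
        # Models that never define a training batch size cannot be trained
        # by the harness, which surfaces as the error quoted above.
        if test == "train" and self.DEFAULT_TRAIN_BSIZE is None:
            raise NotImplementedError(
                "Model's DEFAULT_TRAIN_BSIZE is not implemented"
            )
```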
Training Failing on Inductor CUDA with the Same Error
Benchmarks that raise the same error on inductor:
Training Failing on Inductor CUDA with Different Errors
cc @JackCaoG @miladm