ysiraichi opened 10 months ago
State after 7 weeks of work:
Weekly update (Dec 1 ~ Dec 10):
Weekly update (Dec 11 ~ Dec 15):
Can we please add a pass rate table in the weekly report that includes:
- Inference
- Training
Weekly update (Jan 8 ~ Jan 12):
| | Inference | Training |
|---|---|---|
| Inductor | 91 | 64 |
| Non-Dynamo | 87 | 67 |
| Dynamo | 86 | 57 |
Weekly update (Jan 15 ~ Jan 19):
| | Inference | Training |
|---|---|---|
| Inductor | 85 | 62 |
| Non-Dynamo | 70 | 57 |
| Dynamo | 71 | 55 |
Can we track separate pass-rate tables for L4 and A100 GPUs going forward @ysiraichi?
cc @frgossen @golechwierowicz @cota
Weekly update (Jan 22 ~ Jan 26):
| | Inference | Training |
|---|---|---|
| Inductor | 88 | 63 |
| Non-Dynamo | 69 | 57 |
| Dynamo | 72 | 55 |
(…`--filter` argument)

Weekly update (Jan 29 ~ Feb 2):
| | Inference | Training |
|---|---|---|
| Inductor | 87 (last: 88) | 63 |
| Non-Dynamo | 82 (last: 69) | 56 (last: 57) |
| Dynamo | 82 (last: 72) | 53 (last: 55) |

| | Inference | Training |
|---|---|---|
| Inductor | 86 | 60 |
| Non-Dynamo | 81 | 53 |
| Dynamo | 82 | 49 |
(…with `fp32` precision, while setting `XLA_USE_FP16`)
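For reference, a minimal sketch of how `XLA_USE_FP16` is typically set; the wiring below is an assumption. The variable must be set before `torch_xla` is imported, and under it tensors keep reporting `float32` on the Python side while XLA stores them as `fp16`:

```python
import os
os.environ["XLA_USE_FP16"] = "1"  # must happen before importing torch_xla

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
x = torch.randn(4, 4, device=device)  # declared fp32, backed by fp16 on XLA
print(x.dtype)  # still reports torch.float32 on the Python side
```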
Weekly update (Feb 5 ~ Feb 9):
| | Inference | Training |
|---|---|---|
| Inductor | 87 (last: 87) | 63 |
| Non-Dynamo | 82 (last: 82) | 57 (last: 56) |
| Dynamo | 84 (last: 82) | 53 (last: 53) |

| | Inference | Training |
|---|---|---|
| Inductor | 86 | 60 |
| Non-Dynamo | 81 | 53 |
| Dynamo | 84 | 49 |
Weekly update (Feb 12 ~ Feb 16):
Could not run the benchmarks this time due to a compilation issue: #6564
Weekly update (Feb 19 ~ Feb 23):
An error in the benchmarking scripts prevented us from running with XLA: https://github.com/pytorch/xla/pull/6612
| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 87) | 65 (last: 63) |
| Non-Dynamo | 72 (last: 82) | 61 (last: 57) |
| Dynamo | 73 (last: 84) | 54 (last: 53) |

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 86) | 62 (last: 60) |
| Non-Dynamo | 71 (last: 81) | 57 (last: 53) |
| Dynamo | 73 (last: 84) | 52 (last: 49) |
Inductor: Inference (-10, +4)
Inductor: Training (-3, +5)
XLA:GPU (non-dynamo): Inference (-15, +5)
- `aten::upsample_bilinear2d` (after: #6518) (issue: #6520) (see the sketch after this list)
XLA:GPU (non-dynamo): Training (0, +4)
XLA:GPU (dynamo): Inference (-16, +5)
XLA:GPU (dynamo): Training (-4, +5)
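For context on the `aten::upsample_bilinear2d` entry above: it is the op behind bilinear `interpolate` calls, so any benchmark with such a call exercises this lowering. A minimal sketch using stock PyTorch:

```python
import torch
import torch.nn.functional as F

# interpolate(..., mode="bilinear") on a 4D tensor dispatches to
# aten::upsample_bilinear2d, the lowering referenced above.
x = torch.randn(1, 3, 32, 32)
y = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
print(y.shape)  # torch.Size([1, 3, 64, 64])
```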
Weekly update (Feb 26 ~ Mar 01):
Inference | Training | |
---|---|---|
Inductor | 81 (last: 81) | 65 (last: 65) |
Non-Dynamo | 72 (last: 72) | 61 (last: 61) |
Dynamo | 73 (last: 73) | 56 (last: 54) |
Inference | Training | |
---|---|---|
Inductor | 81 (last: 81) | 63 (last: 62) |
Non-Dynamo | 72 (last: 71) | 58 (last: 57) |
Dynamo | 71 (last: 73) | 54 (last: 52) |
XLA:GPU (non-dynamo): Training (-1, +1)
XLA:GPU (dynamo): Inference (-2, 0)
XLA:GPU (dynamo): Training (0, +2)
Weekly update (Mar 04 ~ Mar 08):
Inference | Training | |
---|---|---|
Inductor | 81 (last: 81) | 66 (last: 65) |
Non-Dynamo | 72 (last: 72) | 61 (last: 61) |
Dynamo | 71 (last: 71) | 57 (last: 56) |
Inference | Training | |
---|---|---|
Inductor | 81 (last: 81) | 64 (last: 63) |
Non-Dynamo | 72 (last: 72) | 58 (last: 58) |
Dynamo | 71 (last: 71) | 55 (last: 54) |
Inductor: Training (0, +1)
XLA:GPU (dynamo): Training (0, +1)
- `Tensor.new` dynamo support (see the sketch below)
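A minimal sketch of the kind of pattern that needed `Tensor.new` dynamo support; the `eager` backend here only exercises dynamo tracing, not XLA:

```python
import torch

def fn(x):
    # Tensor.new builds a tensor with the receiver's dtype/device;
    # dynamo previously could not trace through this call.
    return x.new([1.0, 2.0]).sum() + x.sum()

compiled = torch.compile(fn, backend="eager")
print(compiled(torch.randn(3)))
```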
Weekly update (Mar 11 ~ Mar 15):
| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 37 (last: 72) | 28 (last: 61) |
| Dynamo | 31 (last: 71) | 18 (last: 57) |

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 64 (last: 63) |
| Non-Dynamo | 45 (last: 72) | 38 (last: 58) |
| Dynamo | 44 (last: 71) | 22 (last: 55) |
No summary this week.
@ysiraichi The regression you saw might be due to https://github.com/pytorch/xla/pull/6677 (open xla pin update). Our team is looking into this issue.
Weekly update (Mar 18 ~ Mar 21):
| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 76 (last: 72) | 64 (last: 61) |
| Dynamo | 73 (last: 71) | 58 (last: 57) |

| | Inference | Training |
|---|---|---|
| Inductor | 80 (last: 81) | 64 (last: 64) |
| Non-Dynamo | 76 (last: 72) | 61 (last: 58) |
| Dynamo | 74 (last: 71) | 56 (last: 55) |
XLA:GPU (non-dynamo): Inference (0, +4)
- `as_strided_copy`: new implementation
- `pow`: data-type promotion fixed (see the sketch after this list)
- `Embedding`: index type requirement
XLA:GPU (non-dynamo): Training (0, +3)
- `as_strided_copy`: new implementation
XLA:GPU (dynamo): Inference (-2, +4)
- `as_strided_copy`: new implementation
- `pow`: data-type promotion fixed
- `Embedding`: index type requirement
XLA:GPU (dynamo): Training (-2, +3)
- `as_strided_copy`: new implementation
- `pow`: data-type promotion fixed
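The `pow` items refer to a data-type promotion rule, shown below with stock PyTorch on CPU/CUDA; the fix makes XLA:GPU match this behavior:

```python
import torch

# Raising an integer tensor to a float exponent must promote the result
# to a floating dtype; this is the promotion rule the fix aligns XLA with.
ints = torch.arange(4)       # dtype: torch.int64
res = torch.pow(ints, 0.5)   # promotes to the default float dtype
print(res.dtype)             # torch.float32
```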
Last week, the results were unchanged. We are preparing for performance optimizations. cc @ysiraichi
Weekly update (Apr 1 ~ Apr 5):
| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 75 (last: 76) | 63 (last: 64) |
| Dynamo | 73 (last: 73) | 53 (last: 58) |

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 80) | 65 (last: 64) |
| Non-Dynamo | 75 (last: 76) | 61 (last: 61) |
| Dynamo | 74 (last: 74) | 51 (last: 56) |
Inductor: Inference (-1, +1)
XLA:GPU (non-dynamo): Inference (-1, 0)
XLA:GPU (non-dynamo): Training (-1, 0)
XLA:GPU (dynamo): Inference (-1, +1)
XLA:GPU (dynamo): Training (-7, +2)
Weekly update (Apr 8 ~ Apr 12):
| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 74 (last: 75) | 64 (last: 63) |
| Dynamo | 74 (last: 73) | 53 (last: 53) |

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 75 (last: 75) | 61 (last: 61) |
| Dynamo | 75 (last: 74) | 51 (last: 51) |
XLA:GPU (non-dynamo): Inference (-1, 0)
XLA:GPU (non-dynamo): Training (0, +1)
XLA:GPU (dynamo): Inference (0, +1)
Weekly update (Apr 15 ~ Apr 19):
| | Inference | Training |
|---|---|---|
| Inductor | ? (last: 81) | ? (last: 66) |
| Non-Dynamo | ? (last: 74) | ? (last: 64) |
| Dynamo | ? (last: 74) | ? (last: 53) |

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 76 (last: 75) | 61 (last: 61) |
| Dynamo | 76 (last: 75) | 51 (last: 51) |
Weekly update (Apr 22 ~ Apr 26):
| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 75 (last: 74) | 64 (last: 64) |
| Dynamo | 75 (last: 74) | 53 (last: 53) |

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 76 (last: 76) | 61 (last: 61) |
| Dynamo | 76 (last: 76) | 51 (last: 51) |
XLA:GPU (non-dynamo): Inference (0, +1)
XLA:GPU (dynamo): Inference (0, +1)
Weekly update (Apr 29 ~ May 3):
| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 76 (last: 75) | 64 (last: 64) |
| Dynamo | 75 (last: 75) | 53 (last: 53) |

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 81) | 65 (last: 65) |
| Non-Dynamo | 76 (last: 76) | 61 (last: 61) |
| Dynamo | 76 (last: 76) | 51 (last: 51) |
Weekly update (May 6 ~ May 10):
`networkx` had removed support for Python 3.9 (see issue update).

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 76 (last: 75) | 64 (last: 64) |
| Dynamo | 75 (last: 75) | 53 (last: 53) |

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 76 (last: 76) | 61 (last: 61) |
| Dynamo | 76 (last: 76) | 51 (last: 51) |
`SyntaxError: unterminated string literal`
Weekly update (May 13 ~ May 17):
| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 66 (last: 66) |
| Non-Dynamo | 77 (last: 76) | 61 (last: 64) |
| Dynamo | 78 (last: 75) | 55 (last: 53) |

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 77 (last: 76) | 59 (last: 61) |
| Dynamo | 78 (last: 76) | 52 (last: 51) |
All the differences shown below are likely the result of #7067, which fixes AMP. Reason: (i) training benchmarks use AMP by default; and (ii) some inference benchmarks use AMP instead of `bfloat16`.
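For reference, a minimal sketch of the AMP training pattern these benchmarks rely on by default, using stock `torch.autocast` on CUDA; exactly how the harness wires this up is an assumption here:

```python
import torch

model = torch.nn.Linear(16, 16).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(8, 16, device="cuda")
with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = model(x).square().mean()  # matmuls run in fp16 under autocast
scaler.scale(loss).backward()        # loss scaling avoids fp16 underflow
scaler.step(opt)
scaler.update()
```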
XLA:GPU (non-dynamo): Inference (0, +1)
XLA:GPU (non-dynamo): Training (-5, +2)
XLA:GPU (dynamo): Inference (0, +3)
XLA:GPU (dynamo): Training (0, +2)
Weekly update (May 20 ~ May 24):
| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 66 (last: 66) |
| Non-Dynamo | 77 (last: 77) | 63 (last: 61) |
| Dynamo | 78 (last: 78) | 55 (last: 55) |

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 77 (last: 77) | 61 (last: 59) |
| Dynamo | 78 (last: 78) | 52 (last: 52) |
Weekly update (May 27 ~ May 29):
Weekly update (June 3 ~ June 6):
| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 65 (last: 66) |
| Non-Dynamo | 79 (last: 77) | 61 (last: 63) |
| Dynamo | 79 (last: 78) | 55 (last: 55) |

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 64 (last: 65) |
| Non-Dynamo | 79 (last: 77) | 60 (last: 61) |
| Dynamo | 79 (last: 78) | 52 (last: 52) |
Inductor: Training (-1, +0)
XLA:GPU (non-dynamo): Inference (-0, +2)
XLA:GPU (non-dynamo): Training (-3, +1)
XLA:GPU (dynamo): Inference (-0, +1)
Weekly update (June 10 ~ June 14):
| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 79 (last: 79) | 63 (last: 61) |
| Dynamo | 79 (last: 79) | 55 (last: 55) |

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 64 (last: 64) |
| Non-Dynamo | 79 (last: 79) | 61 (last: 60) |
| Dynamo | 79 (last: 79) | 52 (last: 52) |
Weekly update (June 17 ~ June 21):
| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 79) | 63 (last: 63) |
| Dynamo | 78 (last: 79) | 55 (last: 55) |

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 82) | 64 (last: 64) |
| Non-Dynamo | 78 (last: 79) | 61 (last: 61) |
| Dynamo | 78 (last: 79) | 52 (last: 52) |
XLA:GPU (non-dynamo): Inference (-1, +0)
XLA:GPU (dynamo): Inference (-1, +0)
Weekly update (June 24 ~ June 28):
| | Inference | Training |
|---|---|---|
| Inductor | 74 (last: 81) | 60 (last: 65) |
| Non-Dynamo | 73 (last: 78) | 60 (last: 63) |
| Dynamo | 72 (last: 78) | 54 (last: 55) |

| | Inference | Training |
|---|---|---|
| Inductor | 74 (last: 81) | 59 (last: 64) |
| Non-Dynamo | 73 (last: 78) | 58 (last: 61) |
| Dynamo | 72 (last: 78) | 51 (last: 52) |
Inductor: Inference (-7, +0)
Inductor: Training (-5, +0)
XLA:GPU (non-dynamo): Inference (-6, +1)
XLA:GPU (non-dynamo): Training (-4, +1)
XLA:GPU (dynamo): Inference (-6, +0)
XLA:GPU (dynamo): Training (-1, +0)
Weekly update (July 1 ~ July 5):
| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 74) | 66 (last: 60) |
| Non-Dynamo | 78 (last: 73) | 64 (last: 60) |
| Dynamo | 78 (last: 72) | 55 (last: 54) |

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 74) | 65 (last: 59) |
| Non-Dynamo | 78 (last: 73) | 62 (last: 58) |
| Dynamo | 78 (last: 72) | 52 (last: 51) |
Inductor: Inference (-0, +7)
Inductor: Training (-0, +6)
XLA:GPU (non-dynamo): Inference (-1, +6)
XLA:GPU (non-dynamo): Training (-1, +5)
XLA:GPU (dynamo): Inference (-0, +6)
XLA:GPU (dynamo): Training (-1, +0)
Weekly update (July 8 ~ July 12):
| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 75 (last: 78) | 61 (last: 64) |
| Dynamo | 75 (last: 78) | 52 (last: 55) |

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 65 (last: 65) |
| Non-Dynamo | 75 (last: 78) | 59 (last: 62) |
| Dynamo | 75 (last: 78) | 49 (last: 52) |
XLA:GPU (non-dynamo): Inference (-3, +0)
XLA:GPU (non-dynamo): Training (-3, +0)
XLA:GPU (dynamo): Inference (-3, +0)
XLA:GPU (dynamo): Training (-3, +0)
Weekly update (July 15 ~ July 19):
| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 78 (last: 75) | 64 (last: 61) |
| Dynamo | 78 (last: 75) | 55 (last: 52) |

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 75) | 62 (last: 59) |
| Dynamo | 78 (last: 75) | 52 (last: 49) |
XLA:GPU (non-dynamo): Inference (-0, +3)
XLA:GPU (non-dynamo): Training (-0, +3)
XLA:GPU (dynamo): Inference (-0, +3)
XLA:GPU (dynamo): Training (-0, +3)
Weekly update (July 22 ~ July 26):
| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 77 (last: 78) | 64 (last: 64) |
| Dynamo | 78 (last: 78) | 55 (last: 55) |

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 78) | 62 (last: 62) |
| Dynamo | 78 (last: 78) | 52 (last: 52) |
Weekly update (July 29 ~ Aug 9):
| | Inference | Training |
|---|---|---|
| Inductor | 77 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 78 (last: 77) | 63 (last: 64) |
| Dynamo | 77 (last: 78) | 52 (last: 55) |

| | Inference | Training |
|---|---|---|
| Inductor | 77 (last: 81) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 78) | 62 (last: 62) |
| Dynamo | 77 (last: 78) | 45 (last: 52) |
Inductor: Inference (-4, +0)
XLA:GPU (dynamo): Inference (-1, +0)
XLA:GPU (dynamo): Training (-4, +0)
Weekly update (Aug 12 ~ Aug 16):
| | Inference | Training |
|---|---|---|
| Inductor | 77 (last: 77) | 66 (last: 66) |
| Non-Dynamo | 78 (last: 78) | 63 (last: 63) |
| Dynamo | 77 (last: 77) | 52 (last: 52) |

| | Inference | Training |
|---|---|---|
| Inductor | 77 (last: 77) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 78) | 62 (last: 62) |
| Dynamo | 77 (last: 77) | 44 (last: 45) |
Weekly update (Aug 19 ~ Aug 23):
| | Inference | Training |
|---|---|---|
| Inductor | 77 (last: 77) | 66 (last: 66) |
| Non-Dynamo | 78 (last: 78) | 63 (last: 63) |
| Dynamo | 77 (last: 77) | 49 (last: 52) |

| | Inference | Training |
|---|---|---|
| Inductor | 77 (last: 77) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 78) | 62 (last: 62) |
| Dynamo | 77 (last: 77) | 41 (last: 44) |
Weekly update (Aug 26 ~ Aug 30):
| | Inference | Training |
|---|---|---|
| Inductor | 77 (last: 77) | 66 (last: 66) |
| Non-Dynamo | 78 (last: 78) | 64 (last: 63) |
| Dynamo | 77 (last: 77) | 51 (last: 49) |

| | Inference | Training |
|---|---|---|
| Inductor | 77 (last: 77) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 78) | 63 (last: 62) |
| Dynamo | 77 (last: 77) | 48 (last: 41) |
XLA:GPU (non-dynamo): Training (-0, +1)
XLA:GPU (dynamo): Training (-0, +2)
Weekly update (Sep 2 ~ Sep 6):
| | Inference | Training |
|---|---|---|
| Inductor | 77 (last: 77) | 66 (last: 66) |
| Non-Dynamo | 78 (last: 78) | 64 (last: 64) |
| Dynamo | 77 (last: 77) | 52 (last: 51) |

| | Inference | Training |
|---|---|---|
| Inductor | 77 (last: 77) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 78) | 63 (last: 63) |
| Dynamo | 77 (last: 77) | 49 (last: 48) |
Weekly update (Sep 9 ~ Sep 13):
| | Inference | Training |
|---|---|---|
| Inductor | 79 (last: 77) | 66 (last: 66) |
| Non-Dynamo | 78 (last: 78) | 64 (last: 64) |
| Dynamo | 77 (last: 77) | 52 (last: 52) |

| | Inference | Training |
|---|---|---|
| Inductor | 79 (last: 77) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 78) | 63 (last: 63) |
| Dynamo | 77 (last: 77) | 49 (last: 49) |
Weekly update (Sep 16 ~ Sep 20):
| | Inference | Training |
|---|---|---|
| Inductor | 79 (last: 79) | 66 (last: 66) |
| Non-Dynamo | 78 (last: 78) | 64 (last: 64) |
| Dynamo | 77 (last: 77) | 52 (last: 52) |

| | Inference | Training |
|---|---|---|
| Inductor | 79 (last: 79) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 78) | 63 (last: 63) |
| Dynamo | 77 (last: 77) | 49 (last: 49) |
Summary of Contributions (9th Feb)
1) Improve the number of models in TorchBench that work with Dynamo as a tracer: these pass rates are now comparable to those of torch.compile using Inductor. Some of the fixes also improved the previous tracer that PyTorch/XLA used.
2) Improve the benchmarking tools used by Google: the initial Google runs benchmarking these models showed a discrepancy of about 15 models relative to the results reported here. We identified and fixed 10+ issues, which helped reconcile Google's benchmarks with those reported here and, in turn, with the PyTorch HUD.
Current State
This post has two lists: one for inference and one for training.
Each of them shows the failing models, both without dynamo and with dynamo (using `openxla`).
These lists were created using the benchmarking scripts that currently live in the upstream repository. The following command was executed:
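The exact command was not preserved in this thread. Below is a hypothetical reconstruction, driven from Python, using the upstream `xla/benchmarks/experiment_runner.py` entry point; every flag is an assumption based on the upstream benchmarking scripts:

```python
import subprocess

# Hypothetical invocation; the actual flag set used for the lists above
# was elided from this post.
subprocess.run(
    [
        "python", "xla/benchmarks/experiment_runner.py",
        "--suite-name=torchbench",
        "--accelerator=cuda",
        "--xla=PJRT", "--xla=None",           # XLA and non-XLA runs
        "--dynamo=openxla", "--dynamo=None",  # dynamo and non-dynamo tracers
        "--test=eval", "--test=train",        # inference and training
        "--repeat=5",
    ],
    check=True,
)
```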
Environment
Inference
Non-Dynamo. Pass rate: 78/81 - 96% (against inductor)
Dynamo+`openxla`. Pass rate: 78/81 - 96% (against inductor)

Models also Failing on Inductor
Inference Failing on Inductor CUDA with the Same Error
Benchmarks that raise the same error on inductor:
Inference Failing on Inductor CUDA with Different Errors
Training
Non-Dynamo. Pass rate: 64/66 - 96% (against inductor)
Dynamo+`openxla`. Pass rate: 55/66 - 83% (against inductor)

Models also Failing on Inductor
No Training Support on Inductor CUDA
Benchmarks that raise the error:
`Model's DEFAULT_TRAIN_BSIZE is not implemented`.
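For context, an illustrative sketch (not the actual TorchBench source) of the check behind this error; the class and attribute names mirror TorchBench's conventions:

```python
class BenchmarkModel:
    DEFAULT_TRAIN_BSIZE = None  # concrete models override with a batch size

    def __init__(self, test: str):
        # Models that never define a training batch size cannot be trained
        # by the harness, which surfaces as the error quoted above.
        if test == "train" and self.DEFAULT_TRAIN_BSIZE is None:
            raise NotImplementedError(
                "Model's DEFAULT_TRAIN_BSIZE is not implemented"
            )
```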
Training Failing on Inductor CUDA with the Same Error
Benchmarks that raise the same error on inductor:
Training Failing on Inductor CUDA with Different Errors
cc @JackCaoG @miladm