pytorch / xla

Enabling PyTorch on XLA Devices (e.g. Google TPU)
https://pytorch.org/xla

Failing Torchbench Models: tracking issue #5932

Open ysiraichi opened 10 months ago

ysiraichi commented 10 months ago

Summary of Contributions (9th Feb)

1) Increase the number of TorchBench models that work with Dynamo as a tracer: the pass rates are now comparable to those of torch.compile using Inductor. Some of the fixes also improved the previous (non-Dynamo) tracer that PyTorch/XLA used.

|            | Inference | Training |
|------------|-----------|----------|
| Inductor    | 87 | 63 |
| Dynamo     | 60 to 82  | 41 to 53 |
| Non-Dynamo | 79 to 82  | 54 to 56 |

2) Improve the benchmarking tools used by Google: the initial Google benchmark runs of these models showed a discrepancy of about 15 models relative to the reported results. We identified and fixed 10+ issues, which helped reconcile Google's benchmarks with the reported results and, in turn, with the PyTorch HUD.

Current State

This post has two lists:

Each of them shows the failing models:

These lists were created using the benchmarking scripts that currently live in the upstream. The following command was executed:

python xla/benchmarks/experiment_runner.py \
       --suite-name torchbench \
       --accelerator cuda \
       --xla PJRT --xla None \
       --dynamo openxla --dynamo inductor --dynamo None \
       --test eval --test train \
       --repeat 30 --iterations-per-run 5 \
       --print-subprocess \
       --no-resume
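As a rough illustration of what the repeated flags above ask for, here is a minimal sketch that collects the repeated `--xla`, `--dynamo` and `--test` values and enumerates their naive cross product. The parser below is a hypothetical stand-in, not the actual code of `experiment_runner.py`, and the real runner may well skip combinations that make no sense (that part is an assumption and is not modelled here).

```python
import argparse
from itertools import product

# Hypothetical stand-in for the runner's CLI: only the three repeated flags.
parser = argparse.ArgumentParser()
for flag in ("--xla", "--dynamo", "--test"):
    # action="append" accumulates every occurrence of the flag into a list.
    parser.add_argument(flag, action="append", default=[])

args = parser.parse_args(
    "--xla PJRT --xla None "
    "--dynamo openxla --dynamo inductor --dynamo None "
    "--test eval --test train".split()
)

# Naive cross product of the accumulated values (no filtering applied).
configs = list(product(args.xla, args.dynamo, args.test))
for xla, dynamo, test in configs:
    print(f"xla={xla:<5} dynamo={dynamo:<9} test={test}")
```

Under this naive reading the command requests 2 x 3 x 2 combinations per model; how many of them are actually executed depends on the runner.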

Environment

Inference

Non-Dynamo. Pass rate: 78/81 - 96% (against inductor)

Dynamo+openxla. Pass rate: 78/81 - 96% (against inductor)

Models also Failing on Inductor

Inference Failing on Inductor CUDA with the Same Error

Benchmarks that raise the same error on inductor:

Inference Failing on Inductor CUDA with Different Errors

Training

Non-Dynamo. Pass rate: 64/66 - 96% (against inductor)

Dynamo+openxla. Pass rate: 55/66 - 83% (against inductor)
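For reference, the pass rates above are the number of passing models divided by the number of models that pass on Inductor. The percentages appear to be truncated rather than rounded (64/66 is about 96.97%, reported as 96%); that is an inference from the numbers, not something stated in the report. A minimal sketch under that assumption, with a hypothetical helper name:

```python
def pass_rate(passed: int, baseline: int) -> str:
    """Format passes against the number of models that pass on Inductor."""
    # Integer division truncates, which matches the percentages quoted above.
    percent = 100 * passed // baseline
    return f"{passed}/{baseline} - {percent}%"


print(pass_rate(78, 81))  # inference, both Non-Dynamo and Dynamo+openxla
print(pass_rate(64, 66))  # training, Non-Dynamo
print(pass_rate(55, 66))  # training, Dynamo+openxla
```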

Models also Failing on Inductor

No Training Support on Inductor CUDA

Benchmarks that raise the error: Model's DEFAULT_TRAIN_BSIZE is not implemented.

Training Failing on Inductor CUDA with the Same Error

Benchmarks that raise the same error on inductor:

Training Failing on Inductor CUDA with Different Errors

cc @JackCaoG @miladm

lezcano commented 10 months ago

State after 7 weeks of work:

Models fixed so far:

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 9 months ago

Weekly update (Dec 1~Dec 10):

Models fixed:

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 9 months ago

Weekly update (Dec 11~Dec 15):

Models fixed:

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

miladm commented 8 months ago

Can we please add a pass rate table in the weekly report that includes:

Inference

Training

ysiraichi commented 8 months ago

Weekly update (Jan 8 ~ Jan 12):

Pass rate (out of 99 benchmarks):

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 91 | 64 |
| Non-Dynamo | 87 | 67 |
| Dynamo     | 86 | 57 |

Models fixed:

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 8 months ago

Weekly update (Jan 15 ~ Jan 19):

Pass rate (out of 99 benchmarks):

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 85 | 62 |
| Non-Dynamo | 70 | 57 |
| Dynamo     | 71 | 55 |

Models that started failing:

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

miladm commented 8 months ago

Can we track separate pass rate tables for L4 and A100 GPUs going forward, @ysiraichi?

cc @frgossen @golechwierowicz @cota

ysiraichi commented 8 months ago

Weekly update (Jan 22 ~ Jan 26):

Pass rate (out of 99 benchmarks):

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 88 | 63 |
| Non-Dynamo | 69 | 57 |
| Dynamo     | 72 | 55 |

Models fixed:

Models that started failing:

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 8 months ago

Weekly update (Jan 29 ~ Feb 2):

Pass rate (out of 99 benchmarks):

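A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 87 (last: 88) | 63 |
| Non-Dynamo | 82 (last: 69) | 56 (last: 57) |
| Dynamo     | 82 (last: 72) | 53 (last: 55) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 86 | 60 |
| Non-Dynamo | 81 | 53 |
| Dynamo     | 82 | 49 |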
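Since the tables from this week on carry the previous value as `(last: N)`, the week-over-week change can be read off mechanically. A small sketch, assuming cells are written exactly as in these tables; the helper names are hypothetical and not part of the benchmarking scripts:

```python
import re

# Matches "82", "82 (last: 69)"; anything else (e.g. "?") is treated as missing.
CELL = re.compile(r"^\s*(\d+)(?:\s*\(last:\s*(\d+)\))?\s*$")


def parse_cell(cell):
    """Return (current, previous) or None if the cell has no current value."""
    m = CELL.match(cell)
    if m is None:
        return None
    current = int(m.group(1))
    previous = int(m.group(2)) if m.group(2) else None
    return current, previous


def delta(cell):
    """Week-over-week change, or None when either value is unavailable."""
    parsed = parse_cell(cell)
    if parsed is None or parsed[1] is None:
        return None
    return parsed[0] - parsed[1]


# A100 Non-Dynamo row from the Jan 29 ~ Feb 2 table.
print(delta("82 (last: 69)"))
print(delta("56 (last: 57)"))
```

A cell without a `(last: N)` part, or one reported as `?`, yields `None` rather than a guessed value.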

Models Summary (for A100)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 7 months ago

Weekly update (Feb 5 ~ Feb 9):

Pass rate (out of 99 benchmarks):

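A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 87 (last: 87) | 63 |
| Non-Dynamo | 82 (last: 82) | 57 (last: 56) |
| Dynamo     | 84 (last: 82) | 53 (last: 53) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 86 | 60 |
| Non-Dynamo | 81 | 53 |
| Dynamo     | 84 | 49 |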

Models Summary


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 7 months ago

Weekly update (Feb 12 ~ Feb 16):

Pass rate (out of 99 benchmarks):

Could not run the benchmarks this time, due to a compilation issue: #6564


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 7 months ago

Weekly update (Feb 19 ~ Feb 23):

Pass rate (out of 99 benchmarks):

An error in the benchmarking scripts prevented us from running the benchmarks with XLA: https://github.com/pytorch/xla/pull/6612


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 7 months ago

Pass rate (out of 99 benchmarks):

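A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 81 (last: 87) | 65 (last: 63) |
| Non-Dynamo | 72 (last: 82) | 61 (last: 57) |
| Dynamo     | 73 (last: 84) | 54 (last: 53) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 81 (last: 86) | 62 (last: 60) |
| Non-Dynamo | 71 (last: 81) | 57 (last: 53) |
| Dynamo     | 73 (last: 84) | 52 (last: 49) |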

Models Summary

ysiraichi commented 7 months ago

Weekly update (Feb 26 ~ Mar 01):

Pass rate (out of 99 benchmarks):

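A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 81 (last: 81) | 65 (last: 65) |
| Non-Dynamo | 72 (last: 72) | 61 (last: 61) |
| Dynamo     | 73 (last: 73) | 56 (last: 54) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 81 (last: 81) | 63 (last: 62) |
| Non-Dynamo | 72 (last: 71) | 58 (last: 57) |
| Dynamo     | 71 (last: 73) | 54 (last: 52) |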

Models Summary


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 6 months ago

Weekly update (Mar 04 ~ Mar 08):

Pass rate (out of 99 benchmarks):

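A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 81 (last: 81) | 66 (last: 65) |
| Non-Dynamo | 72 (last: 72) | 61 (last: 61) |
| Dynamo     | 71 (last: 71) | 57 (last: 56) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 81 (last: 81) | 64 (last: 63) |
| Non-Dynamo | 72 (last: 72) | 58 (last: 58) |
| Dynamo     | 71 (last: 71) | 55 (last: 54) |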

Models Summary (A100)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 6 months ago

Weekly update (Mar 11 ~ Mar 15):

Pass rate (out of 99 benchmarks):

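A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 37 (last: 72) | 28 (last: 61) |
| Dynamo     | 31 (last: 71) | 18 (last: 57) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 81 (last: 81) | 64 (last: 63) |
| Non-Dynamo | 45 (last: 72) | 38 (last: 58) |
| Dynamo     | 44 (last: 71) | 22 (last: 55) |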

Models Summary (A100)

No summary this week because:


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

vanbasten23 commented 6 months ago

@ysiraichi The regression you saw might be due to https://github.com/pytorch/xla/pull/6677 (OpenXLA pin update). Our team is looking into this issue.

ysiraichi commented 6 months ago

Weekly update (Mar 18 ~ Mar 21):

Pass rate (out of 99 benchmarks):

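A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 76 (last: 72) | 64 (last: 61) |
| Dynamo     | 73 (last: 71) | 58 (last: 57) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 80 (last: 81) | 64 (last: 64) |
| Non-Dynamo | 76 (last: 72) | 61 (last: 58) |
| Dynamo     | 74 (last: 71) | 56 (last: 55) |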

Models Summary (A100)

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

miladm commented 6 months ago

Last week, the results were unchanged. We are preparing for performance optimizations. cc @ysiraichi

ysiraichi commented 6 months ago

Weekly update (Apr 1 ~ Apr 5):

Pass rate (out of 99 benchmarks):

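A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 75 (last: 76) | 63 (last: 64) |
| Dynamo     | 73 (last: 73) | 53 (last: 58) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 82 (last: 80) | 65 (last: 64) |
| Non-Dynamo | 75 (last: 76) | 61 (last: 61) |
| Dynamo     | 74 (last: 74) | 51 (last: 56) |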

Models Summary (A100)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 5 months ago

Weekly update (Apr 8 ~ Apr 12):

Pass rate (out of 99 benchmarks):

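A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 74 (last: 75) | 64 (last: 63) |
| Dynamo     | 74 (last: 73) | 53 (last: 53) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 75 (last: 75) | 61 (last: 61) |
| Dynamo     | 75 (last: 74) | 51 (last: 51) |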

Models Summary (A100)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 5 months ago

Weekly update (Apr 15 ~ Apr 19):

Pass rate (out of 99 benchmarks):

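A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | ? (last: 81) | ? (last: 66) |
| Non-Dynamo | ? (last: 74) | ? (last: 64) |
| Dynamo     | ? (last: 74) | ? (last: 53) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 76 (last: 75) | 61 (last: 61) |
| Dynamo     | 76 (last: 75) | 51 (last: 51) |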

Models Summary (A100)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 5 months ago

Weekly update (Apr 22 ~ Apr 26):

Pass rate (out of 99 benchmarks):

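A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 75 (last: 74) | 64 (last: 64) |
| Dynamo     | 75 (last: 74) | 53 (last: 53) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 81 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 76 (last: 76) | 61 (last: 61) |
| Dynamo     | 76 (last: 76) | 51 (last: 51) |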

Models Summary (A100)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 5 months ago

Weekly update (Apr 29 ~ May 3):

Pass rate (out of 99 benchmarks):

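A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 76 (last: 75) | 64 (last: 64) |
| Dynamo     | 75 (last: 75) | 53 (last: 53) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 82 (last: 81) | 65 (last: 65) |
| Non-Dynamo | 76 (last: 76) | 61 (last: 61) |
| Dynamo     | 76 (last: 76) | 51 (last: 51) |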

Models Summary (A100)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 4 months ago

Weekly update (May 6 ~ May 10):

Pass rate (out of 99 benchmarks):

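A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 82 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 76 (last: 75) | 64 (last: 64) |
| Dynamo     | 75 (last: 75) | 53 (last: 53) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 76 (last: 76) | 61 (last: 61) |
| Dynamo     | 76 (last: 76) | 51 (last: 51) |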

Notes

Models Summary (A100)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 4 months ago

Weekly update (May 13 ~ May 17):

Pass rate (out of 99 benchmarks):

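A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 82 (last: 82) | 66 (last: 66) |
| Non-Dynamo | 77 (last: 76) | 61 (last: 64) |
| Dynamo     | 78 (last: 75) | 55 (last: 53) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 77 (last: 76) | 59 (last: 61) |
| Dynamo     | 78 (last: 76) | 52 (last: 51) |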

Models Summary (A100)

All the differences shown below are likely the result of #7067, which fixes AMP. Reasons: (i) training benchmarks use AMP by default; and (ii) some inference benchmarks use AMP instead of bfloat16.


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 4 months ago

Weekly update (May 20 ~ May 24):

Pass rate (out of 99 benchmarks):

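A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 82 (last: 82) | 66 (last: 66) |
| Non-Dynamo | 77 (last: 77) | 63 (last: 61) |
| Dynamo     | 78 (last: 78) | 55 (last: 55) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 77 (last: 77) | 61 (last: 59) |
| Dynamo     | 78 (last: 78) | 52 (last: 52) |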

Models Summary (A100)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 4 months ago

Weekly update (May 27 ~ May 29):

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 3 months ago

Weekly update (June 3 ~ June 6):

Pass rate (out of 99 benchmarks):

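A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 82 (last: 82) | 65 (last: 66) |
| Non-Dynamo | 79 (last: 77) | 61 (last: 63) |
| Dynamo     | 79 (last: 78) | 55 (last: 55) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 82 (last: 82) | 64 (last: 65) |
| Non-Dynamo | 79 (last: 77) | 60 (last: 61) |
| Dynamo     | 79 (last: 78) | 52 (last: 52) |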

Models Summary (A100)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 3 months ago

Weekly update (June 10 ~ June 14):

Pass rate (out of 99 benchmarks):

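A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 79 (last: 79) | 63 (last: 61) |
| Dynamo     | 79 (last: 79) | 55 (last: 55) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 82 (last: 82) | 64 (last: 64) |
| Non-Dynamo | 79 (last: 79) | 61 (last: 60) |
| Dynamo     | 79 (last: 79) | 52 (last: 52) |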

Models Summary (A100)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 3 months ago

Weekly update (June 17 ~ June 21):

Pass rate (out of 99 benchmarks):

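A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 81 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 79) | 63 (last: 63) |
| Dynamo     | 78 (last: 79) | 55 (last: 55) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 81 (last: 82) | 64 (last: 64) |
| Non-Dynamo | 78 (last: 79) | 61 (last: 61) |
| Dynamo     | 78 (last: 79) | 52 (last: 52) |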

Models Summary (A100)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 3 months ago

Weekly update (June 24 ~ June 28):

Pass rate (out of 99 benchmarks):

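A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 74 (last: 81) | 60 (last: 65) |
| Non-Dynamo | 73 (last: 78) | 60 (last: 63) |
| Dynamo     | 72 (last: 78) | 54 (last: 55) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 74 (last: 81) | 59 (last: 64) |
| Non-Dynamo | 73 (last: 78) | 58 (last: 61) |
| Dynamo     | 72 (last: 78) | 51 (last: 52) |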

Models Summary (A100)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 3 months ago

Weekly update (July 1 ~ July 5):

Pass rate (out of 99 benchmarks):

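A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 81 (last: 74) | 66 (last: 60) |
| Non-Dynamo | 78 (last: 73) | 64 (last: 60) |
| Dynamo     | 78 (last: 72) | 55 (last: 54) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 81 (last: 74) | 65 (last: 59) |
| Non-Dynamo | 78 (last: 73) | 62 (last: 58) |
| Dynamo     | 78 (last: 72) | 52 (last: 51) |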

Models Summary (A100)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 2 months ago

Weekly update (July 8 ~ July 12):

Pass rate (out of 99 benchmarks):

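A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 75 (last: 78) | 61 (last: 64) |
| Dynamo     | 75 (last: 78) | 52 (last: 55) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 81 (last: 81) | 65 (last: 65) |
| Non-Dynamo | 75 (last: 78) | 59 (last: 62) |
| Dynamo     | 75 (last: 78) | 49 (last: 52) |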

Models Summary (A100)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 2 months ago

Weekly update (July 15 ~ July 19):

Pass rate (out of 99 benchmarks):

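A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 78 (last: 75) | 64 (last: 61) |
| Dynamo     | 78 (last: 75) | 55 (last: 52) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 81 (last: 81) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 75) | 62 (last: 59) |
| Dynamo     | 78 (last: 75) | 52 (last: 49) |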

Models Summary (A100)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 2 months ago

Weekly update (July 22 ~ July 26):

Pass rate (out of 99 benchmarks):

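A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 77 (last: 78) | 64 (last: 64) |
| Dynamo     | 78 (last: 78) | 55 (last: 55) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 81 (last: 81) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 78) | 62 (last: 62) |
| Dynamo     | 78 (last: 78) | 52 (last: 52) |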

Models Summary (A100)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 1 month ago

Weekly update (July 29 ~ Aug 9):

Pass rate (out of 99 benchmarks):

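A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 77 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 78 (last: 77) | 63 (last: 64) |
| Dynamo     | 77 (last: 78) | 52 (last: 55) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 77 (last: 81) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 78) | 62 (last: 62) |
| Dynamo     | 77 (last: 78) | 45 (last: 52) |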

Models Summary (A100)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 1 month ago

Weekly update (Aug 12 ~ Aug 16):

Pass rate (out of 99 benchmarks):

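A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 77 (last: 77) | 66 (last: 66) |
| Non-Dynamo | 78 (last: 78) | 63 (last: 63) |
| Dynamo     | 77 (last: 77) | 52 (last: 52) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 77 (last: 77) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 78) | 62 (last: 62) |
| Dynamo     | 77 (last: 77) | 44 (last: 45) |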

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 1 month ago

Weekly update (Aug 19 ~ Aug 23):

Pass rate (out of 99 benchmarks):

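A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 77 (last: 77) | 66 (last: 66) |
| Non-Dynamo | 78 (last: 78) | 63 (last: 63) |
| Dynamo     | 77 (last: 77) | 49 (last: 52) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 77 (last: 77) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 78) | 62 (last: 62) |
| Dynamo     | 77 (last: 77) | 41 (last: 44) |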

Models Summary (A100)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 1 month ago

Weekly update (Aug 26 ~ Aug 30):

Pass rate (out of 99 benchmarks):

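A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 77 (last: 77) | 66 (last: 66) |
| Non-Dynamo | 78 (last: 78) | 64 (last: 63) |
| Dynamo     | 77 (last: 77) | 51 (last: 49) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 77 (last: 77) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 78) | 63 (last: 62) |
| Dynamo     | 77 (last: 77) | 48 (last: 41) |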

Models Summary (A100)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 3 weeks ago

Weekly update (Sep 2 ~ Sep 6):

Pass rate (out of 99 benchmarks):

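A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 77 (last: 77) | 66 (last: 66) |
| Non-Dynamo | 78 (last: 78) | 64 (last: 64) |
| Dynamo     | 77 (last: 77) | 52 (last: 51) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 77 (last: 77) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 78) | 63 (last: 63) |
| Dynamo     | 77 (last: 77) | 49 (last: 48) |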

Models Summary (A100)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 2 weeks ago

Weekly update (Sep 9 ~ Sep 13):

Pass rate (out of 99 benchmarks):

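A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 79 (last: 77) | 66 (last: 66) |
| Non-Dynamo | 78 (last: 78) | 64 (last: 64) |
| Dynamo     | 77 (last: 77) | 52 (last: 52) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 79 (last: 77) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 78) | 63 (last: 63) |
| Dynamo     | 77 (last: 77) | 49 (last: 49) |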

Models Summary (A100)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented 1 week ago

Weekly update (Sep 16 ~ Sep 20):

Pass rate (out of 99 benchmarks):

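A100

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 79 (last: 79) | 66 (last: 66) |
| Non-Dynamo | 78 (last: 78) | 64 (last: 64) |
| Dynamo     | 77 (last: 77) | 52 (last: 52) |

L4

|            | Inference | Training |
|------------|-----------|----------|
| Inductor   | 79 (last: 79) | 65 (last: 65) |
| Non-Dynamo | 78 (last: 78) | 63 (last: 63) |
| Dynamo     | 77 (last: 77) | 49 (last: 49) |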

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]