pytorch / test-infra

This repository hosts code that supports the testing infrastructure for the PyTorch organization. For example, this repo hosts the logic to track disabled tests and slow tests, as well as our continuous integration jobs HUD/dashboard.
https://hud.pytorch.org/

Oncall alerting when nightly job fails #6026

Open ezyang opened 1 month ago

ezyang commented 1 month ago

The pt2_stack_for_oss oncall is responsible for monitoring benchmark results. If a nightly benchmark run fails, the oncall should be notified immediately (ideally via work chat).
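
To make the request concrete, here is a rough sketch of the check I have in mind. It is only an illustration, not existing test-infra code: `NightlyRun`, `failed_runs`, `format_alert`, and `alert_oncall` are hypothetical names, the set of conclusions treated as failures is an assumption, and the `notify` callable stands in for whatever work chat integration ends up being used.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class NightlyRun:
    """Hypothetical record for one nightly benchmark job run."""
    job_name: str
    conclusion: str  # assumed values: "success", "failure", "cancelled", "timed_out"
    url: str


# Assumption: anything other than a clean success should page the oncall.
FAILED_CONCLUSIONS = {"failure", "cancelled", "timed_out"}


def failed_runs(runs: Iterable[NightlyRun]) -> List[NightlyRun]:
    """Return the runs whose conclusion counts as a failure."""
    return [run for run in runs if run.conclusion in FAILED_CONCLUSIONS]


def format_alert(run: NightlyRun, oncall: str = "pt2_stack_for_oss") -> str:
    """Build the message text sent to the oncall for one failed run."""
    return (
        f"[{oncall}] nightly benchmark job '{run.job_name}' "
        f"did not complete ({run.conclusion}): {run.url}"
    )


def alert_oncall(runs: Iterable[NightlyRun], notify: Callable[[str], None]) -> int:
    """Send one alert per failed run via `notify`; return how many were sent."""
    failures = failed_runs(runs)
    for run in failures:
        notify(format_alert(run))
    return len(failures)


if __name__ == "__main__":
    # Placeholder data; the job names and URLs are made up for illustration.
    sample = [
        NightlyRun("benchmark-a", "success", "https://example.invalid/run/1"),
        NightlyRun("benchmark-b", "failure", "https://example.invalid/run/2"),
    ]
    # `print` stands in for the real chat notifier.
    alert_oncall(sample, print)
```

Passing the notifier in as a callable keeps the failure check independent of the delivery channel, so the same logic could be tested with a list's `append` and later wired to chat.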