pytorch / test-infra

This repository hosts code that supports the testing infrastructure for the main PyTorch repo. For example, this repo hosts the logic to track disabled tests and slow tests, as well as our continuous integration jobs HUD/dashboard.
https://hud.pytorch.org/

ROCm test artifacts not being published on HUD #5298

Open jithunnair-amd opened 3 months ago

jithunnair-amd commented 3 months ago

If a commit has mem_leak_check and rerun_disabled_tests jobs running alongside the regular trunk workflow jobs, the HUD page for the commit doesn't list the artifacts for the regular trunk workflow. Example: https://hud.pytorch.org/pytorch/pytorch/commit/cf77e7dd9770caf65e898ac2ee82045aa0408e30#rocm There, the linux-focal-rocm6.1-py3.8 / test (default, 1, 6, linux.rocm.gpu.2) jobs have an artifacts link for the mem_leak_check and rerun_disabled_tests variants, but not for the regular one.

However, the corresponding GitHub Actions page for the regular job, https://github.com/pytorch/pytorch/actions/runs/9363246389/job/25774767756#step:19:51, shows that the artifact was successfully uploaded to https://gha-artifacts.s3.amazonaws.com/pytorch/pytorch/9363246389/1/artifact/test-jsons-test-default-1-6-linux.rocm.gpu.2_25774767756.zip
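
For reference, the artifact URL above appears to encode the workflow run ID (9363246389, matching the Actions URL), a run attempt (1), and the job ID (25774767756). Below is a minimal Python sketch for pulling those fields out of such a URL. The layout is inferred from this single URL, not from any documented contract, and `parse_artifact_url` and `ArtifactKey` are hypothetical names used only for illustration.

```python
import re
from typing import NamedTuple, Optional


class ArtifactKey(NamedTuple):
    """Fields that seem to be encoded in a gha-artifacts URL (assumed layout)."""
    repo: str
    run_id: int
    run_attempt: int
    name: str
    job_id: int


# Assumed layout, inferred from the one URL in this issue:
#   <bucket>/<owner>/<repo>/<run_id>/<attempt>/artifact/<name>_<job_id>.zip
_ARTIFACT_URL = re.compile(
    r"^https://gha-artifacts\.s3\.amazonaws\.com/"
    r"(?P<repo>[^/]+/[^/]+)/(?P<run_id>\d+)/(?P<attempt>\d+)/artifact/"
    r"(?P<name>.+)_(?P<job_id>\d+)\.zip$"
)


def parse_artifact_url(url: str) -> Optional[ArtifactKey]:
    """Return the parsed fields, or None if the URL does not fit the assumed layout."""
    match = _ARTIFACT_URL.match(url)
    if match is None:
        return None
    return ArtifactKey(
        repo=match.group("repo"),
        run_id=int(match.group("run_id")),
        run_attempt=int(match.group("attempt")),
        name=match.group("name"),
        job_id=int(match.group("job_id")),
    )
```

Comparing the parsed `run_id` against the workflow run whose artifacts HUD lists is one way to check whether an uploaded artifact belongs to a run that HUD never queried.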

clee2000 commented 3 months ago

For whoever ends up taking this: it's because we show multiple workflow runs in one box but only query for the artifacts of one workflow ID.
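
One possible shape for a fix is to gather every distinct workflow ID among the jobs shown in the box and query artifacts for each, rather than for a single ID. The Python sketch below only illustrates that idea; it is not the HUD's actual code. The job and artifact field names (`workflow_id`, `job_id`) and the `list_artifacts` callable are assumptions standing in for whatever the real query layer provides.

```python
from typing import Callable, Dict, Iterable, List

Job = Dict[str, object]
Artifact = Dict[str, object]


def artifacts_for_jobs(
    jobs: Iterable[Job],
    list_artifacts: Callable[[int], List[Artifact]],
) -> Dict[int, List[Artifact]]:
    """Group artifacts by job ID, querying every workflow run the jobs belong to.

    `list_artifacts` is a hypothetical callable that returns the artifacts
    of one workflow run; each artifact is assumed to carry a "job_id" field.
    """
    # Collect distinct workflow IDs, preserving first-seen order,
    # so that each workflow run is queried exactly once.
    workflow_ids: List[int] = []
    seen = set()
    for job in jobs:
        workflow_id = job["workflow_id"]
        if workflow_id not in seen:
            seen.add(workflow_id)
            workflow_ids.append(workflow_id)

    # Query each workflow run and bucket its artifacts by job ID.
    by_job: Dict[int, List[Artifact]] = {}
    for workflow_id in workflow_ids:
        for artifact in list_artifacts(workflow_id):
            by_job.setdefault(artifact["job_id"], []).append(artifact)
    return by_job
```

With this grouping, a job from the regular trunk run and its mem_leak_check and rerun_disabled_tests counterparts would each get their own artifact list, at the cost of one query per workflow run instead of one per commit.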

huydhn commented 3 months ago

Action item: we need to double-check whether this is ROCm-specific or a widespread issue affecting other runners too.