Closed DomAyre closed 1 month ago
Here's a manual run of the region workflow to check whether failures in heavy-io cause the whole workflow to fail: https://github.com/microsoft/confidential-aci-dashboard/actions/runs/10846966157
Here's a manual run of the individual workflow: https://github.com/microsoft/confidential-aci-dashboard/actions/runs/10847111652
This solution works because historical results are tracked on a per-job basis rather than per workflow, so the job still shows as failed even when the workflow reports success.
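As a rough illustration of why per-job tracking keeps the failure visible, here is a minimal Python sketch. It is not the dashboard's actual code: the `Job` class, the `continue_on_error` flag, and the `minimal` job name are hypothetical and exist only to model a job whose failure the calling workflow tolerates.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    conclusion: str  # "success" or "failure"
    # Hypothetical flag: when True, a failure does not fail the workflow.
    continue_on_error: bool = False


def workflow_conclusion(jobs):
    """The workflow fails only if a non-tolerated job fails."""
    blocking = [
        j for j in jobs
        if j.conclusion == "failure" and not j.continue_on_error
    ]
    return "failure" if blocking else "success"


def job_history(jobs):
    """Per-job results are recorded as-is, whatever the workflow reports."""
    return {j.name: j.conclusion for j in jobs}


jobs = [
    Job("minimal", "success"),  # hypothetical healthy workload
    Job("heavy-io", "failure", continue_on_error=True),
]

print(workflow_conclusion(jobs))      # workflow-level view
print(job_history(jobs)["heavy-io"])  # job-level view
```

In this model the workflow-level view is `success` while the job-level view for `heavy-io` is still `failure`, which is the behaviour described above.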
I have manually tested this by running ./scripts/get_workload_region_results.sh against the individual workflow run mentioned above (I modified the script to also show results that are not on main):
@DomAyre ➜ /workspaces/confidential-aci-dashboard (unblock-region-workloads) $ ./scripts/get_workload_region_results.sh heavy-io westeurope 2024-09-13T10:01:00Z
Getting results for:
Workload: heavy-io
Region: westeurope
Since: 2024-09-13T10:01:00Z
Conclusion,URL,Date,Failing Step
✗ Failure,https://github.com/microsoft/confidential-aci-dashboard/actions/runs/10847111652/job/30101497003,2024-09-13T10:05:44Z,None
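For anyone who wants to post-process this output, the CSV portion can be read with Python's standard `csv` module. This is only a sketch under the assumption that the script keeps printing the `Conclusion,URL,Date,Failing Step` header shown above; the `parse_results` and `failures` helpers are hypothetical names, not part of the repository.

```python
import csv
import io

# The CSV portion of the script output shown above.
sample = """Conclusion,URL,Date,Failing Step
✗ Failure,https://github.com/microsoft/confidential-aci-dashboard/actions/runs/10847111652/job/30101497003,2024-09-13T10:05:44Z,None
"""


def parse_results(text):
    """Turn the CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def failures(rows):
    """Keep only rows whose Conclusion column reports a failure."""
    return [r for r in rows if "Failure" in r["Conclusion"]]


rows = parse_results(sample)
for row in failures(rows):
    print(row["Date"], row["URL"])
```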
The Heavy IO workload exposes a fairly consistent issue in Confidential ACI, albeit a niche one. We would therefore like to keep running this workload for visibility, but we don't want it to turn the whole dashboard red.
To solve this, we run the workload but do not fail the calling 'region' workflow if it fails. This lets us continue collecting data without turning the dashboard red. We also need to ensure that the failure stays visible, so that the workflow does not appear entirely green.