responsible-ai-collaborative / aiid

The AI Incident Database seeks to identify, define, and catalog artificial intelligence incidents.
https://incidentdatabase.ai

Test failures – Discrepancy between incident page count and GraphQL endpoint count #2615

Closed: cesarvarela closed this 4 months ago

cesarvarela commented 5 months ago

This has been happening in multiple environments:

https://github.com/responsible-ai-collaborative/aiid/actions/runs/7736942933/job/21116835017#step:7:481

https://github.com/aiidtest/aiid/actions/runs/7750886781/job/21138036135#step:7:285

https://github.com/responsible-ai-collaborative/aiid/actions/runs/7780225084/job/21229566391

Current guesses:

cesarvarela commented 4 months ago

It's been a while since this last happened.

kepae commented 4 months ago

Unfortunately, I think we're still seeing this error. It happened in the past two automatic cron workflows that rebuild prod: https://github.com/responsible-ai-collaborative/aiid/actions/runs/8023221999

I manually pushed today to get updates, skipping the cache.

kepae commented 4 months ago

Could still be a coincidence, but manual dispatches succeed where automatic ones fail. All recent failures are due to this same test failure.

[Screenshot, 2024-02-27 at 11:59:29 AM]

All manual runs were set to skip the cache, but the production job does that too right now. https://github.com/responsible-ai-collaborative/aiid/blob/152c5c048e3cca30b6f571a25d6e155aa829a750/.github/workflows/production.yml#L14

cesarvarela commented 4 months ago

The issue is that for scheduled deploys the commit SHA doesn't change, so the workflow reuses the first cached build for that SHA, and any incident added after that first build is missing from it, obviously 💥

I'll update the workflows to take this into account, most probably by creating a new cache key when the run is a scheduled one.
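
To make the failure mode concrete, here is a minimal sketch in Python. It is not the actual workflow change. The key format, the function names, and the incident counts are all hypothetical. The only assumption about GitHub Actions is that a cron-triggered run reports its event name as `schedule` and has its own run ID. The sketch simulates two scheduled runs on the same commit with one incident added in between, and compares a SHA-only cache key with a key that also varies per scheduled run.

```python
from typing import Callable, Dict, Tuple

KeyFn = Callable[[str, str, str], str]


def sha_only_key(sha: str, event_name: str, run_id: str) -> str:
    """Key that depends only on the commit SHA (the behaviour described above)."""
    return f"build-{sha}"


def per_run_key(sha: str, event_name: str, run_id: str) -> str:
    """Hypothetical fix: scheduled runs get a key that is unique per run."""
    if event_name == "schedule":
        return f"build-{sha}-{run_id}"
    return f"build-{sha}"


def cached_build(cache: Dict[str, int], key: str, build: Callable[[], int]) -> int:
    """Return the cached build for `key`, building only on a cache miss."""
    if key not in cache:
        cache[key] = build()
    return cache[key]


def second_run(key_fn: KeyFn) -> Tuple[int, int]:
    """Simulate two scheduled runs on one commit.

    Returns (incident count baked into the build used by the second run,
    live incident count at that time).
    """
    live_count = 100  # made-up number of incidents in the database
    cache: Dict[str, int] = {}
    sha = "abc123"  # the commit does not change between scheduled runs

    # First scheduled run: cache miss, so the build snapshots the live count.
    cached_build(cache, key_fn(sha, "schedule", "1"), lambda: live_count)

    live_count += 1  # a new incident is added after the first build

    # Second scheduled run on the same commit.
    built = cached_build(cache, key_fn(sha, "schedule", "2"), lambda: live_count)
    return built, live_count


if __name__ == "__main__":
    print("sha-only key:", second_run(sha_only_key))
    print("per-run key: ", second_run(per_run_key))
```

With the SHA-only key, the second run reuses a build that is one incident behind the live count, which is the kind of mismatch the failing test reports. With the per-run key, the second run misses the cache and rebuilds. The trade-off is that scheduled runs no longer benefit from the build cache at all.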

kepae commented 4 months ago

LGTM now!