responsible-ai-collaborative / aiid

The AI Incident Database seeks to identify, define, and catalog artificial intelligence incidents.
https://incidentdatabase.ai

Get Database Snapshots from the new Cloudflare R2 bucket #2374

Closed: pdcp1 closed this issue 9 months ago

pdcp1 commented 11 months ago

We should change the code of this page https://incidentdatabase.ai/research/snapshots/ so that it lists all DB snapshots from the new Cloudflare R2 bucket instead of the current AWS S3 bucket.

kepae commented 9 months ago

I'm downloading all existing backups and will manually copy them to the R2 buckets. When we move this over, we can also squash https://github.com/responsible-ai-collaborative/aiid/issues/2017.

smcgregor commented 9 months ago

LMK if you need support on the downloading.

kepae commented 9 months ago

@pdcp1 backup history from S3 has been copied to aiid-public-backups, let me know if you need anything else and sorry for the delay!

pdcp1 commented 9 months ago

Tested on Staging with Production Cloudflare R2 variables ✅ https://staging-aiid.netlify.app/research/snapshots/ Finally 🎉

kepae commented 9 months ago

Finally indeed 🎉

One side effect: the dates of the downloads are read from the bucket's file metadata rather than inferred from the filename. This is logical, but it means the history of all the snapshots I copied is "erased", since they all share the same date of Dec 8:

[Screenshot of the snapshots list]

We won't block on this, but this is confusing for a user looking for a certain date.

Perhaps we should just remove the human-readable date string at the front, since we already display the filename, which contains the date: <filename> · <size>

What do you think, @pdcp1?

(Also, I updated the production Netlify values, so this can go to prod.)

pdcp1 commented 9 months ago

@kepae You're right about the dates; I hadn't noticed the files dated 2023-12-08. I'd prefer to keep the human-readable date string, since I'm not sure everyone will notice that the date is "encoded" in this 14-digit string.

An easy fix is to parse the 14-digit string from the filename and convert it to a Date; that way, both dates will match. I'll have to account for the timezone, but it's not a big deal.
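A minimal TypeScript sketch of the parsing approach described above. It assumes the 14 digits follow a YYYYMMDDHHmmss layout in UTC and that the bucket listing exposes a LastModified date to fall back on. The function names and the sample filename are hypothetical, not taken from the aiid codebase:

```typescript
// Hypothetical helper: extract a 14-digit timestamp from a snapshot filename.
// ASSUMPTION: the digits are laid out as YYYYMMDDHHmmss and are in UTC.
function parseSnapshotDate(filename: string): Date | null {
  // Match a run of exactly 14 digits that is not part of a longer digit run.
  const match = /(?:^|\D)(\d{14})(?!\d)/.exec(filename);
  if (!match) return null;

  const s = match[1];
  const year = Number(s.slice(0, 4));
  const month = Number(s.slice(4, 6));
  const day = Number(s.slice(6, 8));
  const hour = Number(s.slice(8, 10));
  const minute = Number(s.slice(10, 12));
  const second = Number(s.slice(12, 14));

  // Interpret the timestamp as UTC so the result doesn't depend on the server's timezone.
  const date = new Date(Date.UTC(year, month - 1, day, hour, minute, second));

  // Date.UTC silently rolls over impossible values (e.g. month 13),
  // so reject any timestamp that doesn't survive a round trip.
  if (
    date.getUTCFullYear() !== year ||
    date.getUTCMonth() !== month - 1 ||
    date.getUTCDate() !== day ||
    date.getUTCHours() !== hour ||
    date.getUTCMinutes() !== minute ||
    date.getUTCSeconds() !== second
  ) {
    return null;
  }
  return date;
}

// Prefer the date encoded in the filename; fall back to the bucket's
// LastModified metadata when the name carries no valid timestamp.
function snapshotDate(filename: string, lastModified: Date): Date {
  return parseSnapshotDate(filename) ?? lastModified;
}
```

For example, with the hypothetical name `backup-20231208093000.tar.bz2`, `parseSnapshotDate` returns a Date whose `toISOString()` is `2023-12-08T09:30:00.000Z`. Reading the digits as UTC is one way to settle the timezone question. If the backup job actually writes local time, the `Date.UTC` call would need an explicit offset.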

PS: Thanks for updating the Production Netlify values in advance!