Closed by gregtatum 4 days ago
Medium-resource languages use 300-500 GB from what I've looked at. Using the public pricing at https://cloud.google.com/storage/pricing#north-america
At $0.023 per GB per month, 12 months of storage comes to $82-$138.
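The estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the price and sizes are the figures quoted in this comment, and `yearly_cost` is a hypothetical helper name.

```python
# Figures quoted in this comment; check the pricing page for current values.
PRICE_PER_GB_MONTH = 0.023  # USD per GB per month
MONTHS = 12


def yearly_cost(size_gb, price=PRICE_PER_GB_MONTH, months=MONTHS):
    """Return the storage cost in USD for keeping size_gb for `months` months."""
    return size_gb * price * months


for size_gb in (300, 500):
    print(f"{size_gb} GB for {MONTHS} months: ${yearly_cost(size_gb):.2f}")
```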
Most of the size comes from the copies of the dataset pipeline.
en-cs: DtSyAeaVRoGNZDnUKscGWw
en-fi: bNBrAkLqQpCpuxfMe3I-mw
For one of our big spring-2024 runs that went end-to-end, we should write a script that gets all the tasks from the Taskcluster API, iterates over them, fetches the artifacts for each task, and computes their total size to see what's going on.
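A rough sketch of what such a script could look like, using only the Python standard library against the Taskcluster queue REST endpoints. Several things here are assumptions rather than anything confirmed in this thread: that the IDs listed above are task group IDs, that `ROOT_URL` is the right deployment, that the artifacts are public (private ones would need credentials), and that the `Content-Length` header of an artifact response reflects its stored size. The helper names (`paginate`, `artifact_size`, `task_group_size`) are made up for illustration. Only the latest run of each task is counted, and expired artifacts will not show up.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the Firefox CI Taskcluster deployment. Adjust as needed.
ROOT_URL = "https://firefox-ci-tc.services.mozilla.com"
QUEUE = ROOT_URL + "/api/queue/v1"


def get_json(url):
    """Fetch a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def paginate(url, key, fetch=get_json):
    """Yield every item under `key`, following continuationToken pages."""
    token = None
    while True:
        page_url = url
        if token is not None:
            page_url += "?" + urllib.parse.urlencode({"continuationToken": token})
        page = fetch(page_url)
        yield from page.get(key, [])
        token = page.get("continuationToken")
        if not token:
            break


def artifact_size(task_id, name):
    """Return the Content-Length of an artifact from the task's latest run.

    Assumption: the header is present and matches the stored size.
    """
    url = f"{QUEUE}/task/{task_id}/artifacts/{urllib.parse.quote(name)}"
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as resp:
        return int(resp.headers.get("Content-Length", 0))


def task_group_size(task_group_id, fetch=get_json, size_of=artifact_size):
    """Sum artifact sizes (in bytes) over every task in a task group."""
    total = 0
    tasks_url = f"{QUEUE}/task-group/{task_group_id}/list"
    for entry in paginate(tasks_url, "tasks", fetch):
        task_id = entry["status"]["taskId"]
        artifacts_url = f"{QUEUE}/task/{task_id}/artifacts"
        for artifact in paginate(artifacts_url, "artifacts", fetch):
            total += size_of(task_id, artifact["name"])
    return total
```

Calling `task_group_size("DtSyAeaVRoGNZDnUKscGWw")` would then walk the en-cs group, if that ID is indeed a task group. The `fetch` and `size_of` parameters exist so the traversal can be tried against canned responses before pointing it at the real API, since a full run issues one request per artifact.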