Closed by tskir 3 months ago
Go for it! Thanks for looking into it
cc @prashantuniyal02
Thank you for reviewing this quickly @d0choa! In this case I'll proceed with setting up the bucket as Autoclass
The vault is set up. All details, including dataset ingestion instructions, will be stored in a private repository (which I'll circulate in a Slack channel).
As we discussed, we want to mirror certain datasets into a vault bucket in our Google Cloud Storage. Because we're talking about substantial amounts of data, we need to decide how to manage storage classes to minimise costs.
Options
We really only have two:
1. Archive storage + manual retrieval
In this scenario, the vault data is stored in the Archive class from the beginning. If we want to use the data, we manually copy it into a staging bucket (let's say for a month) and perform the work we need. Data storage is cheap, but we incur a large retrieval charge every time we access the data.
2. Autoclass storage
In this scenario, the entire vault bucket is configured using the Autoclass feature. All data starts in Standard storage and progressively moves into colder storage classes the longer it goes unaccessed.
With Autoclass, there are no retrieval charges (even from cold storage) and no early deletion charges.
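To make the "sinking" behaviour concrete, here is a minimal sketch of how an object's class depends on how long it has gone unaccessed. The thresholds (30, 90 and 365 days) are my understanding of the documented Autoclass transitions when Archive is enabled as the terminal storage class; they are an assumption here and should be checked against the current Google Cloud documentation. The function name and the example object names are made up for illustration.

```python
# Assumed Autoclass thresholds: (days without access, resulting class),
# ordered from coldest to warmest. Assumes Archive is the terminal class.
AUTOCLASS_THRESHOLDS = [
    (365, "ARCHIVE"),
    (90, "COLDLINE"),
    (30, "NEARLINE"),
]


def autoclass_storage_class(days_since_last_access: int) -> str:
    """Return the class an object would be in after the given idle period."""
    if days_since_last_access < 0:
        raise ValueError("days_since_last_access must be non-negative")
    for threshold_days, storage_class in AUTOCLASS_THRESHOLDS:
        if days_since_last_access >= threshold_days:
            return storage_class
    # Recently written or recently accessed objects sit in Standard.
    return "STANDARD"


# Classes are tracked per object, so one hot subset does not warm up the rest.
# Hypothetical object names and idle times, for illustration only.
idle_days_by_object = {
    "dataset/subset_a.parquet": 3,
    "dataset/subset_b.parquet": 120,
    "dataset/subset_c.parquet": 800,
}
classes = {
    name: autoclass_storage_class(days)
    for name, days in idle_days_by_object.items()
}
```

In this sketch only `subset_a` would be billed at the Standard rate; the other two stay in colder classes until something reads them.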
Cost comparison
Assuming the total size of the data is 50 TB:
Essentially, if the data is ever retrieved, Autoclass costs pretty much the same as Archive storage with manual retrieval. However, for data that is never retrieved during its lifetime, Archive storage is of course cheaper.
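For anyone who wants to play with the assumptions, a rough sketch of this kind of comparison is below. It is not the model behind the spreadsheet and is not meant to reproduce its figures: the per-GB prices are illustrative placeholders (real prices depend on region and change over time), months are simplified to 30 days, and the Autoclass management fee, operation charges and egress are all ignored. The function and class names are hypothetical.

```python
from dataclasses import dataclass

GB_PER_TB = 1000  # assumed decimal convention


@dataclass(frozen=True)
class Prices:
    """Placeholder USD prices per GB; assumptions, not quoted from anywhere."""
    standard: float = 0.020           # per month
    nearline: float = 0.010           # per month
    coldline: float = 0.004           # per month
    archive: float = 0.0012           # per month
    archive_retrieval: float = 0.05   # one-off, per GB read from Archive


def archive_cost(size_tb, months, retrievals, prices=Prices()):
    """Option 1: Archive storage, plus a fee and a staging copy per retrieval."""
    size_gb = size_tb * GB_PER_TB
    storage = size_gb * prices.archive * months
    # Each retrieval pays the retrieval fee and one month of Standard staging.
    per_retrieval = size_gb * (prices.archive_retrieval + prices.standard)
    return storage + retrievals * per_retrieval


def autoclass_cost(size_tb, months, access_months=(), prices=Prices()):
    """Option 2: Autoclass; an access in a month resets all data to Standard."""
    size_gb = size_tb * GB_PER_TB
    total = 0.0
    idle_days = 0
    for month in range(months):
        if month in access_months:
            idle_days = 0
        # Assumed transition thresholds: 30 / 90 / 365 days without access.
        if idle_days >= 365:
            price = prices.archive
        elif idle_days >= 90:
            price = prices.coldline
        elif idle_days >= 30:
            price = prices.nearline
        else:
            price = prices.standard
        total += size_gb * price
        idle_days += 30
    return total


# Example: 50 TB kept for five years and never retrieved.
never_read_archive = archive_cost(50, months=60, retrievals=0)
never_read_autoclass = autoclass_cost(50, months=60)
```

The model is deliberately coarse (it treats every access as touching the whole dataset), so it will overstate the Autoclass cost when only a subset of objects is read.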
Spreadsheet with the calculations.
Recommendation
I am strongly leaning towards using Autoclass, because it just takes a lot of headache out of storage admin. There's no need to maintain a staging bucket for data, to manually copy it out, or to take special precautions against accidentally incurring a big retrieval charge.
Furthermore, storage classes are maintained on a per-object basis, so if only specific subsets of the datasets are accessed (for example, specific ancestries), it doesn't affect the rest of them.
@d0choa I'd like to hear your opinion and have you make the final decision on this.