KernelCI artifacts storage improvement

nuclearcat commented 5 months ago

Short summary:

This proposal is about reducing storage costs and extending the data retention period, while spending half of the current budget. :)

Details:

At the moment, production storage is a 4TB managed block device. It costs approx. $450/month, which is approx. $0.11/GB/month if the disk is fully used. We are currently using 69% of its capacity, so we are effectively paying approx. $0.16/GB/month. We are also running out of space and will need to extend the disk, which is painful from a sysadmin point of view and costly in engineering hours.
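The per-GB figures above come from dividing the monthly price by the capacity actually in use. A quick check, assuming 1TB = 1024GB:

```python
def cost_per_gb(monthly_cost: float, capacity_gb: float, utilization: float = 1.0) -> float:
    """Effective $/GB/month for a fixed-size volume that is only partly filled."""
    used_gb = capacity_gb * utilization
    return monthly_cost / used_gb


full = cost_per_gb(450, 4 * 1024)            # disk fully used
current = cost_per_gb(450, 4 * 1024, 0.69)   # 69% used, as today
print(f"full: ${full:.2f}/GB, current: ${current:.2f}/GB")
```

The lower the utilization, the more each stored GB effectively costs, which is why a fixed-size disk kept partly empty is expensive.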

The project generates on average 40GB of short-lived data (kernel artifacts) per day, which is up to 280GB per week and about 1.2TB per month. We also have long-term stored items (rootfs images): 500-1000GB.

We are looking for a cheaper solution that halves the storage budget.

I propose the following:

Initial budget: $225/mo

For hot storage we will still use a block device. Current rootfs images and other hot data will be stored there. This might fit in 1TB of storage, which will cost us $122.8/month (P30). Instead of a fixed-term cleanup policy, we can implement a dynamic cleanup policy that MOVES old data to cheaper storage when we are running out of space, and sends a notification if it has to move data that is too fresh.
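The dynamic cleanup policy could look roughly like the sketch below. It is a minimal illustration, not an existing KernelCI component: the `Artifact` type, the `plan_offload` function, the 90%/75% high/low watermarks and the 14-day "too fresh" threshold (borrowed from the current 2-week retention) are all hypothetical. It only decides which files to move, oldest first; the actual copy to Blob Storage and the notification itself are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Artifact:
    """Hypothetical record for one file on the hot block device."""
    path: str
    size_gb: float
    mtime: datetime


def plan_offload(
    artifacts: list[Artifact],
    capacity_gb: float,
    high_watermark: float = 0.90,             # hypothetical: start moving above 90% full
    low_watermark: float = 0.75,              # hypothetical: stop once back under 75%
    min_age: timedelta = timedelta(days=14),  # hypothetical "too fresh" limit
    now: Optional[datetime] = None,
) -> tuple[list[Artifact], list[Artifact]]:
    """Return (files to move off the hot disk, moved files that are too fresh)."""
    now = now or datetime.now()
    used = sum(a.size_gb for a in artifacts)
    if used <= capacity_gb * high_watermark:
        return [], []  # enough free space, nothing to do

    target = capacity_gb * low_watermark
    to_move: list[Artifact] = []
    too_fresh: list[Artifact] = []
    for a in sorted(artifacts, key=lambda a: a.mtime):  # oldest first
        if used <= target:
            break
        to_move.append(a)
        used -= a.size_gb
        if now - a.mtime < min_age:
            too_fresh.append(a)  # candidate for a notification
    return to_move, too_fresh


# Example: 24 daily 40GB builds on a 1TB (1024GB) disk.
now = datetime(2024, 1, 31)
files = [Artifact(f"build-{d}", 40.0, now - timedelta(days=d)) for d in range(1, 25)]
moved, fresh = plan_offload(files, capacity_gb=1024, now=now)
print([a.path for a in moved], len(fresh))
```

In the example the five oldest builds are selected, bringing usage from 960GB down to 760GB, just under the 768GB low watermark; none of them is younger than 14 days, so no notification would be raised. Using two watermarks avoids moving a single file every time the disk crosses one threshold.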

Remaining budget: $102.2/mo

Option A:

Data will be moved to Azure Blob Storage (which supports NFS and SFTP), "Cool" tier. Storage costs $0.01/GB/month, so we can store 10TB of data for $100/month. At 40GB/day, that gives about 8 months of retention.

Outcome:

We will halve the storage budget
Instead of the current 2-week retention we will have 8 months of retention, but access to the data will require authentication, since retrieval incurs costs.

Option B:

Data will be moved to the Azure Blob Storage "Archive" tier. It is very cheap, but retrieving data might take up to 15 hours. Storage costs $0.00099/GB/month, so we can store 100TB of data for $99/month. At 40GB/day, that covers roughly 2,500 days (about 80 months) of data.

Outcome is almost the same as in Option A, but with a longer retention period.

We will halve the storage budget
Instead of the current 2-week retention we will have retention measured in years (roughly 80 months at 40GB/day), but access to the data will require authentication, since retrieval incurs costs.
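The retention figures for Options A and B follow from dividing the capacity a monthly budget buys in a tier by the daily ingest. A minimal sketch, assuming a flat per-GB price, 30-day months and the 40GB/day average from above:

```python
def retention_days(budget_per_month: float, price_per_gb_month: float, daily_gb: float) -> float:
    """Days of data a monthly budget can keep in a tier at a steady ingest rate."""
    capacity_gb = budget_per_month / price_per_gb_month
    return capacity_gb / daily_gb


DAILY_GB = 40
for tier, budget, price in [("Cool", 100, 0.01), ("Archive", 99, 0.00099)]:
    days = retention_days(budget, price, DAILY_GB)
    print(f"{tier}: {budget / price:,.0f} GB, ~{days:.0f} days (~{days / 30:.1f} months)")
```

This counts storage cost only. Retrieval and transaction charges, Archive rehydration time, and the minimum storage durations Azure applies to the Cool and Archive tiers are not modelled.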

Option C: A hybrid scheme, where we keep some files in "Cool" storage and some in "Archive" storage. Kernel developers are most likely to need artifacts from the last 2-3 months; older data will be needed rarely, for example when debugging exceptional cases (e.g. a kernel that worked 1 year ago no longer works when the same source is rebuilt).
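For Option C, Azure Blob Storage lifecycle management can move blobs between tiers by age, so the Cool/Archive split would not need custom code. The sketch below builds such a policy document in Python. It assumes artifacts are uploaded directly into the Cool tier under a hypothetical `artifacts/` container, and uses an illustrative 90-day threshold (the "2-3 months" above) for moving blobs to Archive. The rule name and the optional delete age are hypothetical, not decided values.

```python
import json
from typing import Optional


def hybrid_lifecycle_policy(prefix: str, archive_after_days: int = 90,
                            delete_after_days: Optional[int] = None) -> dict:
    """Build an Azure lifecycle management policy: Cool -> Archive by age."""
    base_blob = {
        # Blobs are assumed to start in Cool; move them to Archive once old enough.
        "tierToArchive": {"daysAfterModificationGreaterThan": archive_after_days},
    }
    if delete_after_days is not None:
        # Optional final cleanup; None keeps archived data indefinitely.
        base_blob["delete"] = {"daysAfterModificationGreaterThan": delete_after_days}
    return {
        "rules": [{
            "enabled": True,
            "name": "kernelci-artifacts-hybrid",  # hypothetical rule name
            "type": "Lifecycle",
            "definition": {
                "actions": {"baseBlob": base_blob},
                "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
            },
        }]
    }


print(json.dumps(hybrid_lifecycle_policy("artifacts/"), indent=2))
```

The printed JSON could then be applied as the storage account's management policy, for example with `az storage account management-policy create --policy @policy.json` plus the account name and resource group. Since Azure charges early deletion for Archive blobs removed before 180 days, any delete age should sit well beyond the archive threshold.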

gctucker commented 5 months ago

Has there been any actual solution design already for how to manage storage in production with the new API & Pipeline?

nuclearcat commented 5 months ago

At the moment we are using Azure Files, as before, and as discussed at meetings we will keep using it unless we face significant issues. If necessary we can migrate to better storage later; it is not a priority right now.

gctucker commented 5 months ago

Right, so still no documented design for production. I was just intrigued by the word "improvement", as you can only improve on something that exists. Are there actual good reasons to stick with Azure Files, or is it just due to lack of time to research alternatives?

nuclearcat commented 5 months ago

There are indeed several existing options for storing files in the project, and the project is still somewhat far from being called “production”. At the moment we are discussing an option that MAY be used in production.

kernelci / kernelci-project

KernelCI artifacts storage improvement #310