kallisti5 opened 1 month ago
My preference:
I've reached out to Wasabi to try and get "actual" bandwidth utilization numbers. They don't publish them in our portal (but I sure as hell know they track them, since they've cut us off before due to egress).
Why do you have Backblaze as "20-25 a month"? If we factor in the CDN with free egress then shouldn't it be storage costs only, and thus be equivalent to Wasabi + CDN?
> Why do you have Backblaze as "20-25 a month"? If we factor in the CDN with free egress then shouldn't it be storage costs only, and thus be equivalent to Wasabi + CDN?
> Free egress up to 3x their average monthly storage amount. Egress over average stored is $0.01/GiB. ~2TiB - 400GiB = 1600GiB * 0.01 = $16 / month

$16 + $6 = ~$21
EDIT: I did that math wrong (the free tier is 3x the average stored, i.e. 1200GiB, not 400GiB). Let's re-run the cost numbers, assuming 6TiB of egress and 400GiB of storage.
Backblaze:
Storj:
Wasabi:
Telnyx:
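As a back-of-envelope sanity check on the Backblaze line, here is a sketch under the revised assumptions (400GiB stored, 6TiB egress/month). The $0.01/GiB overage and 3x-stored free tier are from the quote above; the ~$0.006/GiB/month storage rate is an assumption.

```shell
# Backblaze B2 rough monthly cost: 400 GiB stored, 6 TiB egress.
stored=400
egress=6144                         # 6 TiB in GiB
free=$((stored * 3))                # 1200 GiB of free egress (3x stored)
over=$((egress - free))             # 4944 GiB billable egress
overage_cents=$((over * 1))         # $0.01/GiB overage
storage_cents=$((stored * 6 / 10))  # assumed ~$0.006/GiB/month storage
echo "egress overage: \$$((overage_cents / 100)), storage: \$$((storage_cents / 100))"
```

So the raw worst case without a CDN in front is roughly $50/month of overage plus a couple dollars of storage.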
Backblaze + bunny.net CDN seems like the best deal tbh, with controlled risk. The bunny.net CDN could cut that 6TiB way down to "a few TiB or less" on all providers, but it's an unknown how efficient their caching is for our use-case.
EDIT - Actual worst-case egress bandwidth numbers:
For me, while I think reliability is important, it is not the end of the world if we get cut off and need to relocate. However, how do we keep control of our packages? I.e. is there going to be a backup or a primary source for them?

Another factor is the odds of hidden surprises: I do not want to be surprised by a sudden change of rates if we cross some sort of threshold, so any provider with a 'flat' rate that scales linearly is preferred over a provider that requires us to closely monitor some threshold.

Finally, I would keep things as easy as possible, so Digital Ocean Spaces, where we would need to do additional data design, is off the table for me.
> For me, while I think reliability is important, it is not the end of the world if we get cut off and need to relocate. However, how do we keep control of our packages? I.e. is there going to be a backup or a primary source for them?
The nice thing about s3 is it actually gets easier to back things up. Today we have an automatic "compress all the artifacts, encrypt them, and upload to an s3 bucket" backup system. That doesn't work for huge things though, since I really don't want to work with 300GiB tar deltas :sweat_smile:
In the model where some object storage provider is the source of truth, we really just need to rclone the bucket "somewhere" else. Historically I've just rcloned to a dedicated bit of local storage at my house as a cold backup (you could do the same). rclone works off of deltas like rsync, so it's bandwidth-friendly after the initial clone.
rclone also lets you sync between storage providers... and it supports a TON of them.
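A minimal sketch of both patterns, assuming remotes named `wasabi` and `b2` are already configured in rclone.conf (the remote and bucket names here are placeholders):

```shell
# Cold backup: mirror a bucket to local disk. After the first run only
# changed objects transfer, rsync-style.
rclone sync wasabi:haiku-repos /mnt/backup/haiku-repos --progress

# Provider-to-provider: replicate or migrate a bucket between services
# without pulling it through local storage first.
rclone sync wasabi:haiku-repos b2:haiku-repos --transfers 8 --checksum
```

`--checksum` makes rclone compare object hashes rather than size/modtime, which is the safer choice when syncing between two different providers.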
We actually have an rclone container today, ready to go, that will do that to Storj. We can make some fixes to make it more generic, though.
I also have rclonefs, which will (theoretically) let us mount s3 buckets as FUSE mounts on each k8s node, so we can (theoretically) offer s3 buckets over rsync to mirrors from pods running on any k8s node. (FUSE in k8s is weird though, and we need an elevated security context.)
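For illustration, the FUSE idea boils down to something like the following, assuming a configured `b2` remote (remote/bucket names are placeholders); in k8s the pod would additionally need `/dev/fuse` and the elevated securityContext mentioned above:

```shell
# Mount a bucket read-only via FUSE so mirrors (or an rsync daemon) can
# read it like a local filesystem.
mkdir -p /mnt/haikuports
rclone mount b2:haikuports /mnt/haikuports \
    --read-only --vfs-cache-mode minimal --daemon

# Mirrors could then pull from the mount, e.g.:
# rsync -av /mnt/haikuports/ mirror:/srv/haikuports/
```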
> Another factor is the odds of hidden surprises, i.e. I do not want to be surprised by a sudden change of rates if we cross some sort of threshold, so any provider with a 'flat' rate that scales linearly is preferred over a provider that requires us to closely monitor some sort of threshold.
Agree. That's definitely the biggest pain point of object storage. I really like Telnyx's pricing, but the whole "per million API hits" thing makes me nervous for something complex and large like haikuports.
> Finally, I would keep things as easy as possible, so the Digital Ocean Spaces where we will need to do additional data design is off the table for me.
Agree. Let's strike DO off the list. They had some appealing qualities, but needing a whole gaggle of buckets to groom to get reasonable pricing is too much lift. I'm tired of forming infrastructure "around" providers' weird limitations.
I updated https://github.com/haiku/infrastructure/issues/141#issuecomment-2383633257 with the pricing based on the actual worst case bandwidth numbers I saw on digital ocean.
Oh, and I just looked at the Wasabi bill... it lists "908.40 API requests" for the month. I'm guessing that's in thousands though, given the decimal point, so 908,400 makes more sense.
Here's some data on the bunny.net cdn. It definitely cuts down our bandwidth usage ~50% on a single haiku nightly repo.
I'm sure the savings will be less for haikuports (more random packages, etc)
Looks like the preferred is backblaze + bunny then?
Agree. I think Backblaze + bunny are going to be the cheapest combo. Bunny will cut the xfer down ~50%, so that $30 / month should be "worst case".
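Running the earlier Backblaze numbers again with the ~50% CDN hit rate assumed above (rates as quoted earlier in the thread; bunny.net's own egress fees not included):

```shell
# Backblaze egress overage if the CDN halves origin egress: 3 TiB instead of 6 TiB.
stored=400
egress=$((6144 / 2))             # 3072 GiB after ~50% cache hit rate
over=$((egress - stored * 3))    # 1872 GiB over the 3x-stored free tier
echo "overage: \$$((over / 100)).$((over % 100))"   # at $0.01/GiB
```

That lands around $19 of overage plus storage, which is roughly consistent with the $30/month worst-case figure once CDN fees are added on top.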
Ryan went ahead and entered our billing info. I went ahead and deployed a temporary VM @ digital ocean to use to shovel artifacts over to backblaze.
I'm going to start with the Haiku repos themselves since it's an easy (smaller) test of data before moving on to haikuports.
~~Aaaand.. Backblaze just crapped the bed.~~
API calls are NOT free.
EDIT: I guess +$4-8 a month extra for API calls isn't horrible... however it adds risk to Backblaze.
> API calls are NOT free.
>
> EDIT: I guess +$4-8 a month extra for API calls isn't horrible... however it adds risk to Backblaze.
That's not great and definitely false advertising...
I went ahead and put the haiku repo over onto Backblaze. We already blew past the "free tier" of Class C API calls during the last sync. :face_exhaling:
I'm about to head out of town and will be back Sunday.. so here are important facts:
If the :hankey: hits the fan, you can take the following actions to undo the migration to backblaze:
With haikuporter's support of s3, we need to choose an object storage provider. For context, this will replace our Digital Ocean block storage volume, which is $25/month for 250GiB.
Assuming ~400GiB stored and 2TiB of egress a month (which gives us a lot of headroom). Assuming ~34 million API ops a month (17M Class A, 17M Class B).
$16-35 / month likely as we grow. Risk of pulling too much egress and getting shut off.
Likely $11-20 / month; per-API-operation pricing is a big risk. Haikuporter, hpkgbouncer, etc. all hit APIs.
Likely $12-20 / month
$16-24 / month
~$23 per month for 400GiB + 2TiB xfer
We don't like having to have multiple buckets to get reasonable pricing.

Notes: We don't have to go all-in on a single S3 provider. Haiku can remain at Wasabi; haikuports can be "wherever". We can run one deployment of hpkgbouncer per repo.