uskudnik / amazon-glacier-cmd-interface

Command line interface for Amazon Glacier
MIT License

Option to disallow more than one archive retrieval at a time #28

Open jose1711 opened 11 years ago

jose1711 commented 11 years ago

Retrievals from Glacier can become very expensive: http://www.innerexception.com/2012/08/is-amazon-glacier-really-as-cheap-as-it.html You can mitigate the costs by splitting your archive into multiple smaller archives and requesting no more than one at a time. Splitting is easy (and storage is relatively cheap at $0.01 per GB per month), but if you're not careful you can still request, say, all the archives at once. I think an extra fuse in amazon-glacier-cmd-interface that tracks when the last retrieval was made would be welcome for users who care very much about keeping their peak hourly retrieval rate as low as possible.

wvmarle commented 11 years ago

Is this a wanted feature, really? I understand the concern, but is this something to be implemented, and if so how, or should it simply be left to the user to watch their step when retrieving archives?

uskudnik commented 11 years ago

I had the same questions regarding this and haven't made up my mind, neither regarding the necessity of the option nor how to implement it...

@jose1711 Do you really think this is a huge problem? How would you like to have this implemented? Display a one-line warning if anything has already been requested? Ask for confirmation that you really want to request it? Have an option in the settings that blocks multiple retrievals if set to true?

jose1711 commented 11 years ago

I was thinking: store the timestamp of the last retrieval request somewhere, and if that timestamp + the minimal interval between retrievals > current time (i.e. the interval has not yet elapsed), return an error.
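A minimal sketch of such a timestamp guard, rejecting a request while the minimum interval since the last one has not yet elapsed. Everything here is an assumption made for illustration: the function name `check_and_record_retrieval`, the `RetrievalTooSoon` exception, and the JSON state file are hypothetical and not part of amazon-glacier-cmd-interface.

```python
import json
import time
from pathlib import Path


class RetrievalTooSoon(Exception):
    """Raised when a retrieval is requested before the minimum interval has elapsed."""


def check_and_record_retrieval(state_file, min_interval, now=None):
    """Allow a retrieval only if min_interval seconds have passed since the last one.

    state_file   -- hypothetical path of a small JSON file holding the last timestamp
    min_interval -- minimum number of seconds between retrievals (0 disables the check)
    now          -- current time in seconds; defaults to time.time()
    """
    now = time.time() if now is None else now
    path = Path(state_file)

    if path.exists():
        last = json.loads(path.read_text())["last_retrieval"]
        # Reject while the interval since the last request is still running.
        if now < last + min_interval:
            wait = last + min_interval - now
            raise RetrievalTooSoon(f"next retrieval allowed in {wait:.0f} s")

    # Record this request so the next call can compare against it.
    path.write_text(json.dumps({"last_retrieval": now}))
```

With `min_interval=0` the check never fires, so a default of zero keeps the current behaviour.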

uskudnik commented 11 years ago

That would surely not be acceptable, as I imagine the more frequent case is that people want to issue several retrieval operations at once. If anything, this must be either an option or, at best, a warning with a question.

As for the algorithm, I guess it's OK, but I would need to verify it.

jose1711 commented 11 years ago

This was just an idea. Besides, I'm not saying anywhere that the default (mininterval) couldn't be zero. A prompt is not a bad thing either.

wvmarle commented 11 years ago

Typical use case of Glacier is as a "cold storage" facility. The backup of the backup. When everything goes wrong, it's great to have your data there and cost suddenly is less important.

Instead of a question I'd prefer to throw an error, with a message something like "Warning: you are exceeding your free limit; use --force to override this check." So add a --force command line switch instead of asking a question. That's in line with the rest of the operation of glacier-cmd.
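A rough sketch of the error-plus-override behaviour described above, using Python's standard `argparse`. The names `guard_retrieval`, `FreeLimitExceeded`, `build_parser` and the program name are hypothetical, and how `free_bytes` would be obtained is left open; this is not glacier-cmd's actual code.

```python
import argparse


class FreeLimitExceeded(Exception):
    """Raised when a retrieval would exceed the free limit and --force was not given."""


def guard_retrieval(requested_bytes, free_bytes, force=False):
    """Refuse a retrieval above the free limit unless the user overrides the check."""
    if requested_bytes > free_bytes and not force:
        raise FreeLimitExceeded(
            "Warning: you are exceeding your free limit; "
            "use --force to override this check."
        )


def build_parser():
    """Build a parser with only the override switch (hypothetical program name)."""
    parser = argparse.ArgumentParser(prog="retrieve-sketch")
    parser.add_argument(
        "--force",
        action="store_true",
        help="skip the free-limit check and request the retrieval anyway",
    )
    return parser
```

The check fails by default and never prompts, so scripted use stays non-interactive; passing `--force` sets `args.force` to `True` and lets the request through.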

Back to retrieval: the free amount is (if I understand it all correctly) 5% of your total storage per month, divided equally per day, so if you store 100 GB in Glacier you can download about 167 MB per day for free (5 GB / 30 days). If you have more storage, you can download more. Glacier also looks at an hourly retrieval rate: billing is related to the hour in which you downloaded the most data. So it's quite complex all in all. It is very hard to determine what exactly the download limits are (you need accurate information on the amount of data stored in the account), and even harder to check against them. At least I can't think of a reasonably easy and reliable way to do such checks.
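The daily free allowance arithmetic above can be written out as a short sketch. It assumes the interpretation given in this comment (5% of stored data per month, spread evenly over the days) and a 30-day month; the function name is hypothetical, and the hourly peak-rate billing is deliberately not modelled because its exact rules are not established in this thread.

```python
def daily_free_allowance_gb(stored_gb, monthly_fraction=0.05, days_in_month=30):
    """Free retrieval allowance per day in GB, under the assumptions stated above."""
    return stored_gb * monthly_fraction / days_in_month


if __name__ == "__main__":
    allowance = daily_free_allowance_gb(100)
    # 100 GB stored -> 5 GB per month -> roughly 0.167 GB (about 167 MB) per day
    print(f"{allowance * 1000:.1f} MB per day")
```

The result is only as good as the `stored_gb` figure fed into it, which is exactly the hard part raised in this comment.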

wvmarle commented 11 years ago

@jose1711 ideas are good, always. I always say, the crazier the better. And maybe an idea is crazy, but it can very well spark other not so crazy ideas.

uskudnik commented 11 years ago

--force is OK.

But as @wvmarle said calculating everything correctly could be a bit tricky (http://aws.amazon.com/glacier/faqs/#How_much_does_Amazon_Glacier_cost). If we can't make it very reliable I'm not even comfortable doing it unless we decide to err on the side of caution.

Will think a bit more about it.

wvmarle commented 11 years ago

The main trick is to work out the free download tier.

Probably the biggest problem is "how much is in the vaults of this account?". Is it "how much is there right now?" or "how much is there on average over this month/billing period?" The first is relatively easy to get (albeit with a 4-hour wait for an inventory job to finish); the second is way harder.

jose1711 commented 11 years ago

Could be slightly OT: this should help you plan/calculate the costs of backup retrievals: https://docs.google.com/spreadsheet/ccc?key=0Al87cCkTI-7adFVxd213UFNpcXo5RzNoVlFRbTdoVGc

wvmarle commented 11 years ago

It would be great if Amazon would put out something like that. This is interesting, but the problem is, it's still based on some person's interpretation of the fee schedule by Amazon (which is not easy to understand). I have simply no idea how reliable/accurate the info coming out of that spreadsheet is. And as @uskudnik said, if we can't make it reliable, better not do it at all.