rclone / rclone

"rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Wasabi, Google Cloud Storage, Azure Blob, Azure Files, Yandex Files
https://rclone.org
MIT License

[Feature Request] Amazon Glacier Support #923

Closed · cemilbrowne closed this issue 5 years ago

cemilbrowne commented 7 years ago

Hi,

Filing a feature request to support Amazon Glacier. Given the pricing, this seems like an ideal rclone target - $0.007/GB is pretty decent...

Thanks! -Cemil

thibaultmol commented 7 years ago

Surprised it isn't supported yet. (I would like to mention that @Backblaze is $0.005/GB 😃; I don't work for Backblaze.)

itanoss commented 7 years ago

+1

solars commented 7 years ago

+1. I thought it was already there, but now that I need it, it isn't :(

ncw commented 7 years ago

What would rclone need to do to support this? Set a lifecycle policy?
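
For context, a zero-day lifecycle rule of the sort being discussed would be configured on the bucket itself, outside rclone. A rough sketch with the aws CLI (the bucket name and rule ID are placeholders; this only illustrates the mechanism, it isn't something rclone does today):

```
# Transition every new object to GLACIER immediately (a "zero-day" rule).
cat > glacier-rule.json <<'EOF'
{
  "Rules": [
    {
      "ID": "to-glacier",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [ { "Days": 0, "StorageClass": "GLACIER" } ]
    }
  ]
}
EOF
aws s3api put-bucket-lifecycle-configuration \
    --bucket my-bucket \
    --lifecycle-configuration file://glacier-rule.json
```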

hashbackup commented 7 years ago

I believe it would be very difficult for rclone to support Glacier, the primary reason being that Glacier does not have the concept of pathnames or filenames: when data is uploaded to Glacier, a unique archive ID is generated by Glacier and that ID is the only way to access the file. On top of that, the 4-hour transaction delay is maddening and the obtuse (that's generous) cost calculations require very careful study. Here's a link to a document detailing why I removed Glacier support from HashBackup:

http://www.hashbackup.com/technical/glacier-eol

Some of this has changed recently as Amazon has made it possible to access Glacier without the 4-hour delay, but many of the other points are still valid. I think when compared to S3 Infrequent Access and Google Nearline, Glacier has very few advantages.
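
To make the archive-ID point concrete, here is a rough sketch of a native Glacier round trip with the aws CLI (vault name, file names and the archive/job IDs are placeholders):

```
# Upload returns an opaque archiveId; there is no filename or path you get to choose.
aws glacier upload-archive --account-id - --vault-name my-vault --body backup.tar
# => { "archiveId": "kKB7ym...", ... }

# Retrieval is asynchronous: start a job, poll until it finishes (historically ~4 hours),
# then download the job output.
aws glacier initiate-job --account-id - --vault-name my-vault \
    --job-parameters '{"Type": "archive-retrieval", "ArchiveId": "kKB7ym..."}'
aws glacier describe-job --account-id - --vault-name my-vault --job-id JOB_ID
aws glacier get-job-output --account-id - --vault-name my-vault --job-id JOB_ID backup.tar
```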

ncw commented 7 years ago

@hashbackup thanks for the writeup - very interesting.

So it looks like S3 Infrequent Access would be the way to go. It looks like that would be easy to add - what do you think?

hashbackup commented 7 years ago

Yeah, IA is very easy to add: it's just a storage class header on the upload and other S3 operations aren't affected. RRS (Reduced Redundancy Storage) is also a storage class option, but I'd avoid it because according to Amazon, it statistically loses 1 file in 10K per year, and from the way they have it listed on their website, I believe they will eventually deprecate it, like Google is doing with DRA (Durable Reduced Availability).
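
For what it's worth, the "storage class header" really is the whole change; a sketch with the aws CLI (bucket and key are placeholders):

```
# Same PUT as usual, just with a storage class attached; no other S3 call changes.
aws s3api put-object --bucket my-bucket --key path/to/file --body ./file \
    --storage-class STANDARD_IA
# Or via the higher-level command:
aws s3 cp ./file s3://my-bucket/path/to/file --storage-class STANDARD_IA
```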

Other gotchas: S3 IA bills every object as if it were at least 128 KB, so given the price difference between IA and regular S3, you don't want to use IA for files below roughly 53,400 bytes: they end up costing more than they would in regular S3. There is also a 30-day minimum storage period with IA: if you delete an object before it has been stored for 30 days, you're still charged for the full 30 days. There's not a good way to optimize for this, because most of the time the software layer between the storage service and the user's data doesn't know how long a file will be stored.
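
As a rough sanity check on that break-even figure (assuming roughly $0.03/GB-mo for Standard and $0.0125/GB-mo for IA; exact prices vary by region and over time):

```
# IA bills every object as at least 128 KiB, so a file of s bytes costs the same in
# either class when  s * standard_price = 131072 * ia_price
echo '131072 * 0.0125 / 0.03' | bc -l
# => ~54613 bytes, in the same ballpark as the 53,400-byte figure above;
#    anything smaller is cheaper left in regular S3.
```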

Google also has a 30-day delete penalty with Nearline, and with Coldline, it's 90 days. In my opinion, these complex pricing strategies are gimmicks that trick people into paying more in storage costs. For example, Coldline is 3.7x cheaper per month vs regular storage costs, but it could cost 90x more to store a short-lived object if you're not careful.
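
To put a number on the early-delete penalty (using roughly $0.007/GB-mo for Coldline; illustrative only):

```
# A 1 GB object deleted after one day is still billed for the full 90-day minimum:
echo '0.007 * 3' | bc -l     # ~ $0.021 (three months of Coldline storage)
echo '0.007 / 30' | bc -l    # ~ $0.00023 (what a single day at that rate would cost)
# i.e. you pay for roughly 90x the days the object actually existed.
```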

Google Class A operation costs are also double with Nearline and Coldline, but at least there is no file size minimum as with S3 IA. Class B operations are more than double with Nearline, and 12x more with Coldline. With HashBackup, this isn't a big deal because it is very stingy with operations, but for something like rclone where you are doing a lot of directory listings and transferring a lot of individual files, it could matter.

marrie commented 7 years ago

I also think Glacier would be a good idea, especially since I believe they lowered the price to $0.004/GB (except in California, where it's the same price as B2 at $0.005/GB). Not much of a saving, by the way; I did a comparison: 13 TB at B2 is $60 a month, and the same at Glacier is $56. Still, I think this would be a good idea. Any plans to make this happen? I filed a duplicate issue, which I will reference; it's #1038. What's the status on this?

I can't speak to how hard it would be, but the 4-hour delay could be worth it, especially as you won't be able to download all 13 TB in one sitting anyway. It would at least be another option for users.

vsoch commented 7 years ago

+1 here. I primarily just need some programmatic way to initiate a restore and then (4+ hours later) do a transfer from S3 to elsewhere. Putting a large amount of data in Glacier years ago was a terrible idea.

DerSalvador commented 7 years ago

+1 please; as a backup solution, alongside S3 and the others, this would be amazing.

fullofcaffeine commented 7 years ago

+1

spurll commented 6 years ago

+1

AlexandreGohier commented 5 years ago

It now works!!

You can either:

rclone copy myfile.txt regular_S3_remote:mybucket/myfolder/ --s3-storage-class GLACIER

This is now possible following the update of the S3 PUT API:

> With the S3 PUT API, you can now upload objects directly to the S3 Glacier storage class without having to manage zero-day lifecycle policies.

I successfully tested both options.

I now hope this also works with the new S3 Glacier Deep Archive storage class once it becomes available in 2019, which should be priced at $0.00099/GB-mo (less than one-tenth of one cent, or $1.01 per TB-mo).
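
One practical caveat about the GLACIER storage class generally (this is standard S3 behaviour, not anything rclone-specific): objects stored in GLACIER can't be downloaded directly; they first need to be restored, for example with the S3 API (bucket and key are placeholders):

```
# Ask S3 to stage a temporary readable copy of a GLACIER object.
aws s3api restore-object --bucket my-bucket --key myfolder/myfile.txt \
    --restore-request '{"Days": 1, "GlacierJobParameters": {"Tier": "Standard"}}'
# Check progress; once the restore finishes, the object can be fetched normally.
aws s3api head-object --bucket my-bucket --key myfolder/myfile.txt
```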

ncw commented 5 years ago

@WilliamCocker great news :-)

Do you fancy sending a pull request to add GLACIER as an option here? You can put "Fixes #923" in it :-)

https://github.com/ncw/rclone/blob/8fb707e16dab63609bf29b0eac65cf6395af8942/backend/s3/s3.go#L528-L546

AlexandreGohier commented 5 years ago

@ncw Okay, I submitted a pull request as instructed and another one for the documentation, but I must warn you I'm no GitHub expert, so let me know if I should have done something differently.

Thanks for your hard work, rclone rocks!

ncw commented 5 years ago

> @ncw Okay, I submitted a pull request as instructed and another one for the documentation, but I must warn you I'm no GitHub expert, so let me know if I should have done something differently.

I don't see your patches - did you click the button in your fork to create the pull request?

ncw commented 5 years ago

Ah I see, you've created the pull requests on your fork... Can you create them on the rclone main repository?

AlexandreGohier commented 5 years ago

@ncw ok I just did, let me know if there is anything else

ncw commented 5 years ago

> @ncw ok I just did, let me know if there is anything else

Perfect - thank you :-)