leeroybrun / glacier-vault-remove

Remove all archives stored inside an Amazon Glacier vault, even if you have a huge number of them.

Use multiple processes to speed up big deletes #15

Closed adityabansod closed 7 years ago

adityabansod commented 8 years ago

I had an archive with 1.2M+ objects in it, so this PR lets you run multiple processes to parallelize the deletes to Glacier.

I also moved some of the debug-level log items to info, since the archive is so huge that it's hard to see whether the tool is doing anything.
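For readers who want a picture of the approach, here is a minimal sketch of splitting a list of archive IDs across several worker processes. It is not the code from this PR: the function names (`split_into_chunks`, `delete_chunk`, `delete_in_parallel`) are hypothetical, and the actual Glacier call is left as a placeholder.

```python
import multiprocessing


def split_into_chunks(archive_ids, num_processes):
    """Split archive_ids into num_processes interleaved, roughly equal slices."""
    return [archive_ids[i::num_processes] for i in range(num_processes)]


def delete_chunk(chunk):
    """Worker: walk one slice of archive IDs and return how many were handled.

    Hypothetical placeholder: a real worker would send one delete request
    to Glacier per archive ID inside this loop.
    """
    handled = 0
    for _archive_id in chunk:
        # Placeholder for the per-archive delete request.
        handled += 1
    return handled


def delete_in_parallel(archive_ids, num_processes):
    """Hand each slice to its own process and return the total handled."""
    chunks = split_into_chunks(archive_ids, num_processes)
    with multiprocessing.Pool(processes=num_processes) as pool:
        return sum(pool.map(delete_chunk, chunks))
```

Interleaved slices keep the per-process workload balanced without needing to know the list length up front. On platforms that spawn rather than fork, `delete_in_parallel` should be called from under an `if __name__ == "__main__":` guard.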

luminal-neal commented 8 years ago

Newbie question - the AWS docs say: "You can delete one archive at a time from a vault. To delete the archive you must provide its archive ID in your delete request." Does this mean one archive per API request, or one delete command to a vault at a time?

leeroybrun commented 8 years ago

@adityabansod Thanks a lot for your work! This is really great. For the log levels, the goal was to speed up the process, as every log line printed to the console slows down the execution. If you want the logs printed, you can use the DEBUG argument. The best way to know whether the tool is working would be to print something every X archives, or to use a timer, printing something like "Removed XX archives from XX.". This way we do not print something to the console for every archive removed. What do you think about that?
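The "print something every X archives" idea could look roughly like the sketch below. This is an illustration only, not code from the tool: `delete_with_progress`, `delete_one`, and `report_every` are hypothetical names, and `delete_one` stands in for whatever function actually removes a single archive.

```python
import logging


def delete_with_progress(archive_ids, delete_one, report_every=1000):
    """Delete archives one by one, logging progress only every report_every items.

    delete_one is a hypothetical callable that removes a single archive
    given its ID; it is passed in so the logging logic stays independent
    of the Glacier client.
    """
    total = len(archive_ids)
    removed = 0
    for archive_id in archive_ids:
        delete_one(archive_id)
        removed += 1
        # Log at fixed intervals and once more at the very end,
        # instead of once per archive.
        if removed % report_every == 0 or removed == total:
            logging.info("Removed %d archives from %d.", removed, total)
    return removed
```

With `report_every=1000`, a vault of a million archives produces about a thousand log lines rather than a million.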

@luminal-neal I think this is one archive per request. @adityabansod Did you test your modifications? If this works, it confirms that this is one archive per request.

adityabansod commented 8 years ago

@luminal-neal one archive per DELETE request. I used this branch last month to remove that huge archive I mentioned earlier and it worked without issue. Was ~4x faster than the single proc version.
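To make "one archive per DELETE request" concrete, here is a small sketch, assuming a boto3 Glacier client is used (the thread does not say which library the script relies on). The helper name `delete_archives` is hypothetical; `delete_archive` with `vaultName` and `archiveId` is the boto3 client call for the DeleteArchive operation.

```python
def delete_archives(client, vault_name, archive_ids):
    """Send one DeleteArchive request per archive ID and return the count.

    client is assumed to be a boto3 Glacier client, created for example
    with boto3.client("glacier", region_name="us-west-1"). There is no
    batching here: each archive gets its own request.
    """
    for archive_id in archive_ids:
        client.delete_archive(vaultName=vault_name, archiveId=archive_id)
    return len(archive_ids)
```

Because every archive costs one round trip, running several such loops side by side over disjoint sets of IDs is what allows a multi-process version to finish sooner.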

@leeroybrun yep, that would make sense. I'm not actively using this any more, so if you did want to merge it and move the DEBUG to INFO or print out the log every X % Y, I won't be able to get it done for quite some time.

luminal-neal commented 8 years ago

I am experiencing a weird issue with DNS resolution on my Mac laptop when running this with lots of concurrent deletes. I can reproduce this with python removeVault.py us-west-1 <vault_name> 32 pretty easily. Basically I can't browse the web while it's running, and the deletes themselves start to fail. htop doesn't seem to show super heavy CPU usage, but the spikes might be occurring too quickly for the htop update speed. pkill python brings everything back to normal.

Not necessarily a bug, but fyi for anyone running the code.

guillermo-menjivar commented 7 years ago

What is the status of this PR? Are we planning to merge it, or is anything still pending before a proper merge can happen? I was planning to open a PR against this repo to introduce some additional features. However, if we are encouraged to fork and maintain our own, I will go that route. Just wanted to know.

leeroybrun commented 7 years ago

Sorry for the delay, it's merged now. Thanks a lot for your work @adityabansod !

The problem is that I don't have any vault anymore, so it's hard to test the changes and be sure everything is working. I'm relying on feedback from people using it to be sure I don't merge something that will break the script.