uskudnik / amazon-glacier-cmd-interface

Command line interface for Amazon Glacier
MIT License
374 stars · 100 forks

Add new --partsize option to upload subcommand #46

Closed gburca closed 12 years ago

gburca commented 12 years ago
wvmarle commented 12 years ago

Looks good. Will also improve the progress indicator; more updates to follow. Maybe have it print a warning if the user-provided block size is too small.

But how about the performance of the upload? Doesn't it take time to make a new connection for each block?

gburca commented 12 years ago

Maybe have it print a warning if the user-provided block size is too small.

In that case, the size is automatically adjusted.

would be cool to also correct reading size

I'm not sure we need to do that. What do you think that buys us?

  • If the user-provided size is smaller than READ_PART_SIZE, adjusting the read size downward doesn't buy you much. Reading the default 128 MB block from disk is a pretty fast operation these days, and sooner or later you'll have to read the whole block anyway.
  • If the user-provided size is larger than READ_PART_SIZE, should we just blindly adjust upward? The maximum size the user can specify is 2^32 * 1024 * 1024 ≈ 4.5e15 bytes. That's a ridiculously large size to attempt to read into memory at once. Since we need to put a limit on it anyway, why not just leave it at READ_PART_SIZE?

So what do you think? How would you adjust the read size, and why?
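A minimal sketch of the upward adjustment being discussed (helper and constant names are hypothetical, not taken from this repository; the Glacier API itself requires part sizes to be a power-of-two number of MiB between 1 MiB and 4 GiB, with at most 10,000 parts per upload):

```python
import math

MIN_PART_SIZE_MB = 1      # Glacier minimum part size (1 MiB)
MAX_PART_SIZE_MB = 4096   # Glacier maximum part size (4 GiB)
MAX_PARTS = 10000         # Glacier limit on parts per multipart upload

def adjust_part_size(requested_mb, archive_size_bytes):
    """Return the smallest valid power-of-two part size (in MiB) that is
    >= requested_mb and splits the archive into at most MAX_PARTS parts."""
    # Smallest part size that keeps the part count within MAX_PARTS.
    needed_mb = math.ceil(archive_size_bytes / (MAX_PARTS * 1024 * 1024))
    size_mb = max(requested_mb, needed_mb, MIN_PART_SIZE_MB)
    # Round up to the next power of two, as Glacier requires.
    size_mb = 2 ** math.ceil(math.log2(size_mb))
    if size_mb > MAX_PART_SIZE_MB:
        raise ValueError("archive too large for a Glacier multipart upload")
    return size_mb

# A 200 GiB archive with a requested 1 MiB part size would need more than
# 10,000 parts, so the size is bumped to the next power of two that fits.
print(adjust_part_size(1, 200 * 1024**3))  # -> 32
```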

offlinehacker commented 12 years ago

Well, 128 MB of memory is not much these days, but you do still sometimes have embedded devices (for example, I have a NAS with 128 MB of RAM running Debian), and those could run into problems. We could use something like http://docs.python.org/library/io.html#io.BytesIO, but I have to check how that would be integrated.

offlinehacker commented 12 years ago

Okay, here's how we can solve this. The problem is that we have to calculate the hash of a part before sending it. So instead of reading the part into memory, we add an option to mmap that part of the file. After that everything can be processed as normal, but it helps because with an mmapped file we don't need a buffer. We may end up reading more from disk, but that shouldn't be a problem if RAM is the concern.
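The mmap idea above could be sketched roughly as follows (a hypothetical helper, not code from this PR; a plain SHA-256 stands in for Glacier's SHA-256 tree hash just for illustration):

```python
import hashlib
import mmap

def hash_part_mmap(path, offset, length):
    """Hash one part of a file via mmap, so the part is paged in by the
    kernel on demand instead of being read into a Python buffer."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Slicing a memoryview avoids copying the part into a bytes
            # object; the hash is fed directly from the mapped pages.
            with memoryview(mm) as view:
                return hashlib.sha256(view[offset:offset + length]).hexdigest()
```

Reads still go through the page cache, so the kernel can evict pages under memory pressure, which is exactly what helps on a 128 MB RAM device.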

gburca commented 12 years ago

Improved handling of large archives on memory constrained devices is important, but is beyond the scope of this particular commit. We should probably open a new issue for that. All I'm trying to do here is remove the current limitation of 1.3 TB, and perhaps improve things slightly for smaller archives.
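For context, the 1.3 TB figure follows from Glacier's cap of 10,000 parts per multipart upload combined with the previously fixed 128 MB part size; a quick check (hypothetical helper, just arithmetic):

```python
MAX_PARTS = 10000  # Glacier's limit on parts per multipart upload

def max_archive_tb(part_size_mb):
    """Largest archive (in TB) uploadable with a given part size in MiB."""
    return MAX_PARTS * part_size_mb * 1024 * 1024 / 1e12

print(max_archive_tb(128))   # ~1.3 TB with the old fixed 128 MB parts
print(max_archive_tb(4096))  # ~43 TB at Glacier's 4 GiB maximum part size
```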

offlinehacker commented 12 years ago

Agree, will do it.

uskudnik commented 12 years ago

Very nice :) Merging; support for NAS devices should go into another issue.