Closed wvmarle closed 11 years ago
And now it suddenly works... making the boto calls directly (bypassing glaciercorecalls completely).
The reason it doesn't work is that one of my changes got reverted (probably when the recent merges were done). I looked at the master branch, and in glaciercorecalls.py, GlacierVault.make_request is no longer passing on the params argument. To add the change back:
```diff
- return self.connection.make_request(method, uri, headers, data)
+ return self.connection.make_request(method, uri, headers, data, params=params)
```

I think calling boto directly is the right way to go :)
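To see why that one argument matters, here is a self-contained sketch of the pass-through; the `FakeConnection` class and the URI layout are hypothetical stand-ins, not the real boto or glaciercorecalls code:

```python
class FakeConnection:
    """Hypothetical stand-in for boto's connection object."""
    def make_request(self, method, uri, headers, data, params=None):
        # A real connection would turn params into the query string,
        # e.g. ?marker=...&limit=...
        return {"method": method, "uri": uri, "params": params}

class GlacierVaultSketch:
    """Sketch of GlacierVault.make_request with params passed through."""
    def __init__(self, connection, name):
        self.connection = connection
        self.name = name

    def make_request(self, method, extra_path="", headers=None,
                     data="", params=None):
        uri = "/-/vaults/%s%s" % (self.name, extra_path)
        # Without `params=params` here, pagination markers never reach
        # the HTTP layer, so every listing returns the first page.
        return self.connection.make_request(method, uri, headers,
                                            data, params=params)

vault = GlacierVaultSketch(FakeConnection(), "backups")
resp = vault.make_request("GET", "/multipart-uploads/UPLOADID/parts",
                          params={"marker": "51"})
print(resp["params"])
```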
I'm not disagreeing. I'm just pointing out why getting the list of parts, and other commands that depend on pagination markers, are currently broken, and will stay that way until the transition to boto is complete or the one-liner fix I indicated above is re-introduced.
I'm currently at about 90% direct boto. Only upload and download are still handled partially internally; the rest is all boto calls.
I'm trying to find a way to cut down on memory usage by reading directly from the file rather than copying chunks into memory. Download also needs work to fetch part by part, instead of all in one go as it does now.
You could cut memory usage by mmap'ing parts of the file into memory. That way you don't have to change the core calls.
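A minimal sketch of that idea, assuming a hypothetical helper name and a 4 MiB part size (the mmap offset must be a multiple of `mmap.ALLOCATIONGRANULARITY`, which 4 MiB satisfies):

```python
import mmap
import os

PART_SIZE = 4 * 1024 * 1024  # hypothetical part size, a multiple of
                             # the mmap allocation granularity

def iter_parts(path, part_size=PART_SIZE):
    """Yield (offset, window) per part, mapping only one window of the
    file at a time instead of reading whole chunks into memory."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        for offset in range(0, size, part_size):
            length = min(part_size, size - offset)
            mm = mmap.mmap(f.fileno(), length, offset=offset,
                           access=mmap.ACCESS_READ)
            try:
                yield offset, mm  # hash/upload this window...
            finally:
                mm.close()        # ...then unmap before the next one
```

Since an mmap object supports the buffer protocol, it can be fed straight to `hashlib.sha256()` or an HTTP body without an extra copy, so peak memory stays at roughly one part regardless of file size.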
Good one. Will look into that. It would be great to be able to handle big chunks without using much memory for them. Hope performance stays good.
The only problem is stdin. In that case mmap'ing would not work, but then I don't see a better solution than reading the whole part into memory.
Disk I/O speed should generally not be a problem for reading the whole file twice (once for hashing and once for upload), but I would still keep an option to allow both methods. The changes needed to use mmap should not be significant anyway.
I see stdin as not so important a method (I have no idea why someone would want to use it; large amounts of data I'd normally write to a local file before uploading). It's nice to have, but other than spooling the data to disk and re-reading it, there is no way to avoid buffering complete blocks: we must take the tree hash before uploading. So if memory is a constraint, the user will just have to dump their stream to local disk first and then upload it to Glacier, or use smaller block sizes and not send out too much data.
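For reference, the tree hash Glacier expects is built from the SHA-256 of each 1 MiB leaf, combined pairwise until one root remains; a compact sketch:

```python
import hashlib

MIB = 1024 * 1024

def tree_hash(data):
    """Glacier-style SHA-256 tree hash: hash each 1 MiB chunk, then
    combine hashes pairwise until a single root hash remains."""
    level = [hashlib.sha256(data[i:i + MIB]).digest()
             for i in range(0, len(data), MIB)]
    if not level:
        level = [hashlib.sha256(b"").digest()]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(hashlib.sha256(level[i] + level[i + 1]).digest())
            else:
                nxt.append(level[i])  # odd node is promoted unchanged
        level = nxt
    return level[0].hex()
```

For data of 1 MiB or less the tree hash equals the plain SHA-256, which makes it easy to sanity-check.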
I think stdin is a great method, because you can encrypt and compress on the fly, but I agree that in that case you should have enough memory.
You must consider that people may lack disk space rather than memory and still want to upload big encrypted and compressed archives.
We would also need support for resuming this kind of upload.
Oh yes, good one. Forgot. Bacula already encrypts and compresses my archives, so it's not an issue for me.
Support for resumption from stdin is already there (see my latest pull request); it's the exact same code that handles file resumption. It's the same task, after all: read the data block by block regardless of where it comes from, take the tree hash, and compare it to the hash provided by Glacier.
It seems, though (I need to investigate more; not sure I'm correct here), that it breaks in the following situation:
I do not sort blocks; I take a page of 50 blocks and check those, then take the next page of 50 blocks and check those, and so on. For files you can just provide the byte range; for stdin the blocks must be consecutive.
So for stdin you would first have to read all pages of blocks, sort them by byte range, and then start checking. That may be a rather lengthy process if you have to fetch some 20 pages of hashes, and a waste of time if it then fails, so I didn't do it. For files it's irrelevant. It's an issue that should be investigated and fixed for stdin jobs.
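The sort-then-verify step for stdin could look something like this; the dict key mirrors Glacier's List Parts response, but treat the exact shape as an assumption:

```python
def parts_in_stream_order(pages):
    """Flatten paginated part listings and sort them by starting byte
    offset, so a stdin stream can be re-hashed front to back.
    Assumed part shape: {"RangeInBytes": "start-end", ...}."""
    parts = [part for page in pages for part in page]
    parts.sort(key=lambda p: int(p["RangeInBytes"].split("-")[0]))
    return parts
```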
I also really gotta add support for Bacula's multi-file list...
/path/to/backup/vol001|vol002|vol003|...
As you can imagine my automated upload of backups is broken now :-)
I have a problem fetching the parts list of an interrupted multipart upload. Whatever I try, I can only manage to get the first 50 chunks. I'm stuck at that point.
Neither the `marker` parameter (supposed to give the next page) nor the `limit` parameter seems to do anything. I tried limiting to less than 50: got 50. Tried limiting to 100: got 50. This is a multipart upload that timed out some 880 parts in (at 10%). I even hacked together a branch that uses boto's calls directly, and got the exact same results.
Any ideas?
Sorry no code as my branch is too messed up at the moment :-)
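For what it's worth, the intended use of the marker is a loop like the one below. `fake_list_parts` is a hypothetical stand-in for the real List Parts call, just to show the contract the `marker`/`limit` parameters are supposed to honor:

```python
def iter_all_parts(list_parts, upload_id, limit=50):
    """Keep requesting pages until the response carries no Marker.
    `list_parts` is any callable returning {"Parts": [...], "Marker": ...}."""
    marker = None
    while True:
        resp = list_parts(upload_id, limit=limit, marker=marker)
        for part in resp["Parts"]:
            yield part
        marker = resp.get("Marker")
        if not marker:
            return

# Hypothetical backend with 130 parts served 50 at a time:
def fake_list_parts(upload_id, limit=50, marker=None):
    start = int(marker or 0)
    parts = list(range(start, min(start + limit, 130)))
    next_marker = str(start + limit) if start + limit < 130 else None
    return {"Parts": parts, "Marker": next_marker}

all_parts = list(iter_all_parts(fake_list_parts, "UPLOADID"))
print(len(all_parts))  # all 130 fake parts, not just the first page
```

If the service (or the request layer) drops the marker, this loop degenerates into refetching the first page forever, which matches the "always 50" symptom above.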