uskudnik / amazon-glacier-cmd-interface

Command line interface for Amazon Glacier
MIT License

Failure to upload largish files #47

Closed wvmarle closed 11 years ago

wvmarle commented 11 years ago

Again I'm having problems uploading a large file. I really don't understand it: yesterday it worked, today it fails. Same code; I have not yet pulled down the blocksize patch, and that part of the code is untouched. Very strange.

At the same time, the boto-provided glacier.py script seems to work; at least it started sending data, though I didn't wait for the full 9 GB to finish.

What I figured out:

A smaller file, less than one block size, uploads fine.

wvmarle commented 11 years ago

I've narrowed down the problem (using good old print "here I am" statements), but Python not giving any error message doesn't help.

The line where the script stops is glaciercorecalls.py:212:

chunks = [str[i*chunk:(i+1)*chunk] for i in range(chunk_count)]

There are 128 chunks (a 128 MB part) to be hashed; for some reason the script simply stops at this line. No error message or anything: it just stops and returns to the command prompt. And yesterday it seemed to work, but not today, so it appears to have something to do with my system, but how?

offlinehacker commented 11 years ago

Hmm, that's strange. So it exits with no errors, nothing at all, it just crashes?

If you want to debug, you can place import pdb; pdb.set_trace() on the line where you want to start debugging. Python will start pdb (the Python debugger) when it reaches that line, and you can then step line by line with the command next, view the code where you currently are with list, and print the values of variables with print varname. To continue execution until the next such line, use the command cont.
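
A minimal, self-contained sketch of what that looks like (a toy function, not the project's code):

import pdb

def buggy(values):
    total = 0
    pdb.set_trace()        # execution pauses here and pdb takes over
    for v in values:       # 'next' steps over one line at a time
        total += v         # 'print total' shows the current value
    return total           # 'cont' resumes normal execution

buggy([1, 2, 3])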

wvmarle commented 11 years ago

It just returns to the command prompt; seemingly a clean exit. No messages.

Update: I get an exit status code of 137. That's interesting. Now to figure out what that means!
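
For reference, exit statuses above 128 usually mean the process was killed by signal (status - 128); a quick check of what 137 corresponds to (assuming a Linux system):

import signal

# 137 - 128 == 9, i.e. SIGKILL -- the signal the Linux OOM killer sends
# when the system runs out of memory.
print(137 - 128 == signal.SIGKILL)   # True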

Later tonight will try the debugger too.

wvmarle commented 11 years ago

I found the cause: the process is running out of memory. My system normally has about 700 MB of available memory; why that's not enough to hold 128 MB of data is another matter, of course, but anyway that's what causes the crash.

Doing some investigation, I found that it stops when the counter i reaches a value of 65-80; it's not always the same. In top I can also see my free memory decreasing very fast.

First of all I changed the hash function to a single line, avoiding having to store all the chunks in the intermediate list chunks (I also renamed the variable str to data). Old code:

chunks = [data[i*chunk:(i+1)*chunk] for i in range(chunk_count)]
return [hashlib.sha256(x).digest() for x in chunks]

My code:

return [hashlib.sha256(data[i*chunk:(i+1)*chunk]).digest() for i in range(chunk_count)]
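
For reference, the same one-liner in a self-contained form (the function name and signature here are assumptions for illustration; the 1 MB chunk size follows from the 128 chunks per 128 MB part mentioned above):

import hashlib

def chunk_hashes(data, chunk=1024 * 1024):
    # Hash each 1 MB slice as it is taken, instead of first building a
    # second full copy of the part in an intermediate list.
    chunk_count = (len(data) + chunk - 1) // chunk
    return [hashlib.sha256(data[i * chunk:(i + 1) * chunk]).digest()
            for i in range(chunk_count)]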

I don't know which version is faster; I can't test the original with a 128 MB block. My version at least saves a lot of memory, and it should produce the exact same result. Now it gets past the hashing function, but it crashes in boto with a memory error; here is the traceback:

Traceback (most recent call last):
  File "/usr/local/bin/glacier-cmd", line 9, in <module>
    load_entry_point('glacier==0.2dev', 'console_scripts', 'glacier-cmd')()
  File "/usr/local/lib/python2.6/dist-packages/glacier-0.2dev-py2.6.egg/glacier/glacier.py", line 726, in main
    args.func(args)
  File "/usr/local/lib/python2.6/dist-packages/glacier-0.2dev-py2.6.egg/glacier/glacier.py", line 264, in putarchive
    writer.write(part)
  File "/usr/local/lib/python2.6/dist-packages/glacier-0.2dev-py2.6.egg/glacier/glaciercorecalls.py", line 330, in write
    self.send_part()
  File "/usr/local/lib/python2.6/dist-packages/glacier-0.2dev-py2.6.egg/glacier/glaciercorecalls.py", line 314, in send_part
    part)
  File "/usr/local/lib/python2.6/dist-packages/glacier-0.2dev-py2.6.egg/glacier/glaciercorecalls.py", line 61, in make_request
    sender, override_num_retries)
  File "/usr/local/lib/python2.6/dist-packages/boto/connection.py", line 910, in make_request
    return self._mexe(http_request, sender, override_num_retries)
  File "/usr/local/lib/python2.6/dist-packages/boto/connection.py", line 789, in _mexe
    boto.log.debug('Data: %s' % request.body)
MemoryError

Decreasing the DEFAULT_BLOCK_SIZE to 8 MB solves this problem and it uploads again.
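
Note from the traceback that the final crash happens while boto formats the whole request body into a debug log string, so a part apparently lives in memory several times over (the file slice, the request body, the log string); shrinking the part size shrinks each of those copies. A sketch of the workaround, assuming DEFAULT_BLOCK_SIZE is the module-level constant used as the part size (Glacier part sizes must be a power-of-two number of megabytes between 1 MB and 4 GB, so 8 MB is valid):

DEFAULT_BLOCK_SIZE = 8 * 1024 * 1024   # 8 MB parts instead of 128 MB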

So there is a memory error somewhere, somehow, and I suspect Python itself (version 2.6.6 on the system the script runs on) is to blame here. I have 700 MB of free memory, which should be enough to store a single block five times over, yet it doesn't work like that. Strange.

offlinehacker commented 11 years ago

We have to greatly optimize this ASAP!

wvmarle commented 11 years ago

Resolved with latest merge. Closing.