vavavr00m / boto

Automatically exported from code.google.com/p/boto

boto.s3.Key.close reads entire body #472

Open · GoogleCodeExporter opened 9 years ago
What steps will reproduce the problem?
1. open a key for reading a very large object
2. close it
3. watch Key.close block for a very long time, and eventually OOM the server as 
the entire object is buffered in memory

We're working on an HTTP server that uses S3 on the back end, streaming large 
files to clients.  If a client disconnects prematurely, the server naturally 
also needs to close the S3 key.  Right now, boto tries to complete the read 
anyway, tying up server resources for a very long time, and sometimes OOMing 
the server if the file is particularly large.

This is due to the current implementation of Key.close:

    def close(self):
        if self.resp:
            self.resp.read()
        self.resp = None
        self.mode = None
        self.closed = True

Changing this to something like:

    def close(self):
        if self.resp:
            while True:
                data = self.resp.read(self.BufferSize)
                if not data:
                    break
        self.resp = None
        self.mode = None
        self.closed = True

...will solve the OOM problem, but not the problem of consuming bandwidth and 
unnecessarily retaining per-request resources while Key.close blocks during 
request cleanup.

Ideally, I guess it should be something like:

    def close(self):
        if self.resp:
            self.resp.close()
        self.resp = None
        self.mode = None
        self.closed = True

...but this breaks boto, since boto expects to be able to reuse the connection 
object afterwards.  I'm not sure what the best solution is, but this is a 
fairly serious problem for us.
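Why closing the response breaks connection reuse can be modeled with stand-in classes (all class names and the "Idle"/"Busy"/"Dirty" states below are illustrative, not httplib's real internals): a persistent HTTP/1.1 connection is only safe to reuse once the previous response body has been fully consumed, so abandoning unread bytes poisons the socket.

```python
class FakeHTTPResponse:
    """Stand-in for an httplib response with a known body length (hypothetical)."""
    def __init__(self, conn, length):
        self._conn = conn
        self._remaining = length

    def read(self, amt=None):
        n = self._remaining if amt is None else min(amt, self._remaining)
        self._remaining -= n
        if self._remaining == 0:
            self._conn.state = "Idle"   # body fully consumed: safe to reuse
        return b"x" * n

    def close(self):
        # Abandoning unread bytes leaves them on the (modeled) socket,
        # where the next request would read them as a garbled reply.
        if self._remaining:
            self._conn.state = "Dirty"

class FakeHTTPConnection:
    """Stand-in for a persistent HTTP connection (hypothetical)."""
    def __init__(self):
        self.state = "Idle"

    def request(self):
        if self.state != "Idle":
            raise RuntimeError("connection not reusable (httplib would "
                               "surface this as an error on the next read)")
        self.state = "Busy"
        return FakeHTTPResponse(self, 10)

conn = FakeHTTPConnection()
resp = conn.request()
resp.close()            # close without draining, as in the snippet above
try:
    conn.request()      # boto's attempt to reuse the connection
except RuntimeError as exc:
    print("reuse failed:", exc)
```

Draining the body fully (in a loop) before closing returns the modeled connection to a reusable state, which is exactly the trade-off the chunked version of close makes.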

Original issue reported on code.google.com by timothy....@gmail.com on 2 Nov 2010 at 7:01

GoogleCodeExporter commented 9 years ago
I'm not sure, either.  I wanted to acknowledge the issue, though, and let you 
know that I'm trying to come up with a solution.  Any ideas or suggestions are 
appreciated.

Original comment by Mitch.Ga...@gmail.com on 3 Nov 2010 at 4:40

GoogleCodeExporter commented 9 years ago
What happens if we simply do this:

    def close(self):
        self.resp = None
        self.mode = None
        self.closed = True
        self.bucket.connection.connection.close()

This will force the HTTP connection to close, but a new connection will be 
created automatically when the BadStatusLine error is raised, and things will 
proceed correctly from then on.  I think the underlying HTTP connection has to 
be closed if we don't want to read all of the data from the response.
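A minimal sketch of that close-and-reconnect idea, using stand-in classes (`FakeHTTPConnection`, `FakeS3Connection`, and the lazy-reconnect check are illustrative assumptions, not boto's actual retry or pooling logic):

```python
class FakeHTTPConnection:
    """Stand-in for httplib.HTTPConnection (hypothetical)."""
    def __init__(self):
        self.open = True

    def close(self):
        self.open = False

class FakeS3Connection:
    """Stand-in for boto's connection wrapper: recreates the underlying
    HTTP connection lazily once it has been closed (illustrative only)."""
    def __init__(self):
        self.connection = FakeHTTPConnection()

    def get_http_connection(self):
        # Real boto would hit BadStatusLine on the dead socket and retry
        # with a fresh connection; here we model that as "reconnect if closed".
        if not self.connection.open:
            self.connection = FakeHTTPConnection()
        return self.connection

conn = FakeS3Connection()
old = conn.connection
old.close()                       # what the proposed Key.close would do
new = conn.get_http_connection()  # the next request gets a fresh connection
print(new is old, new.open)       # prints: False True
```

The open question raised below is whether that recovery path works when retries are disabled, since in real boto the fresh connection comes out of the retry machinery rather than an explicit check like this.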

Original comment by Mitch.Ga...@gmail.com on 3 Nov 2010 at 5:19

GoogleCodeExporter commented 9 years ago
Hm, would that require retries to be enabled?  (We deliberately don't use 
retries; failures are handled further up the stack.)

Original comment by timothy....@gmail.com on 3 Nov 2010 at 6:57