liormizr / s3path

s3path is a pathlib extension for AWS S3 Service
Apache License 2.0
207 stars 39 forks

Writing large file stuck in retry #39

Closed waiyip-aquabyte closed 3 years ago

waiyip-aquabyte commented 4 years ago
>>> from s3path import S3Path
>>> import json
>>> p = S3Path("/foo/bar")

This works

>>> with p.open('w') as fp:
...   json.dump([None]*10,fp)

This gets stuck for a long time

>>> with p.open('w') as fp:
...   json.dump([None]*5000,fp)

This also works well

>>> p.write_text(json.dumps([None]*5000))

When it gets stuck, I see a lot of log messages like this.

DEBUG:urllib3.util.retry:Converted retries value: False -> Retry(total=False, connect=None, read=None, redirect=0, status=None)
DEBUG:botocore.awsrequest:Waiting for 100 Continue response.
DEBUG:botocore.awsrequest:100 Continue response seen, now sending request body.

This looks like an infinite loop, but after a long time it completes successfully.
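For anyone trying to reproduce this: the DEBUG lines above come from the standard urllib3 and botocore loggers. A minimal sketch for surfacing them, assuming nothing beyond the standard library logging module:

import logging

# Send all DEBUG-level log records to stderr; this surfaces the
# botocore "100 Continue" lines and the urllib3 retry lines shown above.
logging.basicConfig(level=logging.DEBUG)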

liormizr commented 4 years ago

Hi @waiyip-aquabyte

I can't reproduce the issue in my setup with my AWS account.

It sounds like an AWS network issue or a boto3 connection issue.

Are you still able to reproduce it? Could you check in CloudWatch whether the S3 network load is high?

As far as S3Path is concerned, it's almost the same to run:

>>> with p.open('w') as fp:
...   json.dump([None]*5000,fp)

AND

>>> p.write_text(json.dumps([None]*5000))

I don't see why they would behave differently.
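For context, here is a minimal sketch of the pathlib convention that write_text follows, which is why the two snippets should end up doing the same work (an illustration of the convention, not the exact s3path code):

def write_text(self, data, encoding=None, errors=None):
    # pathlib convention: write_text just opens the path in text-write
    # mode and writes the whole string in a single call.
    with self.open(mode='w', encoding=encoding, errors=errors) as f:
        return f.write(data)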

liormizr commented 3 years ago

@waiyip-aquabyte are you still seeing the issue?

liormizr commented 3 years ago

@waiyip-aquabyte are you still seeing the issue? Or can we close this issue?

waiyip-aquabyte commented 3 years ago

Sorry, I need to find a chance to upgrade and verify. The problem was reproducible.

liormizr commented 3 years ago

We already started discussing this in a different issue, #55.

@impredicative, you talked about your use case of using S3Path together with smart_open.

Maybe we can use smart_open instead of optimising our current code. What do you think of the smart_open project?

markopy commented 3 years ago

I think this is the same bug I ran into. I noticed it continuously uploading data and never finishing.

Using smart_open might be a good idea since it is much more widely used and tested. It does do magic things, though, like decompressing certain file types based on their extension, which would need to be disabled (see the sketch below).
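If smart_open is adopted, that transparent (de)compression can be switched off per call. A minimal sketch, assuming a smart_open version that accepts the compression argument (older releases used ignore_ext=True instead); the bucket and key here are placeholders:

import smart_open

# Write to S3 without extension-based compression handling;
# without compression='disable', a '.gz' key would be gzip-compressed.
with smart_open.open('s3://some-bucket/some/key.gz', 'w', compression='disable') as f:
    f.write('stored exactly as written')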

impredicative commented 3 years ago

For writing small files, s3path seems fine as is. For a single-step write, boto3 works. For a streaming write, smart_open is best.

In all cases I rely on s3path to manipulate paths and generate URIs.
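As an illustration of that split (the bucket name and key below are made up, and it assumes s3path's bucket/key/as_uri accessors plus the usual boto3 and smart_open entry points):

import boto3
import smart_open
from s3path import S3Path

# s3path only builds the path and produces the bucket, key and URI.
p = S3Path('/my-bucket/reports') / '2021' / 'data.json'

# Single-step write with boto3.
boto3.client('s3').put_object(Bucket=p.bucket.name, Key=p.key, Body=b'{}')

# Streaming write with smart_open, using the URI generated by s3path.
with smart_open.open(p.as_uri(), 'w') as f:
    f.write('{"status": "ok"}')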

liormizr commented 3 years ago

Hi @waiyip-aquabyte. Sorry for the delay.

From now on we are using smart_open as the file object implementation for s3path :-)

This was added in version 0.3.0. It's a big release; you can see the change log in Release 0.3.0.

(Current version 0.3.01)