liormizr / s3path

s3path is a pathlib extension for AWS S3 Service
Apache License 2.0

Streaming from file doesn't work as expected #58

Closed: dconathan closed this issue 3 years ago

dconathan commented 3 years ago

I've been running into issues when working with the file objects returned by .open(). I can only reproduce it with certain files/objects, such as a pickled NumPy array:

from s3path import S3Path
import pickle
import numpy as np

x = np.random.rand(3, 3)

path = S3Path("...")

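# Write the pickled array to S3 through the file object...
with path.open("wb") as f:
    pickle.dump(x, f)

# ...then stream it back without reading the whole object into memory first.
with path.open("rb") as f:
    y = pickle.load(f)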

assert (x == y).all()

This fails with EOFError: Ran out of input when trying to load the array.
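One plausible mechanism, offered only as an assumption since the thread never says what was wrong inside s3path: pickle.load calls read(n) on the file object and treats a reply shorter than n bytes as truncated data. Depending on where the short read lands, that surfaces as EOFError: Ran out of input (nothing left where the next opcode should be) or as pickle.UnpicklingError. The sketch below uses a hypothetical ShortReadStream class (not s3path's real file object) whose read(n) returns at most 4 bytes, which raw network streams are allowed to do.

```python
import io
import pickle


class ShortReadStream:
    """Hypothetical file-like object: read(n) may return fewer than n bytes.
    This is an illustration, not s3path's actual implementation."""

    def __init__(self, data: bytes, chunk: int = 4) -> None:
        self._src = io.BytesIO(data)
        self._chunk = chunk

    def read(self, n: int = -1) -> bytes:
        if n is None or n < 0:
            return self._src.read()
        # Hand back at most `chunk` bytes, even when more were requested.
        return self._src.read(min(n, self._chunk))

    def readline(self) -> bytes:
        # pickle.load requires readline() to exist on the file object.
        return self._src.readline()


def load_or_error(f):
    """Return ("ok", obj) on success or ("error", exception) on failure."""
    try:
        return "ok", pickle.load(f)
    except (EOFError, pickle.UnpicklingError) as exc:
        return "error", exc


payload = pickle.dumps(list(range(1000)), protocol=4)

status, _ = load_or_error(io.BytesIO(payload))  # full reads: loads fine
print(status)
status, exc = load_or_error(ShortReadStream(payload))  # short reads: raises
print(status, type(exc).__name__)
```

This also explains why pickle.loads(f.read()) can still succeed: a single read() with no size argument asks for everything until EOF, so the unpickler never sees a short reply.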

The script works if you replace y = pickle.load(f) with y = pickle.loads(f.read()), but that isn't always practical: f.read() pulls the whole object into memory, which won't work for a large file you want to stream.
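If the underlying problem really is short reads (an assumption, see above), a standard-library way to keep streaming is to adapt the object to io.RawIOBase and wrap it in io.BufferedReader, whose read(n) keeps calling the raw stream until it has n bytes or hits EOF. The names _RawAdapter, buffered and ChunkyStream below are hypothetical helpers for this sketch; ChunkyStream just stands in for a remote stream that returns small pieces.

```python
import io
import pickle


class _RawAdapter(io.RawIOBase):
    """Hypothetical adapter: exposes any object with read(n) as a raw stream."""

    def __init__(self, f) -> None:
        super().__init__()
        self._f = f

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        # One read() on the wrapped object; it may be shorter than len(b).
        data = self._f.read(len(b))
        n = len(data)
        b[:n] = data
        return n


def buffered(f, buffer_size: int = io.DEFAULT_BUFFER_SIZE) -> io.BufferedReader:
    """Wrap f so read(n) loops over f.read() until n bytes or EOF."""
    return io.BufferedReader(_RawAdapter(f), buffer_size)


class ChunkyStream:
    """Hypothetical stand-in for a remote stream that returns short reads."""

    def __init__(self, data: bytes, chunk: int = 4) -> None:
        self._src = io.BytesIO(data)
        self._chunk = chunk

    def read(self, n: int = -1) -> bytes:
        if n is None or n < 0:
            return self._src.read()
        return self._src.read(min(n, self._chunk))


payload = pickle.dumps(list(range(1000)), protocol=4)
with buffered(ChunkyStream(payload), buffer_size=64) as f:
    restored = pickle.load(f)  # pulls data in small pieces, no single big read()
print(restored == list(range(1000)))  # True
```

With an affected s3path version the hypothetical usage would be `with path.open("rb") as raw: y = pickle.load(buffered(raw))`. This only helps if read(n) returns short but correct data; if the file object returned wrong bytes or an early empty read before real EOF, no wrapper can fix that, and upgrading is the real fix.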

liormizr commented 3 years ago

Hi @dconathan, this looks like an interesting issue. We are currently working on a refactor of the file object. I'll add a unit test with your case and update here when we have a new version.

liormizr commented 3 years ago

Hi @dconathan, sorry for the delay.

Fixed in version 0.3.0. It's a big release; you can see the changelog in Release 0.3.0.

(Current version 0.3.01)