pfnet / pfio

IO library to access various filesystems with unified API
https://pfio.readthedocs.io/
MIT License

S3 support for large files #328

Closed alexisVallet closed 11 months ago

alexisVallet commented 11 months ago

Current code on master fails to handle large files on AWS S3 in 2 ways:

  1. During multi-part upload, the MD5 passed to upload_part is hex-encoded, but S3 expects the base64-encoded digest.
  2. Calling S3.rename on a source object larger than 5GB fails because copy_object cannot copy objects above that size.
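To illustrate issue 1, here is a minimal sketch of the two encodings using only the standard library. The helper names `hex_md5` and `content_md5` are made up for this example and are not part of pfio; the point is only that both strings describe the same 16-byte digest, encoded differently.

```python
import base64
import hashlib


def hex_md5(data: bytes) -> str:
    """Hex-encoded MD5 digest (32 characters)."""
    return hashlib.md5(data).hexdigest()


def content_md5(data: bytes) -> str:
    """Base64-encoded MD5 digest, the form S3 expects for Content-MD5."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


if __name__ == "__main__":
    part = b"example part body"
    print(hex_md5(part))      # hex form
    print(content_md5(part))  # base64 form of the same digest
```

With boto3, the base64 string would be the value passed as the `ContentMD5` argument of `upload_part`.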

Issue 1 is a straightforward fix. Solving issue 2, however, requires multi-part copying, which (as far as I understand) makes rename no longer atomic for large files. Since rename is often used precisely for its atomicity, I'm not sure this is the best solution, and I would be happy to get feedback.
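For discussion, a rough sketch of what the multi-part copy might look like. It assumes a boto3-style S3 client is passed in; `copy_ranges`, `multipart_copy`, and the 1 GiB default part size are hypothetical choices for this example, not pfio's actual implementation. Deleting the source object afterwards would be a separate request and is deliberately left out.

```python
def copy_ranges(size: int, part_size: int) -> list:
    """Split [0, size) into inclusive byte ranges of at most part_size bytes."""
    return [
        f"bytes={start}-{min(start + part_size, size) - 1}"
        for start in range(0, size, part_size)
    ]


def multipart_copy(client, bucket, src_key, dst_key, size, part_size=1024 ** 3):
    """Copy src_key to dst_key in ranged parts (client is assumed boto3-like)."""
    upload_id = client.create_multipart_upload(
        Bucket=bucket, Key=dst_key)["UploadId"]
    try:
        parts = []
        ranges = copy_ranges(size, part_size)
        for number, byte_range in enumerate(ranges, start=1):
            res = client.upload_part_copy(
                Bucket=bucket, Key=dst_key,
                UploadId=upload_id, PartNumber=number,
                CopySource={"Bucket": bucket, "Key": src_key},
                CopySourceRange=byte_range)
            parts.append({"ETag": res["CopyPartResult"]["ETag"],
                          "PartNumber": number})
        # The destination object only appears once this call succeeds.
        client.complete_multipart_upload(
            Bucket=bucket, Key=dst_key, UploadId=upload_id,
            MultipartUpload={"Parts": parts})
    except Exception:
        # Discard the already-copied parts so they are not left behind.
        client.abort_multipart_upload(
            Bucket=bucket, Key=dst_key, UploadId=upload_id)
        raise
```

In this sketch the destination becomes visible only when `complete_multipart_upload` returns, but the copy as a whole spans many requests, and removing the source is yet another one.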