greghendershott / aws

Racket support for Amazon Web Services.
BSD 2-Clause "Simplified" License

multipart-put/file raises contract violation with a big file #46

Closed by krrrcks 9 years ago

krrrcks commented 9 years ago

I tried to upload a big file (about 88 GB) to the S3 data center in eu-central-1 (Frankfurt). After a while I got the following contract violation:

upload-part: contract violation
  expected: (and/c exact-integer? (between/c 1 10000))
  given: 10001
  which isn't: (between/c 1 10000)
  in: the 3rd argument of
      (->
       string?
       string?
       (and/c exact-integer? (between/c 1 10000))
       bytes?
       (cons/c
        (and/c exact-integer? (between/c 1 10000))
        string?))
  contract from: (function upload-part)
  blaming: <pkgs>/aws/aws/s3.rkt
   (assuming the contract is correct)
  at: <pkgs>/aws/aws/s3.rkt:647.26
  context...:
   /home/ubuntu/racket/collects/racket/contract/private/blame.rkt:143:0: raise-blame-error16
   /home/ubuntu/.racket/6.2.1/pkgs/aws/aws/pool.rkt:33:4: loop

The violation repeated with 10002, 10003, and 10004, and then I canceled the upload.

greghendershott commented 9 years ago

The reason for the (between/c 1 10000) contract is that Amazon's docs say that part numbers must be from 1 to 10,000 inclusive:

http://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html

With an upload this big, I guess the size of each part must be bigger, so as to keep the number of parts <= 10,000?

You can use the optional keyword argument #:part-size to specify something bigger than the default (which is currently set to the minimum size, 5 MB).
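To make the arithmetic concrete, here is a small, language-neutral sketch in Python (not code from the aws package). It assumes "MB" and "GB" mean 2**20 and 2**30 bytes, and that the file is exactly 88 GB; `parts_needed` is a hypothetical helper name.

```python
MAX_PARTS = 10_000        # S3 accepts part numbers 1 through 10,000
MIB = 1024 * 1024         # assumption: "MB" in the thread means 2**20 bytes
GIB = 1024 * MIB          # assumption: "GB" means 2**30 bytes


def parts_needed(total_bytes: int, part_size: int) -> int:
    """Number of parts when every part except the last is part_size bytes."""
    return (total_bytes + part_size - 1) // part_size  # ceiling division


file_size = 88 * GIB      # assumption: the thread only says "about 88 GB"
for part_size in (5 * MIB, 10 * MIB):
    n = parts_needed(file_size, part_size)
    verdict = "ok" if n <= MAX_PARTS else "too many"
    print(f"{part_size // MIB} MiB parts -> {n} parts ({verdict})")
```

Under those assumptions the 5 MB default needs roughly 18,000 parts, which is why part number 10001 was reached, while a 10 MB part size stays under the limit.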


Although this is all fine for multipart-put, it seems that multipart-put/file could be more helpful. After all, it knows the total size up-front (the size of the file). So instead of always defaulting #:part-size to 5 MB, it could pick the smallest part size that is at least the 5 MB minimum and still keeps the upload within 10,000 parts.
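One way such a default could be computed is sketched below in Python. This is an illustration of the idea only, not necessarily what the library does; `smallest_part_size` and its parameter names are hypothetical, and "5 MB" is assumed to mean 5 * 2**20 bytes.

```python
MIB = 1024 * 1024                 # assumption: "MB" means 2**20 bytes
MIN_PART_SIZE = 5 * MIB           # the minimum part size mentioned in the thread
MAX_PARTS = 10_000                # part numbers must be 1 through 10,000


def smallest_part_size(total_bytes: int,
                       min_part_size: int = MIN_PART_SIZE,
                       max_parts: int = MAX_PARTS) -> int:
    """Smallest part size >= min_part_size that needs at most max_parts parts."""
    # Ceiling division: the size each part needs if all max_parts are used.
    needed = (total_bytes + max_parts - 1) // max_parts
    return max(min_part_size, needed)


if __name__ == "__main__":
    size = 88 * 1024 * MIB        # assumption: a file of exactly 88 * 2**30 bytes
    part = smallest_part_size(size)
    parts = (size + part - 1) // part
    print(f"part size {part} bytes, {parts} parts")
```

Small files keep the 5 MB minimum, and only files too large for 10,000 minimum-size parts get a larger part size.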

krrrcks commented 9 years ago

Ah, I see. Well, after re-reading your documentation on multipart-put, I could have worked out that I needed to choose a suitable value for #:part-size. Thanks for the hint and explanation.

greghendershott commented 9 years ago

I have an update that calculates a suitable part size automatically -- such that multipart-put/file would have worked for you. I'm mulling it over a bit more before I commit and push.

krrrcks commented 9 years ago

Wow, thanks! That sounds great.

greghendershott commented 9 years ago

The other thing I'm considering is how to make a failure more user-friendly, in terms of either resuming or cleaning up. Although it would no longer be interrupted due to the specific too-many-parts problem, any large upload could be interrupted for a variety of reasons, e.g. connectivity. Aside from avoiding a plain upload's size limit, another advantage of a multipart put is that you can deal with interruptions.

The library does already give you the building blocks to do this: initiate-multipart-upload, upload-part, and either complete-multipart-upload or abort-multipart-upload. But the convenience functions multipart-put and multipart-put/file don't attempt to help with this. I'd like to think about whether they should try to, and how.
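As a rough illustration of how those building blocks fit together with cleanup on failure, here is a Python sketch of the control flow only. The four callables are hypothetical stand-ins for initiate-multipart-upload, upload-part, complete-multipart-upload, and abort-multipart-upload; their argument lists here are assumptions and do not mirror the Racket signatures.

```python
from typing import Callable, Iterable, List, Tuple

Receipt = Tuple[int, str]  # (part number, ETag), like upload-part's result pair


def put_parts_with_cleanup(
    chunks: Iterable[bytes],
    initiate: Callable[[], str],                       # returns an upload id
    upload_part: Callable[[str, int, bytes], Receipt],
    complete: Callable[[str, List[Receipt]], object],
    abort: Callable[[str], object],
) -> object:
    """Upload chunks as numbered parts; abort the upload if any part fails."""
    upload_id = initiate()
    receipts: List[Receipt] = []
    try:
        for number, data in enumerate(chunks, start=1):  # part numbers start at 1
            receipts.append(upload_part(upload_id, number, data))
    except BaseException:
        # Without this, a failed or interrupted upload stays open on the server.
        abort(upload_id)
        raise
    return complete(upload_id, receipts)
```

Resuming instead of aborting would mean keeping the upload id and the receipts collected so far and retrying only the missing parts, which this sketch does not attempt.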

krrrcks commented 9 years ago

Yes, that would be nice. Another thing multipart-put/file could do is clean up the already-uploaded parts after a failure, perhaps by aborting the multipart upload. (I just figured out that the contract violation that opened this thread left an abandoned multipart upload behind.)