Open giorgiozoppi opened 4 years ago
Thank you for your report. But looking at your issue, I think it would be worth discussing it here before attempting a PR, because some things do not look right in your current setup. It may be the server that needs to be fixed instead.
Also, an important detail from your original message is missing now: it looks like you write multipart files directly to their final destination before the "complete" step.
That is very bad on the server side, for two reasons:
1) Through the S3 API, objects on the server side are supposed to be immutable/consistent. An object should never appear "partially": it appears only once it is complete, or it is a new "complete" object that replaced the previous one.
2) In a multipart upload, parts can be uploaded in any order and in parallel; only the "complete" request determines the final order of the parts.
So, in your implementation in rstor, you should ensure that a file is neither shown nor replaced until its multipart upload is completed. If the upload is interrupted or fails, the previous file should not have been modified. The same holds for a regular single-request upload: a concurrent list/get request should not see the file that is currently being uploaded.
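To make the two invariants concrete, here is a toy in-memory model (illustrative only, not rstor's or s3cmd's actual code) showing that parts may arrive in any order and that the object only becomes visible, fully assembled, at the "complete" step:

```python
class ToyMultipartStore:
    """Toy model of the S3 multipart-upload contract."""

    def __init__(self):
        self.objects = {}   # key -> bytes; only *completed* objects live here
        self.uploads = {}   # upload_id -> {part_number: bytes}
        self._next_id = 0

    def create_multipart_upload(self, key):
        self._next_id += 1
        upload_id = f"{key}-{self._next_id}"
        self.uploads[upload_id] = {}
        return upload_id

    def upload_part(self, upload_id, part_number, data):
        # Parts can be uploaded in any order, even in parallel.
        self.uploads[upload_id][part_number] = data

    def complete_multipart_upload(self, key, upload_id):
        parts = self.uploads.pop(upload_id)
        # Only now is the final order determined, by part number,
        # and only now does the object become visible to readers.
        self.objects[key] = b"".join(parts[n] for n in sorted(parts))


store = ToyMultipartStore()
uid = store.create_multipart_upload("FILENAME.bin")
store.upload_part(uid, 2, b"world")   # second part arrives first
store.upload_part(uid, 1, b"hello ")
# "FILENAME.bin" is not in store.objects yet: invisible until complete
store.complete_multipart_upload("FILENAME.bin", uid)
# store.objects["FILENAME.bin"] is now b"hello world"
```

A server that writes parts straight to the final destination violates both properties at once: readers see a partial object, and the bytes land in arrival order rather than part-number order.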
The way most servers achieve that is to create the upload under a temporary name, like .tmp.multipart.FILENAME.bin, and only at the "complete" step do a rename:

mv .tmp.multipart.FILENAME.bin FILENAME.bin

The server then simply never exposes ".tmp.multipart." files in listings.
Also, even though you did not provide the complete debug log, I think the crash was due to the server not sending the "content-length" header for this file. I guess the intent is for the GET to serve the different parts in a "streaming" way, but that is not possible, because the protocol was not designed for it: the client has no way to know whether it received the whole file or is missing parts, and even blocking while waiting for the next part to become available could fail, since a timeout will be triggered if no data is sent for too long.
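The reported "KeyError: 'content-length'" is consistent with that: indexing a headers dict directly raises KeyError when the header is absent. A tiny illustration (a sketch with lower-cased header names, not s3cmd's actual code):

```python
# A response from a server that streams without announcing a size.
headers_without_length = {"content-type": "application/octet-stream"}

def object_size_strict(headers):
    # Crashes with KeyError when the server omits the header.
    return int(headers["content-length"])

def object_size_defensive(headers):
    # Tolerates a missing header and lets the caller decide what to do.
    value = headers.get("content-length")
    return int(value) if value is not None else None
```

Even with the defensive variant, the client still cannot verify that the download is complete without a declared length, which is why the server-side fix matters more than the client-side one.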
If you think that I misunderstood, or that I'm wrong, don't hesitate to tell me; I'm available here to continue the discussion on this topic.
During testing of our S3 solution, we experienced an error that is due to mishandling of HTTP chunked mode. To reproduce:
s3cmd fails miserably:
jozoppi@giorgio-XPS-15-7590:~$ s3cmd --access_key=RSTOR12Q6N8HZXHX34HSCCRAYG --secret_key=GzxQcfWqpKuX7cZWPHuXj0Pr/KqQZg21J3NNSC22K78 --host https://s3.pre-rstor.com get s3://meoxbucket/foireignaffairs download: 's3://meoxbucket/foireignaffairs' -> './foireignaffairs' [1 of 1]
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! An unexpected error has occurred. Please try reproducing the error using the latest s3cmd code from the git master branch found at: https://github.com/s3tools/s3cmd and have a look at the known issues list: https://github.com/s3tools/s3cmd/wiki/Common-known-issues-and-their-solutions If the error persists, please report the following lines (removing any private info as necessary) to: s3tools-bugs@lists.sourceforge.net
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Invoked as: /usr/bin/s3cmd --access_key=RSTOR12Q6N8HZXHX34HSCCRAYG --secret_key=GzxQcfWqpKuX7cZWPHuXj0Pr/KqQZg21J3NNSC22K78 --host https://s3.pre-rstor.com get s3://meoxbucket/foireignaffairs
Problem: <class 'KeyError'>: 'content-length'
S3cmd: 2.0.2
python: 3.7.5 (default, Apr 19 2020, 20:18:17) [GCC 9.2.1 20191008]
environment LANG=en_US.UTF-8
We will provide a PR fixing this during the weekend. PR ready: https://github.com/s3tools/s3cmd/pull/1100