xiongxu / s3fs

Automatically exported from code.google.com/p/s3fs
GNU General Public License v2.0

Implement multi-part uploads for large files #142

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
For files larger than 20MB, split the file into 10MB chunks and upload them
serially (for now).

Use the AWS multi-part upload protocol.

Hopefully this will alleviate some of the issues when trying to upload large 
files.
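
For context, the AWS multipart upload protocol is three REST operations: Initiate Multipart Upload (which returns an upload ID), one Upload Part PUT per chunk, and Complete Multipart Upload with the list of part numbers and ETags. The sketch below only illustrates the chunking/serial-upload loop described above; it is not s3fs code, and names such as upload_file, upload_part, MULTIPART_THRESHOLD, and PART_SIZE are hypothetical.

```cpp
// Illustrative chunking loop only -- the upload_part() stub stands in for the
// real "Upload Part" PUT that a libcurl-based client would perform.
#include <cstdio>
#include <vector>

static const size_t MULTIPART_THRESHOLD = 20 * 1024 * 1024; // switch-over point
static const size_t PART_SIZE           = 10 * 1024 * 1024; // per-part size

// Hypothetical placeholder: a real implementation would PUT the buffer to
// .../key?partNumber=N&uploadId=... and record the returned ETag.
static bool upload_part(const char* data, size_t len, int part_number) {
  (void)data;
  std::printf("uploading part %d (%zu bytes)\n", part_number, len);
  return true;
}

static int upload_file(const char* path) {
  FILE* fp = std::fopen(path, "rb");
  if (!fp) return -1;

  std::fseek(fp, 0, SEEK_END);
  long size = std::ftell(fp);   // long/ftell overflow past 2^31 bytes
  std::rewind(fp);

  if (size >= 0 && (size_t)size <= MULTIPART_THRESHOLD) {
    std::fclose(fp);
    return 0;                   // small files keep using the single PUT path
  }

  // Initiate Multipart Upload would happen here to obtain the upload ID.
  std::vector<char> buf(PART_SIZE);
  int part_number = 1;
  size_t n;
  while ((n = std::fread(buf.data(), 1, PART_SIZE, fp)) > 0) {
    if (!upload_part(buf.data(), n, part_number++)) {
      std::fclose(fp);
      return -1;                // a real implementation would abort the upload
    }
  }
  std::fclose(fp);
  // Complete Multipart Upload would be sent here with the collected ETags.
  return 0;
}

int main(int argc, char** argv) {
  return (argc > 1) ? upload_file(argv[1]) : 0;
}
```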

Original issue reported on code.google.com by dmoore4...@gmail.com on 24 Dec 2010 at 10:34

GoogleCodeExporter commented 9 years ago
Issue 97 has been merged into this issue.

Original comment by dmoore4...@gmail.com on 27 Dec 2010 at 8:58

GoogleCodeExporter commented 9 years ago
Issue 30 has been merged into this issue.

Original comment by dmoore4...@gmail.com on 27 Dec 2010 at 11:55

GoogleCodeExporter commented 9 years ago
Just committed r297 as a checkpoint.

Multipart upload is written and operational and has undergone a fair amount of 
testing. The last big test was an rsync of a 1GB file. Using the standard U.S. 
region bucket, this had issues at the end of the rsync: all parts got uploaded, 
but the final mtime/chmod that rsync does caused a hang.

Repeating the test on a US-west bucket went well:

> rsync -av --progress --stats --whole-file --inplace 1G.bin uswest.suncup.org/
sending incremental file list
1G.bin
  1073741824 100%   18.95MB/s    0:00:54 (xfer#1, to-check=0/1)

Number of files: 1
Number of files transferred: 1
Total file size: 1073741824 bytes
Total transferred file size: 1073741824 bytes
Literal data: 1073741824 bytes
Matched data: 0 bytes
File list size: 45
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 1073872989
Total bytes received: 31

sent 1073872989 bytes  received 31 bytes  183677.93 bytes/sec
total size is 1073741824  speedup is 1.00

However, copies and rsyncs of smaller files (<500MB) worked just fine.

More testing is needed and there are a few issues to take care of before 
calling this one good (e.g. code cleanup, some more error checking, a compiler 
warning, etc.).

I did do an MD5 comparison of a 400MB file that I uploaded and then downloaded 
elsewhere; the sums matched.

Changing the read_write_timeout option also helps for large files. It seems that 
when the multipart upload is complete, the Amazon server needs some time to 
assemble the file. Increasing the timeout resolved the curl timeout function 
issue.
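
(For reference, one common way libcurl-based code expresses this kind of read/write timeout is curl's low-speed abort; the snippet below is only a sketch of that pattern and may not match s3fs's actual curl setup.)

```cpp
#include <curl/curl.h>

// Abort the transfer if fewer than 1 byte/sec flows for timeout_seconds.
// While S3 assembles a completed multipart upload, the response can stall
// long enough to trip a short timeout, which is why raising it helps.
static void set_readwrite_timeout(CURL* curl, long timeout_seconds) {
  curl_easy_setopt(curl, CURLOPT_LOW_SPEED_LIMIT, 1L);
  curl_easy_setopt(curl, CURLOPT_LOW_SPEED_TIME, timeout_seconds);
}
```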

Max file size is now ~2GB, as getting over 2^31 causes some datatype issues; 
there are some alternate functions to try. Right now, if you try to upload a 
file bigger than this you'll get a "not supported" error.
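
(The 2^31 ceiling is the usual signed 32-bit offset/size limit; the snippet below is a hypothetical illustration, not s3fs code.)

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // A 3GB object does not fit in a signed 32-bit size/offset.
  int64_t size64 = 3LL * 1024 * 1024 * 1024;      // 3221225472 bytes
  int32_t size32 = static_cast<int32_t>(size64);  // wraps to a negative value
  std::printf("64-bit: %lld  32-bit: %d\n", (long long)size64, (int)size32);

  // Typical fixes (and likely the "alternate functions" mentioned above) are
  // the 64-bit variants: build with -D_FILE_OFFSET_BITS=64 so off_t is 64-bit,
  // use fseeko/ftello instead of fseek/ftell, and hand sizes to libcurl via
  // CURLOPT_INFILESIZE_LARGE (curl_off_t) rather than CURLOPT_INFILESIZE (long).
  return 0;
}
```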

If anyone is interested in testing this, please svn update, compile, install, 
and test. Your feedback will be much appreciated.

Original comment by dmoore4...@gmail.com on 28 Dec 2010 at 4:32

GoogleCodeExporter commented 9 years ago
r298 fixes this one

Original comment by dmoore4...@gmail.com on 30 Dec 2010 at 3:56