xiongxu / s3fs

Automatically exported from code.google.com/p/s3fs
GNU General Public License v2.0

Can't copy files larger than 2GB to S3 with r299 #144

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I am testing out the latest release (r299) and it is failing on copying files 
larger than 2GB.

E.g.

-bash-3.2# ll -h

-rw-r--r-- 1 root root 2.1G Jan 10 07:51 2.1GB

-rw-r--r-- 1 root root 2.0G Jan 10 07:40 2GB

-bash-3.2# cp 2.1GB /mnt/s3/

cp: closing `/mnt/s3/2.1GB': Operation not supported

It copies the 2GB file perfectly.

I was hoping to use the new multipart upload feature to upload files larger 
than 5GB (I already use version r191 to upload 4GB+ files), but at the moment 
it looks like the latest release can't handle files larger than 2GB?

Tested on CentOS 5.2 x86.

thanks

Original issue reported on code.google.com by mjbuc...@gmail.com on 11 Jan 2011 at 10:35

GoogleCodeExporter commented 9 years ago
This is a known and documented limitation of newer versions of s3fs that use 
multipart uploads.  The fix is probably easy, but some investigation is needed 
into large file support in the C code: the current data types impose a 2^31 
(2GB) limit on the size arithmetic involved.  Admittedly, since I didn't 
personally have the need for this, I didn't take the time to investigate it.
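
To make the limitation concrete, here is a minimal standalone sketch (not the 
s3fs source) of what happens when a size that fits comfortably in 64 bits is 
pushed through 32-bit signed arithmetic:

#include <cstdio>
#include <stdint.h>

int main() {
  int64_t size = 3LL * 1024 * 1024 * 1024;   // a 3GB file size
  int32_t as_32bit = (int32_t)size;          // exceeds 2^31 - 1, typically wraps to a negative number
  int64_t as_64bit = size;                   // a 64-bit type (off_t/int64_t) holds it fine

  printf("stored in 32 bits: %d\n", as_32bit);
  printf("stored in 64 bits: %lld\n", (long long)as_64bit);  // 3221225472
  return 0;
}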

In the meantime, you have a couple of options:

   - use an older version of s3fs (beware: other, since-fixed bugs may be lurking)
   - split your files into parts smaller than 2GB (not an attractive solution, but it should work)

Since this is open source, others certainly can look into this and submit a 
patch.  Like I said, this might be an easy fix.

Original comment by dmoore4...@gmail.com on 11 Jan 2011 at 7:01

GoogleCodeExporter commented 9 years ago
This is a shame. s3fs has gone from supporting the Amazon file size limit (5GB) 
to imposing its own file size limit.  Unfortunately I am not a C++ programmer, 
or I would take a stab at this.

Splitting files is not an option for me.  Do you mean that there are older 
versions of s3fs that support multipart but don't have the 2GB file size bug?  
Do you know which version this would be so I can test it out?

Original comment by mjbuc...@gmail.com on 12 Jan 2011 at 8:52

GoogleCodeExporter commented 9 years ago
mjbuchan, here's a patch that treats all files, regardless of size, the old way, 
without using multipart upload for any file whatsoever.  This comes without any 
support.

===================================================================
--- src/s3fs.cpp        (revision 300)
+++ src/s3fs.cpp        (working copy)
@@ -1981,6 +1981,9 @@
   // If file is > 20MB, then multipart will kick in
   /////////////////////////////////////////////////////////////

+  result = put_local_fd_small_file(path, meta, fd); 
+  return result;
+
   if(st.st_size > 2147483647) { // 2GB - 1
      // close f ?
      return -ENOTSUP;

Original comment by dmoore4...@gmail.com on 13 Jan 2011 at 3:50

GoogleCodeExporter commented 9 years ago
As I suspected, making this change is relatively easy: the max file size limit 
is bumped to 64GB.  I need to finish testing before release.
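
As a rough illustration of what such a bump amounts to (the names and the check 
below are assumptions for illustration, not the actual committed change), the 
idea is to compare the file size against a 64-bit constant such as 2^36 instead 
of the old 2^31 - 1 ceiling:

#include <cstdio>
#include <stdint.h>
#include <cerrno>

// Illustrative only: a 64-bit ceiling (2^36 = 64GB) replaces the old 2^31 - 1 check.
static const int64_t MAX_OBJECT_SIZE = 1LL << 36;

static int check_size(int64_t st_size) {
  if (st_size > MAX_OBJECT_SIZE)
    return -ENOTSUP;   // reject, as the old code did for anything over 2GB
  return 0;
}

int main() {
  printf("3GB  -> %d\n", check_size(3LL << 30));   // 0: accepted
  printf("70GB -> %d\n", check_size(70LL << 30));  // negative errno: rejected
  return 0;
}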

Original comment by dmoore4...@gmail.com on 20 Jan 2011 at 9:22

GoogleCodeExporter commented 9 years ago
Tested on a large EC2 instance (Ubuntu 10.10), works as expected:

$ rsync -av --progress --stats --whole-file 3GB.bin misc.suncup.org/
sending incremental file list
3GB.bin
  3145728000 100%   27.39MB/s    0:01:49 (xfer#1, to-check=0/1)

Number of files: 1
Number of files transferred: 1
Total file size: 3145728000 bytes
Total transferred file size: 3145728000 bytes
Literal data: 3145728000 bytes
Matched data: 0 bytes
File list size: 42
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 3146112090
Total bytes received: 31

sent 3146112090 bytes  received 31 bytes  3601731.11 bytes/sec
total size is 3145728000  speedup is 1.00

Original comment by dmoore4...@gmail.com on 21 Jan 2011 at 4:48

GoogleCodeExporter commented 9 years ago
Resolved with 1.35

Original comment by dmoore4...@gmail.com on 21 Jan 2011 at 5:19

GoogleCodeExporter commented 9 years ago
I have been testing this over the past few days. All works as expected. 
Fantastic!

Out of interest, is 64GB another hard limit or could this be increased at some 
point?

Thanks again for the continued work on this project.

Original comment by mjbuc...@gmail.com on 26 Jan 2011 at 8:42

GoogleCodeExporter commented 9 years ago
When a file is greater than 20MB, the multipart upload kicks in.  I chose to 
make the multipart upload chunks 10MB.  AWS limits the number of parts in a 
multipart upload to 10,000.  So, theoretically, I would only need to change one 
number in the source code to move the file size limit from 64GB (my limit, since 
I like nice "round" numbers; this is 2 to the 36th power) to roughly 100GB.

AWS's limit is somewhere in the 1TB range.  To get past 100GB, the chunk size 
would need to be adjusted (and probably timeout values and such).
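
As a quick sanity check on that arithmetic (a standalone sketch, not s3fs code): 
10MB parts times the 10,000-part limit works out to just under 100GB in decimal 
units, which is why the 2^36 (64GB) cap fits safely underneath it:

#include <cstdio>
#include <stdint.h>

int main() {
  const int64_t part_size = 10LL * 1024 * 1024;     // 10MB chunk size mentioned above
  const int64_t max_parts = 10000;                  // AWS cap on parts per multipart upload
  const int64_t ceiling   = part_size * max_parts;  // 104857600000 bytes, roughly 97.7 GiB
  const int64_t s3fs_cap  = 1LL << 36;              // 68719476736 bytes, the 64GB limit

  printf("10MB x 10000 parts = %lld bytes\n", (long long)ceiling);
  printf("2^36 (64GB) cap    = %lld bytes\n", (long long)s3fs_cap);
  return 0;
}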

I can tell you this: if I ever implement it, I will never test it, so unless I 
can get someone to collaborate on this (to do the testing), it probably won't 
get done by me.  I do not like releasing untested code.

If you feel you have the need for this, please open a new enhancement issue for 
tracking.

Original comment by dmoore4...@gmail.com on 29 Jan 2011 at 12:42

GoogleCodeExporter commented 9 years ago
Hi,

I have the latest version, but I get the following when I try to copy a file 
larger than 2GB:

cp: writing `/s3bucket/biodata/genome.fa': No space left on device
cp: closing `/s3bucket/biodata/genome.fa': Input/output error

/var/log/messages:

May 10 06:06:29 ascidea s3fs: 2587###result=-28
May 10 06:06:29 ascidea s3fs: 2587###result=-28
May 10 06:06:29 ascidea s3fs: 948 ### bytesWritten:0 does not match lBufferSize: 10485760
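
For anyone decoding that log (a standalone sketch, not s3fs code): result=-28 is 
-ENOSPC, the same errno behind the "No space left on device" message from cp:

#include <cstdio>
#include <cstring>
#include <cerrno>

int main() {
  // ENOSPC is 28 on Linux; strerror() gives the text cp printed.
  printf("errno %d: %s\n", ENOSPC, strerror(ENOSPC));
  return 0;
}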

Any ideas?  I need to store big files there, and I have no idea how to do it.

thanks

Original comment by lorena.p...@gmail.com on 10 May 2012 at 10:21