guyson / s3fs

Automatically exported from code.google.com/p/s3fs
GNU General Public License v2.0

Low number of parallel reads/writes possible #357

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Detailed description of observed behavior:

While doing a large number of parallel reads or writes of small (1 KB) distinct 
S3 objects, I've observed a throughput bottleneck.  The bottleneck appears 
independent of the power of the machine (t1.micro and cc2.8xlarge EC2 instances 
hit similar limits).

In particular, regardless of the number of processes doing parallel writes 
(continually creating small files), I find that I cannot exceed the following 
rates:

4.5 file writes/s (creates)
60 file reads/s

Once such a rate is achieved, further parallel readers/writers only serve to 
slow down the other readers/writers.

The bandwidth requirements of these workloads are very low (roughly 60 KB/s), so 
I suspect something in s3fs is slowing things down.

What steps will reproduce the problem - please be very specific and
detailed. (if the developers cannot reproduce the issue, then it is
unlikely a fix will be found)?

Spawn 5 processes that continually create and write small files with random 
names.  Record average time per write.

Try again with 10 processes.  Record average time per write. Note that file 
create/write throughput (number of files created/s) does not increase.

I can provide a test case if needed.

(A similar test can be done by reading large numbers of files in parallel.)
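The reproduction steps above can be sketched as a small benchmark. This is a hypothetical harness, not the reporter's actual test case; the target directory defaults to a local temp directory here and would be pointed at the s3fs mount in a real run:

```python
# Hypothetical repro sketch: N worker processes each create small files
# with random names under a target directory; we report aggregate creates/s.
import os
import tempfile
import time
import uuid
from multiprocessing import Pool

PAYLOAD = b"x" * 1024                      # 1 KB objects, as in the report

def writer(args):
    target_dir, n_files = args
    for _ in range(n_files):
        # random name, mirroring "create and write small files with random names"
        path = os.path.join(target_dir, uuid.uuid4().hex)
        with open(path, "wb") as f:
            f.write(PAYLOAD)
    return n_files

def bench(target_dir, processes, files_per_proc=20):
    start = time.time()
    with Pool(processes) as pool:
        total = sum(pool.map(writer, [(target_dir, files_per_proc)] * processes))
    return total / max(time.time() - start, 1e-6)   # file creates per second

if __name__ == "__main__":
    target = tempfile.mkdtemp()            # point this at the s3fs mount instead
    for procs in (5, 10):
        print(f"{procs} writers: {bench(target, procs):.1f} creates/s")
```

Per the report, against a local directory the aggregate rate scales with process count, while against the s3fs mount it plateaus around 4.5 creates/s.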

Given S3's massive parallel throughput, I would expect much higher throughput 
from s3fs.

On a related note, looking at the source code and at socket behavior with lsof, 
it appears that persistent HTTP connections (i.e. preserved libcurl handles) are 
not reused across file reads and writes (only across stat calls), which may be 
contributing to the slowness of writing individual files.

===================================================================
The following information is very important in order to help us to help
you.  Omitting these details may delay your support request or cause it to
receive no attention at all.
===================================================================
Version of s3fs being used (s3fs --version):

Version of fuse being used (pkg-config --modversion fuse):

System information (uname -a):

Distro (cat /etc/issue):

s3fs command line used (if applicable):

/etc/fstab entry (if applicable):

s3fs syslog messages (grep s3fs /var/log/syslog):

Original issue reported on code.google.com by usaa...@gmail.com on 24 Jul 2013 at 2:28

GoogleCodeExporter commented 9 years ago
As an update, it appears I can pull the read rate a bit higher (approaching the 
speeds I get with boto), so I'm not worried about that.

However, write speed at 4-7 creates/s is far slower than what I get in my own 
testing with boto (300 creates/s), so there appears to be an s3fs performance 
problem here.

Deletions (explicit rm) with s3fs are also rather slow at 25 deletes/s, versus 
boto, which can manage 1000+/s.

Original comment by usaa...@gmail.com on 25 Jul 2013 at 7:18

GoogleCodeExporter commented 9 years ago
Hi,

Before the latest s3fs creates an object, it first checks whether an object with 
the same name already exists in the directory.  So when s3fs creates many 
objects at once, it sends many HEAD requests to S3.  If you specify the 
enable_noobj_cache option and s3fs has already looked up your new object's name 
before creating it, s3fs caches the fact that the file does not exist and can 
skip that check.
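The behavior described above can be sketched as a negative-lookup cache. This is a hypothetical simplification for illustration, not s3fs's actual C++ implementation; `head_object` stands in for an S3 HEAD request:

```python
# Sketch of a "noobj" (negative-lookup) cache: remember which keys are
# known not to exist so repeated existence checks skip the HEAD request.
head_calls = 0

def head_object(key, store):
    """Stand-in for an S3 HEAD request against a bucket (here, a set)."""
    global head_calls
    head_calls += 1
    return key in store

class NoObjCache:
    def __init__(self):
        self.absent = set()            # keys known not to exist

    def exists(self, key, store):
        if key in self.absent:
            return False               # answered from cache; no HEAD sent
        found = head_object(key, store)
        if not found:
            self.absent.add(key)       # remember the negative answer
        return found

    def invalidate(self, key):
        self.absent.discard(key)       # must be called once the key is created
```

The win is exactly the scenario in this issue: a workload creating many new files asks "does this name exist?" over and over, and every answer is "no", so caching the negative answers removes one round trip per create.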

If the cause of this issue is the large number of requests to S3, I need to 
think about further performance tuning.

Please test again with the debug option enabled, and let me know the result.

Thanks in advance for your help.

Original comment by ggta...@gmail.com on 13 Aug 2013 at 2:25

GoogleCodeExporter commented 9 years ago
Hi,

This issue has been left open for a long time, and the s3fs project has moved 
to GitHub (https://github.com/s3fs-fuse/s3fs-fuse), so I am closing it.

If you still have this problem, please post a new issue there.

Regards,

Original comment by ggta...@gmail.com on 23 Dec 2013 at 3:12