yangljun / s3fs

Automatically exported from code.google.com/p/s3fs
GNU General Public License v2.0

timeouts and out of memory error #351

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Detailed description of observed behavior:

s3fs consumes a large amount of memory (>1GB) and syslog shows several curl 
timeouts. At a certain point, the kernel kills the s3fs process for consuming 
too much memory.

What steps will reproduce the problem - please be very specific and
detailed. (if the developers cannot reproduce the issue, then it is
unlikely a fix will be found)?

Leave s3fs running long enough.

===================================================================
The following information is very important in order to help us to help
you. Omitting these details may delay your support request or cause it to
receive no attention at all.
===================================================================
Version of s3fs being used (s3fs --version): 1.70

Version of fuse being used (pkg-config --modversion fuse): 2.8.6

System information (uname -a):
  Linux hostname 3.2.0-24-virtual #39-Ubuntu SMP Mon May 21 18:44:18 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Distro (cat /etc/issue): Ubuntu 12.04 LTS \n \l

s3fs command line used (if applicable):

/etc/fstab entry (if applicable):

s3fs syslog messages (grep s3fs /var/log/syslog):
kern.log.1:Jun 16 01:15:23 domU-12-31-38-04-DE-E0 kernel: [355929.122154] [23438]     0 23438   259985   115155   0       0             0 s3fs
kern.log.1:Jun 16 01:15:23 domU-12-31-38-04-DE-E0 kernel: [355929.122248] Out of memory: Kill process 23438 (s3fs) score 712 or sacrifice child
kern.log.1:Jun 16 01:15:23 domU-12-31-38-04-DE-E0 kernel: [355929.122262] Killed process 23438 (s3fs) total-vm:1039940kB, anon-rss:460620kB, file-rss:0kB
syslog.1:Jun 15 07:31:48 domU-12-31-38-04-DE-E0 s3fs: timeout  now: 1371281508  curl_times[curl]: 1371281477l  readwrite_timeout: 30
syslog.1:Jun 15 07:31:48 domU-12-31-38-04-DE-E0 s3fs: timeout  now: 1371281508  curl_times[curl]: 1371281477l  readwrite_timeout: 30
syslog.1:Jun 15 09:22:07 domU-12-31-38-04-DE-E0 s3fs: timeout  now: 1371288127  curl_times[curl]: 1371288096l  readwrite_timeout: 30
syslog.1:Jun 15 09:22:07 domU-12-31-38-04-DE-E0 s3fs: timeout  now: 1371288127  curl_times[curl]: 1371288096l  readwrite_timeout: 30
syslog.1:Jun 15 09:49:40 domU-12-31-38-04-DE-E0 s3fs: timeout  now: 1371289780  curl_times[curl]: 1371289749l  readwrite_timeout: 30
syslog.1:Jun 15 09:49:40 domU-12-31-38-04-DE-E0 s3fs: timeout  now: 1371289780  curl_times[curl]: 1371289749l  readwrite_timeout: 30
syslog.1:Jun 15 16:33:49 domU-12-31-38-04-DE-E0 s3fs: timeout  now: 1371314029  curl_times[curl]: 1371313998l  readwrite_timeout: 30
syslog.1:Jun 15 16:33:49 domU-12-31-38-04-DE-E0 s3fs: timeout  now: 1371314029  curl_times[curl]: 1371313998l  readwrite_timeout: 30
syslog.1:Jun 15 16:47:45 domU-12-31-38-04-DE-E0 s3fs: timeout  now: 1371314865  curl_times[curl]: 1371314834l  readwrite_timeout: 30
syslog.1:Jun 15 16:47:45 domU-12-31-38-04-DE-E0 s3fs: timeout  now: 1371314865  curl_times[curl]: 1371314834l  readwrite_timeout: 30
syslog.1:Jun 15 19:38:54 domU-12-31-38-04-DE-E0 s3fs: timeout  now: 1371325134  curl_times[curl]: 1371325103l  readwrite_timeout: 30
syslog.1:Jun 15 19:38:54 domU-12-31-38-04-DE-E0 s3fs: timeout  now: 1371325134  curl_times[curl]: 1371325103l  readwrite_timeout: 30
syslog.1:Jun 15 22:14:54 domU-12-31-38-04-DE-E0 s3fs: ### CURLE_OPERATION_TIMEDOUT
syslog.1:Jun 15 22:14:56 domU-12-31-38-04-DE-E0 s3fs: ###retrying...
syslog.1:Jun 16 01:15:23 domU-12-31-38-04-DE-E0 kernel: [355929.122154] [23438]     0 23438   259985   115155   0       0             0 s3fs
syslog.1:Jun 16 01:15:23 domU-12-31-38-04-DE-E0 kernel: [355929.122248] Out of memory: Kill process 23438 (s3fs) score 712 or sacrifice child
syslog.1:Jun 16 01:15:23 domU-12-31-38-04-DE-E0 kernel: [355929.122262] Killed process 23438 (s3fs) total-vm:1039940kB, anon-rss:460620kB, file-rss:0kB

Original issue reported on code.google.com by jlhawn.p...@gmail.com on 17 Jun 2013 at 9:22

GoogleCodeExporter commented 9 years ago
Hi,

The timeout errors are probably a side effect of the s3fs process being killed; the
underlying problem is a memory leak in s3fs.

I reported on memory leaks in comment #5 of Issue 314:
http://code.google.com/p/s3fs/issues/detail?id=314#c5

I want to know whether your problem is caused by libcurl. (If you can, please also
look at Issue 343, which may be the same as your problem.)

Please check the version of your libcurl and which SSL library it is built against
(libnss or libssl/OpenSSL), and let me know.
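
For example, one quick way to check this (assuming curl and s3fs are installed and on the PATH):

  curl --version                                        # first line shows libcurl version and the NSS/OpenSSL backend
  ldd $(which s3fs) | grep -E 'libcurl|libnss|libssl'   # shows which curl/SSL libraries s3fs loads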

Thanks in advance for your help.

Original comment by ggta...@gmail.com on 18 Jun 2013 at 2:44

GoogleCodeExporter commented 9 years ago
Here are the versions we are currently using:

libcurl/7.22.0
OpenSSL/1.0.1
zlib/1.2.3.4
libidn/1.23
librtmp/2.3

Original comment by jlhawn.p...@gmail.com on 4 Jul 2013 at 12:39

GoogleCodeExporter commented 9 years ago
Hi,

Your library versions do not seem to be the problem.
I updated the code today; please try the latest revision (now r454).
With the latest version I did not get any errors when uploading files larger than 6GB to S3.
I hope this revision solves your problem.
If it does not, and if you can, please check with a tool such as valgrind.
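
For example, one way to run s3fs under valgrind in the foreground (the bucket name and mountpoint below are placeholders, and credentials are assumed to be configured already):

  valgrind --leak-check=full --log-file=/tmp/s3fs-valgrind.log s3fs mybucket /mnt/s3 -f

The -f option keeps s3fs in the foreground, so valgrind can write its leak summary once the filesystem is unmounted and the process exits.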

Thanks in advance for your assistance.

Original comment by ggta...@gmail.com on 5 Jul 2013 at 6:42

GoogleCodeExporter commented 9 years ago
We've been especially confused about this because it seemed like s3fs wasn't 
even being used at all while this was happening. We had only mounted a bucket 
and never actually used it.

It turns out that a daily cron job (mlocate) was crawling the entire directory
tree to build a search index. This involved stat-ing hundreds of thousands of
files!

We'll continue testing to look for strange behavior, but you might want to test
for yourself how s3fs behaves when the mlocate cron job runs (it should be
located in /etc/cron.daily on an Ubuntu install).
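
To reproduce the crawl without waiting for the daily run, the job can be triggered by hand (assuming the stock mlocate package on Ubuntu):

  sudo /etc/cron.daily/mlocate    # run the packaged cron script directly
  sudo updatedb                   # or invoke the indexer itself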

Original comment by jlhawn.p...@gmail.com on 16 Jul 2013 at 1:05

GoogleCodeExporter commented 9 years ago
You should also advise anyone installing s3fs that they may want to add fuse.s3fs to
the PRUNEFS list in /etc/updatedb.conf
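
For example, keeping whatever entries the distro already ships and appending the s3fs filesystem type (the other entries below are only illustrative):

  # /etc/updatedb.conf
  PRUNEFS="fuse.sshfs curlftpfs fuse.s3fs"

With this in place, updatedb skips mounted s3fs filesystems when building the locate index.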

Original comment by jlhawn.p...@gmail.com on 16 Jul 2013 at 4:50

GoogleCodeExporter commented 9 years ago
As you suggested, I ran the s3fs command with Valgrind. I then started up the
mlocate cron job. It immediately began to traverse our entire S3 bucket. The
memory footprint of s3fs started out at around 120MB. After an hour, this grew
to over 300MB. I then stopped the process and unmounted the bucket.
Surprisingly, Valgrind reported zero leaked memory.

We have several machines that run the same s3fs command. On some of these
machines the mlocate run would take so long that the next daily mlocate cron
job would start before the previous one had even finished. On a few of these
machines the memory footprint of s3fs would grow so large that the kernel
would kill the s3fs process.

Looking at the s3fs logs (using the -f option) while mlocate runs, it seems
that all it is doing is listing directories and stat-ing files. The majority of
the log lines relate to the StatCache. From the source code and the wiki, the
stat cache should never grow beyond 1000 entries, which amounts to an estimated
4MB of memory. Is there a way to explain why the memory usage appears to grow
so far beyond this?
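
For reference, the stat cache ceiling mentioned above is configurable via the max_stat_cache_size option documented by s3fs; a mount such as the following (bucket name and mountpoint are placeholders) pins it explicitly at 1000 entries:

  s3fs mybucket /mnt/s3 -o max_stat_cache_size=1000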

Original comment by jlhawn.p...@gmail.com on 18 Jul 2013 at 6:13

GoogleCodeExporter commented 9 years ago
Hi,

I'm sorry for replying late.

First, v1.73 has been released; it fixes a bug in retrying requests. I think
this version may make quite a difference to this problem.

After v1.73 I also updated some code (as r479) relating to the initialization
of curl and OpenSSL. Unfortunately, I could not find a definite cause or
solution. Because the memory leak in s3fs seems to depend on the environment
(libcurl + OpenSSL/libnss, OS?), I cannot say that a specific cause has been
identified and fixed.

If you can, please compile r479 and test it.

Thanks in advance for your help.

Original comment by ggta...@gmail.com on 27 Aug 2013 at 8:34

GoogleCodeExporter commented 9 years ago
Hi,

(We have moved s3fs from Google Code to GitHub:
https://github.com/s3fs-fuse/s3fs-fuse.)

If you can, please try the following:

1) libcurl version
If your libcurl is built against the NSS library, please check the libcurl
version, because libcurl with NSS before version 7.21.4 has a memory-leak bug.

2) multireq_max
If you use the latest version of s3fs, please try specifying the multireq_max
option. The default is 20 parallel requests, but you may want to set a smaller
number (e.g. "multireq_max=3"), as shown in the example below.

Thanks in advance for your help.

Original comment by ggta...@gmail.com on 1 Jun 2014 at 3:31