More information:
/var/log/messages shows:
Dec 10 18:30:32 rt init: Id "x" respawning too fast: disabled for 5 minutes
Dec 10 18:31:41 rt s3fs: init $Rev: 274 $
Dec 10 18:35:33 rt init: Id "x" respawning too fast: disabled for 5 minutes
Dec 10 18:40:34 rt init: Id "x" respawning too fast: disabled for 5 minutes
Dec 10 18:45:35 rt init: Id "x" respawning too fast: disabled for 5 minutes
Dec 10 18:50:36 rt init: Id "x" respawning too fast: disabled for 5 minutes
Dec 10 18:55:37 rt init: Id "x" respawning too fast: disabled for 5 minutes
Dec 10 19:00:38 rt init: Id "x" respawning too fast: disabled for 5 minutes
Dec 10 19:04:34 rt s3fs: ###curlCode: 52 msg: server returned nothing (no headers, no data)
Dec 10 19:05:40 rt init: Id "x" respawning too fast: disabled for 5 minutes
Using curl-7.15.5-9.el5.i386.
Original comment by jve...@gmail.com
on 11 Dec 2010 at 12:33
SVN revision addresses the curlCode 52 error -- please retrieve the most
recent version and try again.
Original comment by dmoore4...@gmail.com
on 11 Dec 2010 at 12:46
The latest revision, r279, now freezes after many file copies. Also, each file
copy generates the following error in /var/log/messages, even though the file
saves fine to our bucket:
Dec 10 20:12:20 rt s3fs: ### CURLE_HTTP_RETURNED_ERROR
Dec 10 20:12:20 rt s3fs: ###response=404
Dec 10 20:12:24 rt s3fs: ### CURLE_HTTP_RETURNED_ERROR
Dec 10 20:12:24 rt s3fs: ###response=404
BTW: using curl.i386 version 7.15.5
Original comment by jve...@gmail.com
on 11 Dec 2010 at 1:27
The 404 error is somewhat concerning; I've never seen it reported before.
Unfortunately, it can mean several things according to AWS documentation:
http://docs.amazonwebservices.com/AmazonS3/latest/API/
I'll have to put in a hook to capture the message that is reported along with
the code so that we can get a better idea of what is going on.
Original comment by dmoore4...@gmail.com
on 11 Dec 2010 at 1:51
What was I thinking? The 404 is a normal response for when an object is not
found. Think of it this way: you're writing a file to a fresh, clean bucket and
you want to first check whether it already exists. You query for it and it
returns "Not Found".
Having these show up in the log is bad; although it IS an HTTP error, it isn't
necessarily an s3fs error.
This boils down to setting the curl option FAILONERROR to true. What really
needs to happen is to turn that option off, parse the HTTP return code, and act
accordingly.
CURLE_HTTP_RETURNED_ERROR is only returned when FAILONERROR is set. If
CURLE_OK is returned, then the HTTP return code needs to be examined.
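To illustrate the idea, here is a minimal sketch -- not the actual s3fs code;
the URL and the error-handling policy are placeholders:

    #include <curl/curl.h>
    #include <cstdio>

    int main() {
        curl_global_init(CURL_GLOBAL_ALL);
        CURL *curl = curl_easy_init();
        if (!curl) return 1;
        // Placeholder URL -- not a real bucket.
        curl_easy_setopt(curl, CURLOPT_URL,
                         "http://example-bucket.s3.amazonaws.com/some/key");
        // Leave FAILONERROR off so curl_easy_perform() returns CURLE_OK even
        // on HTTP errors; we inspect the status code ourselves instead.
        curl_easy_setopt(curl, CURLOPT_FAILONERROR, 0L);
        CURLcode rc = curl_easy_perform(curl);
        if (rc == CURLE_OK) {
            long http_code = 0;
            curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &http_code);
            if (http_code == 404) {
                // "Not Found" -- expected when probing whether an object exists.
            } else if (http_code >= 400) {
                fprintf(stderr, "HTTP error %ld\n", http_code);
            }
        } else {
            fprintf(stderr, "curl error %d: %s\n", rc, curl_easy_strerror(rc));
        }
        curl_easy_cleanup(curl);
        curl_global_cleanup();
        return 0;
    }

With FAILONERROR off, a 404 from an existence probe can be handled as a normal
outcome instead of surfacing as CURLE_HTTP_RETURNED_ERROR.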
What does your rsync command look like? ...try --verbose --progress
--whole-file as options
Original comment by dmoore4...@gmail.com
on 11 Dec 2010 at 4:07
svn rev 280 gets rid of the HTTP 404 errors from syslog
Original comment by dmoore4...@gmail.com
on 11 Dec 2010 at 4:44
The 404 error messages have disappeared, but s3fs is still freezing after lots
of continuous activity, whether it's using cp, rsync, or a custom copying
program. This really makes it unusable. Any way to further debug this?
The frozen s3fs appears as follows; ps aux | grep -i s3 shows:
root 15419 0.0 0.2 51228 5024 ? Ssl 04:05 0:00 s3fs -d myrepo876 /mnt/repo
Original comment by jve...@gmail.com
on 11 Dec 2010 at 9:20
I'm seeing issues too when using rsync. See issue #130
I think that I have correlated the issue with the transfer of large files
(>500 MB), as an rsync of thousands of smaller files works.
Another user has wrapped his transfer routine in a script to automatically
restart the transfer.
I have done this manually and rsync picks up where it left off, however since
s3fs is (currently) file based (as opposed to block or partial file based), the
whole file needs to be transferred again.
Less than perfect, but it appears to be a workaround. Admittedly, I'm at a loss
as to why this is happening. I've put in more debugging messages and so far
there isn't a smoking gun.
Original comment by dmoore4...@gmail.com
on 12 Dec 2010 at 5:06
jvence, the number of files probably isn't the issue; it's probably the size of
the individual files. When rsync seems to hang, what is the size of the last
(or current) file being rsync'ed?
Also, what is your internet connection's upload speed? If the file is large
(>500 MB) and your upload speed is slow (a typical ~1 Mb/s), then the seeming
hang may just be how long the transfer takes.
In my experiments (a 2 GB file and an upload speed of 1.5 Mb/s), if I am
lucky, the rsync will work. If I'm unlucky, some sort of connection timeout
occurs.
The answer here may be to implement AWS's new multipart upload. This allows for
chopping up large files into smaller pieces and uploading them individually.
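For context, a rough sketch of that multipart REST sequence as AWS documents
it -- this is not s3fs code; request signing, XML response parsing, and error
handling are omitted, and the bucket, key, part count, and upload id are
placeholders:

    #include <curl/curl.h>
    #include <cstdio>
    #include <string>

    // Issue one unauthenticated request; real code would sign each request
    // and check the response.
    static CURLcode simple_request(const std::string &url, const char *method) {
        CURL *curl = curl_easy_init();
        curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
        curl_easy_setopt(curl, CURLOPT_CUSTOMREQUEST, method);
        CURLcode rc = curl_easy_perform(curl);
        curl_easy_cleanup(curl);
        return rc;
    }

    int main() {
        curl_global_init(CURL_GLOBAL_ALL);
        std::string object = "http://mybucket.s3.amazonaws.com/bigfile";

        // 1. Initiate: POST ?uploads; the XML response carries an UploadId.
        simple_request(object + "?uploads", "POST");
        std::string upload_id = "UPLOAD_ID_FROM_RESPONSE";  // placeholder

        // 2. PUT each chunk as a numbered part; each response carries an ETag.
        for (int part = 1; part <= 4; ++part) {
            char url[512];
            snprintf(url, sizeof(url), "%s?partNumber=%d&uploadId=%s",
                     object.c_str(), part, upload_id.c_str());
            simple_request(url, "PUT");  // request body would be the file chunk
        }

        // 3. Complete: POST ?uploadId=... with the part-number/ETag list as body.
        simple_request(object + "?uploadId=" + upload_id, "POST");

        curl_global_cleanup();
        return 0;
    }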
Original comment by dmoore4...@gmail.com
on 13 Dec 2010 at 9:49
In my case it has nothing to do with large file size but with the number of
files being transferred. All the files are less than 20K except for one that is
about 4MB. The internet connection is a full T3 line between both our servers
and Amazon's AWS. And again, this happens not only with rsync but also with cp
and our own custom Java file copier.
Any way to debug this?
Original comment by jve...@gmail.com
on 14 Dec 2010 at 10:11
OK, sorry, I mis-speculated.
I haven't been able to reproduce this issue (which doesn't mean that it doesn't
exist), so it'll be tough for me to debug.
Here are some steps you can take to try to debug this:
- run s3fs in foreground and debug modes while capturing stdout/stderr into a log file:
s3fs -d -f <bucket> <mountpoint> -o retries=20 2>&1 | tee /tmp/s3fs.log
- run rsync in extremely verbose mode:
rsync -avvv --progress --stats --whole-file <source> <mountpoint> | tee /tmp/rsync.log
- wait for the error to occur, and gather up the messages in syslog:
grep s3fs /var/log/syslog > /tmp/s3fs-syslog.log
These log files may provide some clues as to what/where the issue might be.
You can also add more debug statements into the s3fs.cpp code itself and
recompile to try to narrow your search to which function may be the culprit
(or a contributing factor).
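For example (purely illustrative; the function and logging style shown are
assumptions, not the actual s3fs.cpp source), bracketing a suspect FUSE
callback with entry/exit messages:

    #include <syslog.h>

    // Hypothetical: bracket a suspect callback with syslog() calls so that
    // the last message seen before a hang identifies the function that
    // never returned.
    static int s3fs_rename_traced(const char *from, const char *to) {
        syslog(LOG_DEBUG, "entering rename[from=%s][to=%s]", from, to);
        int result = 0;  /* ... the existing rename logic would go here ... */
        syslog(LOG_DEBUG, "leaving rename[result=%d]", result);
        return result;
    }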
Original comment by dmoore4...@gmail.com
on 14 Dec 2010 at 4:15
OK, managed to crash it again while it was processing a file that is 4 KB on
disk (2,304 bytes) in size. So far my freeze rate is probably around 90%. Here
are the last few debug lines:
access[path=/]
access[path=/p]
access[path=/p/projects]
access[path=/p/projects/502]
access[path=/p/projects/502/tiles]
access[path=/p/projects/502/tiles/50094_561615121_1172865_n.jpg]
open[path=/p/projects/502/tiles/50094_561615121_1172865_n.jpg][flags=32768]
downloading[path=/p/projects/502/tiles/50094_561615121_1172865_n.jpg][fd=285]
read[path=/p/projects/502/tiles/50094_561615121_1172865_n.jpg]
flush[path=/p/projects/502/tiles/50094_561615121_1172865_n.jpg][fd=285]
release[path=/p/projects/502/tiles/50094_561615121_1172865_n.jpg][fd=285]
flush[path=/p/projects/502/tiles/41649_561280596_6157_n.jpg][fd=284]
release[path=/p/projects/502/tiles/41649_561280596_6157_n.jpg][fd=284]
getattr[path=/p]
getattr[path=/p/projects]
getattr[path=/p/projects/502]
getattr[path=/p/projects/502/tiles]
getattr[path=/p/projects/502/tiles/41659_561690764_5017_n.jpg]
getattr[path=/p/projects/502/tiles/.fuse_hidden00000186000000bb]
rename[from=/p/projects/502/tiles/41659_561690764_5017_n.jpg][to=/p/projects/502/tiles/.fuse_hidden00000186000000bb]
Original comment by jve...@gmail.com
on 14 Dec 2010 at 6:06
How about the last 20 or so s3fs messages in /var/log/syslog?
grep s3fs /var/log/syslog | tail -n 20
Can you run it a few more times and see if the freeze occurs consistently on
the rename function?
Original comment by dmoore4...@gmail.com
on 14 Dec 2010 at 6:20
OK, it froze again:
write[path=/p/502/generated.jpg]
write[path=/p/502/generated.jpg]
flush[path=/p/502/generated.jpg][fd=273]
uploading[path=/p/502/generated.jpg][fd=273][size=2110352]
release[path=/p/502/generated.jpg][fd=273]
Previously in the log file, [rename] worked fine:
...
getattr[path=/p/502/tiles/.fuse_hidden0000026800000001]
rename[from=/p/502/tiles/HsTZSDw4avx.gif][to=/p/canvases/502/tiles/.fuse_hidden0000026800000001]
copying[path=/p/502/tiles/.fuse_hidden0000026800000001]
...
Original comment by jve...@gmail.com
on 14 Dec 2010 at 7:00
I'm using s3fs on CentOS 5.5 and cannot find syslog. /var/log/messages is not
showing anything relevant.
Original comment by jve...@gmail.com
on 14 Dec 2010 at 7:01
It appears that the standard installation of CentOS does not log DEBUG-level
messages. To change this, you'll have to edit /etc/syslog.conf and change this
line:
*.info;mail.none;authpriv.none;cron.none /var/log/messages
to
*.debug;mail.none;authpriv.none;cron.none /var/log/messages
...and restart the syslog daemon
% sudo /etc/init.d/syslog restart
s3fs debug messages will now appear in /var/log/messages when the -d option is
used.
Original comment by dmoore4...@gmail.com
on 15 Dec 2010 at 1:26
Well, this might be reassuring. I have duplicated your problem (at least I
think so) on my CentOS 5.5 virtual machine. It hung during the third file of
an rsync.
How the heck were you able to recover your system from this? Killing the
process (kill -9 PID), Ctrl-C, etc. seems to have no effect.
I'm somewhat suspicious of the version of FUSE CentOS uses vs. what's on my
Debian/Ubuntu machines: 2.7.4 vs. 2.8.4
The last s3fs debug message in /var/log/messages is:
Dec 14 18:51:15 centos s3fs: HTTP response code 404
This might give me something to work with.
Original comment by dmoore4...@gmail.com
on 15 Dec 2010 at 1:59
OK, turned on logging; will report after the next crash. To recover from a
crash, simply umount -l the mount point, kill the s3fs process, and restart s3fs.
Original comment by jve...@gmail.com
on 15 Dec 2010 at 12:38
Latest crash:
access[path=/]
access[path=/p]
access[path=/p/projects]
access[path=/p/projects/505]
access[path=/p/projects/505/tl]
access[path=/p/projects/505/tl/41661_628172_8289_n.jpg]
open[path=/p/projects/505/tl/41661_628172_8289_n.jpg][flags=32768]
downloading[path=/p/projects/505/tl/41661_628172_8289_n.jpg][fd=564]
read[path=/p/projects/505/tl/41661_628172_8289_n.jpg]
getattr[path=/p]
getattr[path=/p/projects]
getattr[path=/p/projects/505]
getattr[path=/p/projects/505/tl]
getattr[path=/p/projects/505/tl/48822_913226_8374372_n.jpg]
getattr[path=/p/projects/505/tl/.fuse_hidden000000560000002c]
rename[from=/p/projects/505/tl/48822_913226_8374372_n.jpg][to=/p/projects/505/tl/.fuse_hidden000000560000002c]
/var/log/messages:
Dec 15 08:09:54 web2 s3fs: HTTP response code 200
Dec 15 08:09:54 web2 s3fs: URL is http://s3.amazonaws.com/myrepo123/p/projects/505/tl/48822_913226_8374372_n.jpg
Dec 15 08:09:54 web2 s3fs: URL changed is http://myrepo123.s3.amazonaws.com/p/projects/505/tl/48822_913226_8374372_n.jpg
Dec 15 08:09:54 web2 s3fs: connecting to URL http://myrepo123.s3.amazonaws.com/p/projects/505/tl/48822_913226_8374372_n.jpg
Dec 15 08:09:54 web2 s3fs: HTTP response code 200
Dec 15 08:09:54 web2 s3fs: URL is http://s3.amazonaws.com/myrepo123/p/projects/505/tl/.fuse_hidden000000560000002c
Dec 15 08:09:54 web2 s3fs: URL changed is http://myrepo123.s3.amazonaws.com/p/projects/505/tl/.fuse_hidden000000560000002c
Dec 15 08:09:54 web2 s3fs: connecting to URL http://myrepo123.s3.amazonaws.com/p/projects/505/tl/.fuse_hidden000000560000002c
Dec 15 08:09:54 web2 s3fs: HTTP response code 404
Original comment by jve...@gmail.com
on 15 Dec 2010 at 1:32
There's a wrapper around the curl library function curl_easy_perform. That is
where the HTTP response code debug message is coming from. This tells me that
the call to libcurl's function returned "successfully" (i.e. it's not where the
hang is).
Upon getting the HTTP response, our wrapper function "my_curl_easy_perform"
returns a code back to the calling function. Several functions call
my_curl_easy_perform, so I'll put in some more debug code to trace the call
and return.
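Something along these lines (a sketch of the tracing idea, not the actual s3fs
source; only curl_easy_perform and curl_easy_getinfo are real libcurl calls):

    #include <curl/curl.h>
    #include <syslog.h>

    // Sketch: log on entry, log the libcurl result and HTTP code, and log
    // again just before returning, so a hang can be pinned to one side of
    // the curl_easy_perform() call.
    CURLcode my_curl_easy_perform(CURL *curl) {
        syslog(LOG_DEBUG, "my_curl_easy_perform: entering");
        CURLcode rc = curl_easy_perform(curl);
        long http_code = 0;
        if (rc == CURLE_OK)
            curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &http_code);
        syslog(LOG_DEBUG, "my_curl_easy_perform: returning code=%d http=%ld",
               rc, http_code);
        return rc;
    }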
Original comment by dmoore4...@gmail.com
on 15 Dec 2010 at 8:37
This is looking more and more like a bug in FUSE itself. This bug has been
previously identified in other FUSE-based file systems and has since been fixed.
I still can't recover my system reliably from the hang, even with the hint.
Many times I need a restart.
I added more debug into my working copy, and the last hang occurred upon a
system call to stat(). This appears to correlate with this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=493565
https://bugs.launchpad.net/ubuntu/+source/fuse/+bug/634554
Fixed in Ubuntu 10.10 and Debian (squeeze, sid) by installing FUSE 2.8.4, but
not in Fedora or CentOS -- that also correlates with my observation that I
could not reproduce this on my main systems.
One way to gain confidence that this is the FUSE bug would be to compile and
install 2.8.4 on a redhat-based system (I don't think that there is an .rpm
available yet; besides, I'm not yum/rpm savvy).
Original comment by dmoore4...@gmail.com
on 16 Dec 2010 at 4:45
I've upgraded FUSE to 2.8.4 and it seems to be working fine now. No crashes so
far, and I did some extensive testing already. Will report back if it freezes
again, but the problem seemed to disappear with FUSE 2.8.4.
Original comment by jve...@gmail.com
on 16 Dec 2010 at 9:23
Did you find an RPM, or did you have to download the tarball and
compile/install it yourself? I plan on changing the minimum version of fuse to
2.8.4, but would like to provide some guidance in the Installation Notes on
what redhat-based distros need to do to get FUSE 2.8.4 installed. Thanks.
I'll give this a try myself on my CentOS and Fedora virtual machines.
Original comment by dmoore4...@gmail.com
on 16 Dec 2010 at 11:26
I couldn't get fuse 2.8.4 installed on my CentOS 5.5 virtual machine to test,
and I couldn't find an rpm for it. I'm going to assume that jvence's
confirmation of resolution is good enough, along with the fact that I wasn't
able to duplicate this on my Fedora 14 (another redhat OS) with fuse 2.8.5
installed.
Calling this one fixed and moving the minimum fuse version needed to 2.8.4.
Original comment by dmoore4...@gmail.com
on 17 Dec 2010 at 4:26
Closed with r281
Original comment by dmoore4...@gmail.com
on 17 Dec 2010 at 4:41
Upgrading to FUSE 2.8.4:
wget http://downloads.sourceforge.net/project/fuse/fuse-2.X/2.8.4/fuse-2.8.4.tar.gz
tar zxf fuse-2.8.4.tar.gz
cd fuse-2.8.4
./configure
make
make install
echo /usr/local/lib >> /etc/ld.so.conf
ldconfig
Original comment by jve...@gmail.com
on 17 Dec 2010 at 6:40
Just a follow-up: installed fuse 2.8.4 on my CentOS virtual machine, and all
looks to be good:
% rsync .....
Number of files: 1323
Number of files transferred: 1309
Total file size: 1665096709 bytes
Total transferred file size: 1665096709 bytes
Literal data: 1665096709 bytes
Matched data: 0 bytes
File list size: 23521
File list generation time: 0.096 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 1665381152
Total bytes received: 28902
sent 1665381152 bytes received 28902 bytes 108034.77 bytes/sec
total size is 1665096709 speedup is 1.00
[mooredan@centos ~]$ cat /etc/issue
CentOS release 5.5 (Final)
Kernel \r on an \m
[mooredan@centos ~]$ fusermount --version
fusermount version: 2.8.4
[mooredan@centos ~]$ s3fs --version
Amazon Simple Storage Service File System 1.26
Copyright (C) 2010 Randy Rizun <rrizun@gmail.com>
License GPL2: GNU GPL version 2 <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Original comment by dmoore4...@gmail.com
on 19 Dec 2010 at 6:53
I am using FUSE 2.8.4 and have still seen the freeze. I am using an FTP server
to upload to the s3fs mount point. My observation is that this issue is
reproducible when large files are uploaded.
Original comment by ravindra...@gmail.com
on 15 Jan 2011 at 5:40
Having the same thing here. It seems to freeze -- should I install the SVN
version, or is the latest release supposed to fix this?
Original comment by marc.tru...@gmail.com
on 20 Jan 2011 at 6:18
Sorry, using Fuse 2.8.5 - is this a problem?
Original comment by marc.tru...@gmail.com
on 20 Jan 2011 at 6:19
The latest tarball is supposed to have fixed this issue -- it did fix the
original issue that was reported and duplicated. This seems to be a new issue;
however, the symptoms may appear to be similar. The FTP thing is a new wrinkle.
I don't see any reason why Fuse 2.8.5 would be a problem, but I haven't
personally used it (yet).
I recommend opening a new issue concerning your problem with details, details,
details, VERY SPECIFIC DETAILS. If we cannot reproduce the issue, there's
little we can do to try and fix it.
Original comment by dmoore4...@gmail.com
on 20 Jan 2011 at 4:32
I just found that using FUSE 2.8.5 will not work for me with s3fs 1.57. Going
back to FUSE 2.8.4 fixed the problem instantly. The problem seems to occur
when/after handing over control to fuse with the call to fuse_main(...) in
s3fs.cpp. As far as I could tell using debug output statements in the code,
everything else works fine up to this point. With FUSE 2.8.5, s3fs terminated
instantly after calling fuse_main. Unfortunately, I can't provide an easily
reproducible test scenario, since this was on a custom-built system loosely
based on Debian with lots of manual updates from wheezy/sid.
Original comment by i...@readynasxtras.com
on 10 Jul 2011 at 11:37
How do you unmount?
Original comment by mike54...@gmail.com
on 26 Nov 2011 at 4:05
Original issue reported on code.google.com by
jve...@gmail.com
on 10 Dec 2010 at 10:39