rayantony / s3fs

Automatically exported from code.google.com/p/s3fs
GNU General Public License v2.0

Handle CURLE_COULDNT_CONNECT error better #132

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Reported by user stephentenberg:

... it dies a couple times a day on average with "###curlCode: 7 msg: couldn't 
connect to server". Unfortunately that does cause s3fs to go bye-bye and my 
program has to umount and mount again.

I think I see above that's a curl bug, but even so, it would be nice if it 
didn't dismount the file system. :-) 

Original issue reported on code.google.com by dmoore4...@gmail.com on 7 Dec 2010 at 1:31

GoogleCodeExporter commented 9 years ago
Thanks for reporting this.  I put the "curl error trap" in on purpose so 
that we can gracefully handle errors coming back from curl. 

It appears that we shouldn't error out under this condition, at least not 
immediately (or perhaps at all).  From what I have been able to research so 
far, this error comes about when there is a network connectivity issue.

We probably should treat this curl error the same way as how 
CURLE_OPERATION_TIMEOUT is handled.

But before that, can I have a little bit more information, please:

  - what other s3fs messages can be found in syslog prior to this message?

  - what is your /etc/fstab entry or the command line options used?

Original comment by dmoore4...@gmail.com on 7 Dec 2010 at 1:55

GoogleCodeExporter commented 9 years ago
Hi, I am pushing millions of small files to S3 with s3fs (a few bytes to a few 
megs on average).

Every 50,000 or so files (I don't think there is any pattern, although it does 
seem that s3fs grows on every operation) it crashes.  I am sure there is no 
actual network connection issue at the time of these events, as it's on a running 
production server with concurrent network activity and no other errors.

My log shows every file uploaded, and I don't see any other related messages.  
My script sleeps 45 seconds when this happens, issues a umount and a new 
mount, and resumes.  After that it may go 6 hours before it burps again.

Here is my mount command:

s3fs XXXXXXXXXX -o allow_other \
        -o retries=20 -o connect_timeout=20 \
        -o readwrite_timeout=60 /s3/XXXXXXXXX
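For reference, the equivalent /etc/fstab entry (just a sketch, assuming the s3fs#bucket fstab syntax this version of s3fs accepts, with the bucket and mount point elided as above) would look like:

```
s3fs#XXXXXXXXXX /s3/XXXXXXXXX fuse allow_other,retries=20,connect_timeout=20,readwrite_timeout=60 0 0
```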

From my point of view, it should not crash the file system, just return an 
error for the operation requested so we can retry it a few seconds later.

Thanks

Steve

Original comment by stephent...@gmail.com on 7 Dec 2010 at 2:18

GoogleCodeExporter commented 9 years ago
The network connection issue (if there is one) might be external and only a 
momentary "hiccup" -- but enough for curl to produce an error.

Treating the couldn't-connect error the same as the timeout seems to make 
sense.  Additionally, waiting a few seconds before trying again would help.

You also alluded to a potential memory leak.  This may or may not be related to 
an existing issue that Adrian took on.  There's a utility named "valgrind" 
(admittedly, I don't know much about this utility) that may help detect the leak.

Original comment by dmoore4...@gmail.com on 7 Dec 2010 at 3:05

GoogleCodeExporter commented 9 years ago
This issue was closed by revision r277.

Original comment by dmoore4...@gmail.com on 8 Dec 2010 at 4:52