s3fs-fuse / s3fs-fuse

FUSE-based file system backed by Amazon S3
GNU General Public License v2.0
8.65k stars 1.02k forks source link

The latest version of s3fs reports IO Error when continuous appending to one single file #1450

Open xrefft opened 4 years ago

xrefft commented 4 years ago

The latest version of s3fs reports IO Error when continuous appending to one single file.
we noticed [ERR] curl.cpp:RequestPerform(2596): ### CURLE_GOT_NOTHING error in curldbg log. which probably means concurrency problem.

Additional Information

Version of s3fs being used (s3fs --version)

It happend on s3fs V1.87 (commit:8b7dd82) with OpenSSL. Commit #1448 .

Version of fuse being used (pkg-config --modversion fuse, rpm -qi fuse, dpkg -s fuse)

Name        : fuse
Version     : 2.9.2
Release     : 11.el7
Architecture: x86_64
Install Date: Tue 13 Oct 2020 11:52:59 AM CST
Group       : System Environment/Base
Size        : 223297
License     : GPL+
Signature   : RSA/SHA256, Mon 12 Nov 2018 10:25:34 PM CST, Key ID 24c6a8a7f4a80eb5
Source RPM  : fuse-2.9.2-11.el7.src.rpm
Build Date  : Wed 31 Oct 2018 05:32:35 AM CST
Build Host  : x86-01.bsys.centos.org
Relocations : (not relocatable)
Packager    : CentOS BuildSystem <http://bugs.centos.org>
Vendor      : CentOS
URL         : https://github.com/libfuse/libfuse
Summary     : File System in Userspace (FUSE) utilities
Description :
With FUSE it is possible to implement a fully functional filesystem in a
userspace program. This package contains the FUSE userspace tools to
mount a FUSE filesystem.

Kernel information (uname -r)

3.10.0-957.27.2.el7.x86_64

GNU/Linux Distribution (cat /etc/os-release)

NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

s3fs command line used

/s3fs test-s3fs $MOUNT_POINT \
-o passwd_file=$PASSWD_FILE \
-o url=$URL \
-o endpoint=REGION \
-o use_cache=/root/s3fs/s3fs_cache \
-o check_cache_dir_exist \
-o ensure_diskfree=1024 \
-f -o dbglevel=debug \
-o curldbg \
-o connect_timeout=12800 \
-o readwrite_timeout=12800 \
-o stat_cache_expire=0 \
-o max_stat_cache_size=0 \
-o multipart_size=8 \
-o multireq_max=512 \
-o singlepart_copy_limit=128 \
-o use_path_request_style 

s3fs syslog messages (grep s3fs /var/log/syslog, journalctl | grep s3fs, or s3fs outputs)

623132-[CURL DBG] > PUT /test-s3fs/2d6433f2-0e07-11eb-a665-fa162e2fc03e?partNumber=907&uploadId=5f86d167661f1f15a9ba5977 HTTP/1.1
623133-[CURL DBG] > User-Agent: s3fs/1.87 (commit hash 8b7dd82; OpenSSL)
623134-[CURL DBG] > Authorization: AWS4-HMAC-SHA256 Credential=GbIfBzYShzHPSuJmh6wZGjvQf7y9sea0OtgB3CYs/20201014/cn-south-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=7b5e1b865530d35840b7e7a01a162e7f5e906b3d84467c1742bddc2f5b82224d
623135-[CURL DBG] > host: my.endpoint.com
623136-[CURL DBG] > x-amz-content-sha256: b89146fe55222ce6f7d7bc9be1a37b3b8ed6bad8efd9cc4178264deaf2a621e9
623137-[CURL DBG] > x-amz-date: 20201014T102622Z
623138-[CURL DBG] > Content-Length: 262144
623139-[CURL DBG] > Expect: 100-continue
623140-[CURL DBG] >
623141-[CURL DBG] < HTTP/1.1 100 Continue
623142-[INF] s3fs.cpp:s3fs_release(2374): [path=/2d6433f2-0e07-11eb-a665-fa162e2fc03e][fd=7]
623143-[INF]       cache.cpp:DelStat(590): delete stat cache entry[path=/2d6433f2-0e07-11eb-a665-fa162e2fc03e]
623144-[INF]       fdcache.cpp:GetFdEntity(2898): [path=/2d6433f2-0e07-11eb-a665-fa162e2fc03e][fd=7]
623145-[DBG] fdcache.cpp:Dup(1410): [path=/2d6433f2-0e07-11eb-a665-fa162e2fc03e][fd=7][refcnt=3]
623146-[DBG] fdcache.cpp:Close(1363): [path=/2d6433f2-0e07-11eb-a665-fa162e2fc03e][fd=7][refcnt=2]
623147-[DBG] fdcache.cpp:Close(3077): [ent->file=/2d6433f2-0e07-11eb-a665-fa162e2fc03e][ent->fd=7]
623148-[DBG] fdcache.cpp:Close(1363): [path=/2d6433f2-0e07-11eb-a665-fa162e2fc03e][fd=7][refcnt=1]
623149-[INF]       fdcache.cpp:GetFdEntity(2898): [path=/2d6433f2-0e07-11eb-a665-fa162e2fc03e][fd=7]
623150-[DBG] fdcache.cpp:Dup(1410): [path=/2d6433f2-0e07-11eb-a665-fa162e2fc03e][fd=7][refcnt=2]
623151-[WAN] s3fs.cpp:s3fs_release(2407): file(/2d6433f2-0e07-11eb-a665-fa162e2fc03e),fd(7) is still opened.
623152-[CURL DBG] * Empty reply from server
623153-[CURL DBG] * Connection #10 to host my.endpoint.com left intact
623154:[ERR] curl.cpp:RequestPerform(2596): ### CURLE_GOT_NOTHING
623155-[INF] curl.cpp:RequestPerform(2687): ### retrying...
623156-[INF]       curl.cpp:RemakeHandle(2313): Retry request. [type=9][url=http://my.endpoint.com/test-s3fs/2d6433f2-0e07-11eb-a665-fa162e2fc03e?partNumber=907&uploadId=5f86d167661f1f15a9ba5977][path=/2d6433f2-0e07-11eb-a665-fa162e2fc03e]
623157-[CURL DBG] * Found bundle for host my.endpoint.com: 0x7ff14c00ca20
623158-[CURL DBG] * Connection 10 seems to be dead!
623159-[CURL DBG] * Closing connection 10
623160-[CURL DBG] * About to connect() to my.endpoint.com port 80 (#11)
623161-[CURL DBG] *   Trying 127.122.123.112...
623162-[CURL DBG] * Connected to my.endpoint.com (127.122.123.112) port 80 (#11)
623163-[CURL DBG] > PUT /test-s3fs/2d6433f2-0e07-11eb-a665-fa162e2fc03e?partNumber=907&uploadId=5f86d167661f1f15a9ba5977 HTTP/1.1
623164-[CURL DBG] > User-Agent: s3fs/1.87 (commit hash 8b7dd82; OpenSSL)
623165-[CURL DBG] > Host: my.endpoint.com
623166-[CURL DBG] > Accept: */*
623167-[CURL DBG] > Content-Length: 262144
623168-[CURL DBG] > Expect: 100-continue
623169-[CURL DBG] >
623170-[CURL DBG] < HTTP/1.1 100 Continue
623171-[CURL DBG] * Empty reply from server
623172-[CURL DBG] * Connection #11 to host my.endpoint.com left intact
623173:[ERR] curl.cpp:RequestPerform(2596): ### CURLE_GOT_NOTHING
623174-[INF] curl.cpp:RequestPerform(2687): ### retrying...
623175-[INF]       curl.cpp:RemakeHandle(2313): Retry request. [type=9][url=http://my.endpoint.com/test-s3fs/2d6433f2-0e07-11eb-a665-fa162e2fc03e?partNumber=907&uploadId=5f86d167661f1f15a9ba5977][path=/2d6433f2-0e07-11eb-a665-fa162e2fc03e]
623176-[CURL DBG] * Found bundle for host my.endpoint.com: 0x7ff150018f90
623177-[CURL DBG] * Connection 11 seems to be dead!
623178-[CURL DBG] * Closing connection 11
623179-[CURL DBG] * About to connect() to my.endpoint.com port 80 (#12)
623180-[CURL DBG] *   Trying 127.122.123.112...
623181-[CURL DBG] * Connected to my.endpoint.com (127.122.123.112) port 80 (#12)
623182-[CURL DBG] > PUT /test-s3fs/2d6433f2-0e07-11eb-a665-fa162e2fc03e?partNumber=907&uploadId=5f86d167661f1f15a9ba5977 HTTP/1.1
623183-[CURL DBG] > User-Agent: s3fs/1.87 (commit hash 8b7dd82; OpenSSL)
623184-[CURL DBG] > Host: my.endpoint.com
623185-[CURL DBG] > Accept: */*
623186-[CURL DBG] > Content-Length: 262144
623187-[CURL DBG] > Expect: 100-continue
623188-[CURL DBG] >
623189-[CURL DBG] < HTTP/1.1 100 Continue
623190-[CURL DBG] * Empty reply from server
623191-[CURL DBG] * Connection #12 to host my.endpoint.com left intact
623192:[ERR] curl.cpp:RequestPerform(2596): ### CURLE_GOT_NOTHING
623193-[INF] curl.cpp:RequestPerform(2687): ### retrying...
623194-[INF]       curl.cpp:RemakeHandle(2313): Retry request. [type=9][url=http://my.endpoint.com/test-s3fs/2d6433f2-0e07-11eb-a665-fa162e2fc03e?partNumber=907&uploadId=5f86d167661f1f15a9ba5977][path=/2d6433f2-0e07-11eb-a665-fa162e2fc03e]
623195-[CURL DBG] * Found bundle for host my.endpoint.com: 0x7ff15001a4f0
623196-[CURL DBG] * Connection 12 seems to be dead!
623197-[CURL DBG] * Closing connection 12
623198-[CURL DBG] * About to connect() to my.endpoint.com port 80 (#13)
623199-[CURL DBG] *   Trying 127.122.123.112...
623200-[CURL DBG] * Connected to my.endpoint.com (127.122.123.112) port 80 (#13)
623201-[CURL DBG] > PUT /test-s3fs/2d6433f2-0e07-11eb-a665-fa162e2fc03e?partNumber=907&uploadId=5f86d167661f1f15a9ba5977 HTTP/1.1
623202-[CURL DBG] > User-Agent: s3fs/1.87 (commit hash 8b7dd82; OpenSSL)
623203-[CURL DBG] > Host: my.endpoint.com
623204-[CURL DBG] > Accept: */*
623205-[CURL DBG] > Content-Length: 262144
623206-[CURL DBG] > Expect: 100-continue
623207-[CURL DBG] >
623208-[CURL DBG] < HTTP/1.1 100 Continue
623209-[CURL DBG] * Empty reply from server
623210-[CURL DBG] * Connection #13 to host my.endpoint.com left intact
623211:[ERR] curl.cpp:RequestPerform(2596): ### CURLE_GOT_NOTHING
623212-[INF] curl.cpp:RequestPerform(2687): ### retrying...
623213-[INF]       curl.cpp:RemakeHandle(2313): Retry request. [type=9][url=http://my.endpoint.com/test-s3fs/2d6433f2-0e07-11eb-a665-fa162e2fc03e?partNumber=907&uploadId=5f86d167661f1f15a9ba5977][path=/2d6433f2-0e07-11eb-a665-fa162e2fc03e]
623214-[CURL DBG] * Found bundle for host my.endpoint.com: 0x7ff150019300
623215-[CURL DBG] * Connection 13 seems to be dead!
623216-[CURL DBG] * Closing connection 13
623217-[CURL DBG] * About to connect() to my.endpoint.com port 80 (#14)
623218-[CURL DBG] *   Trying 127.122.123.112...
623219-[CURL DBG] * Connected to my.endpoint.com (127.122.123.112) port 80 (#14)
623220-[CURL DBG] > PUT /test-s3fs/2d6433f2-0e07-11eb-a665-fa162e2fc03e?partNumber=907&uploadId=5f86d167661f1f15a9ba5977 HTTP/1.1
623221-[CURL DBG] > User-Agent: s3fs/1.87 (commit hash 8b7dd82; OpenSSL)
623222-[CURL DBG] > Host: my.endpoint.com
623223-[CURL DBG] > Accept: */*
623224-[CURL DBG] > Content-Length: 262144
623225-[CURL DBG] > Expect: 100-continue
623226-[CURL DBG] >
623227-[CURL DBG] < HTTP/1.1 100 Continue
623228-[CURL DBG] * Empty reply from server
623229-[CURL DBG] * Connection #14 to host my.endpoint.com left intact
623230:[ERR] curl.cpp:RequestPerform(2596): ### CURLE_GOT_NOTHING
623231-[INF] curl.cpp:RequestPerform(2687): ### retrying...
623232-[INF]       curl.cpp:RemakeHandle(2313): Retry request. [type=9][url=http://my.endpoint.com/test-s3fs/2d6433f2-0e07-11eb-a665-fa162e2fc03e?partNumber=907&uploadId=5f86d167661f1f15a9ba5977][path=/2d6433f2-0e07-11eb-a665-fa162e2fc03e]
623233:[ERR] curl.cpp:RequestPerform(2705): ### giving up
623234:[ERR] curl.cpp:MultipartUploadRequest(4224): failed uploading part(-5)
623235-[DBG] curl.cpp:ReturnHandler(334): Return handler to pool
623236:[ERR] fdcache.cpp:RowFlush(2381): failed to multipart post(start=7600476160, size=262144) for file(7).

Details about issue

The latest version of s3fs reports IO Error when continuous appending to one single file.
we noticed [ERR] curl.cpp:RequestPerform(2596): ### CURLE_GOT_NOTHING error in curldbg log. which probably means concurrency problem. PS:When using AWS SDK, OSS will not report this error, but using s3fs will cause this error.

Python test code we use is:

import time, os, uuid
buf = 'abcdefgh'*(1<<20)
total = 0
time_wr = 0
file = './mnt/'+str(uuid.uuid1())
stone = 2<<30
print 'file is', file
with open(file, 'w') as f:
    while True:
        time_start=time.time()
        ret = f.write(buf)
        time_end=time.time()
        time_wr += time_end-time_start
        if ret is not None:
            print 'error', ret
            exit(-1)
        total += len(buf)
        print 'total wr: ', total, 'time sec: ',time_wr
dhananjays commented 3 years ago

We believe we're facing the same issue once we have started using v1.89. Did you figure out a solution/root-cause/workaround for this @xrefft ?

Mount commands we have tried:

1. s3fs -o ecs -o retries=20 -o max_dirty_data=100 -o ensure_diskfree=200 <BUCKET> <MOUNT_POINT>
2. s3fs -o ecs -o retries=20 -o max_dirty_data=50 -o ensure_diskfree=200 <BUCKET> <MOUNT_POINT>

You should note that while we have not specified any disk cache in either command above, we create the /tmp as a tmpfs before running the above commands like this (so as to avoid disk activity on ec2 ebs volumes):

mount -t tmpfs -o size=400m tmpfs /tmp