Open SolarisYan opened 4 years ago
could you provide more info about what's lost and what the workload is like?
I use debug flag as -f --debug_s3 --debug_fuse, and i find that after the StatFS, there have error. Follow are some log:
2020/01/17 08:06:00.320833 fuse.DEBUG Op 0x00036c10 connection.go:478] -> OK () 2020/01/17 08:06:09.620513 fuse.DEBUG Op 0x00036c12 connection.go:395] <- StatFS 2020/01/17 08:06:09.621140 fuse.DEBUG Op 0x00036c12 connection.go:478] -> OK () 2020/01/17 08:06:13.026719 fuse.DEBUG Op 0x00036c14 connection.go:395] <- GetXattr (inode 1, name "security.selinux", name security.selinux) 2020/01/17 08:06:13.026992 fuse.DEBUG GetXattr 1 [security.selinux] 2020/01/17 08:06:13.027075 fuse.DEBUG Op 0x00036c14 connection.go:480] -> Error: "no data available" 2020/01/17 08:06:13.027246 fuse.DEBUG Op 0x00036c16 connection.go:395] <- GetXattr (inode 1, name "system.posix_acl_access", name system.posix_acl_access) 2020/01/17 08:06:13.027453 fuse.DEBUG GetXattr 1 [system.posix_acl_access] 2020/01/17 08:06:13.027534 fuse.DEBUG Op 0x00036c16 connection.go:480] -> Error: "no data available" 2020/01/17 08:06:13.027766 fuse.DEBUG Op 0x00036c18 connection.go:395] <- GetXattr (inode 1, name "system.posix_acl_default", name system.posix_acl_default) 2020/01/17 08:06:13.028083 fuse.DEBUG GetXattr 1 [system.posix_acl_default] 2020/01/17 08:06:13.028146 fuse.DEBUG Op 0x00036c18 connection.go:480] -> Error: "no data available" 2020/01/17 08:06:13.030768 fuse.DEBUG Op 0x00036c1a connection.go:395] <- OpenDir (inode 1) 2020/01/17 08:06:13.031062 fuse.DEBUG OpenDir 1 [] 2020/01/17 08:06:13.031119 fuse.DEBUG Op 0x00036c1a connection.go:478] -> OK () 2020/01/17 08:06:13.031256 fuse.DEBUG Op 0x00036c1c connection.go:395] <- ReadDir (inode 1) 2020/01/17 08:06:13.031418 fuse.DEBUG ReadDir 1 [0] 2020/01/17 08:06:13.032100 s3.DEBUG DEBUG: Request s3/ListObjects Details: ---[ REQUEST POST-SIGN ]----------------------------- GET /default?delimiter=%2F&prefix= HTTP/1.1 Host: storage-service.avlcloud:9000 User-Agent: aws-sdk-go/1.17.13 (go1.13.4; linux; amd64) Accept-Encoding: identity Authorization: AWS4-HMAC-SHA256 Credential=avljiesong/20200117/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=304830997ad2360c9ed7bad0a3eae0d86c713578ae1984f64cc825777765996f X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 X-Amz-Date: 20200117T080613Z
2020/01/17 08:06:13.034098 s3.DEBUG DEBUG: Response s3/ListObjects Details: ---[ RESPONSE ]-------------------------------------- HTTP/1.1 200 OK Content-Length: 434 Accept-Ranges: bytes Content-Security-Policy: block-all-mixed-content Content-Type: application/xml Date: Fri, 17 Jan 2020 08:06:13 GMT Server: MinIO/DEVELOPMENT.2019-10-21T08-29-12Z Vary: Origin X-Amz-Bucket-Region: us-east-1 X-Amz-Request-Id: 15EA9E072EB7F7E2 X-Xss-Protection: 1; mode=block
2020/01/17 08:06:13.034208 s3.DEBUG &{[derived_dir/ meddata/ public/] []
could you be more explicit? What operations did you do? What did you expect to see? What did you actually see?
When my tensorflow program running, it will read and write many files in the mount directory. there are two case:
If i use Ctrl + C to interrupt program, then to check the mount dir, sometimes the files and directory will all lost, sometimes lose a part of them.
If the program is not interrupted, sometimes the files will be lost, all or part.
My expect is files and directories in mount directory not lost.
@kahing thank you.
Could you provide some sample code? At databricks we have many customers using tensorflow + goofys and it works fine. One caveat is that you cannot use hdf5 for model saving/checkpointing. For an example see https://docs.databricks.com/applications/deep-learning/distributed-training/mnist-tensorflow-keras.html
Thank you, i'll have a look
Is there any new progress about this issue?
@kahing i mount to the minio buckect to the /data directory, sometimes ago, i can't ls it, but i can ls with the minio client mc.
ls -l /data/ ls: reading directory '/avl_data/': Permission denied
with the debug_s3, it outputs:
2020/07/21 05:41:52.452985 fuse.DEBUG Op 0x001e6af0 connection.go:408] <- GetXattr (inode 1, name "security.selinux", name security.selinux) 2020/07/21 05:41:52.453219 fuse.DEBUG GetXattr 1 [security.selinux] 2020/07/21 05:41:52.453316 fuse.DEBUG Op 0x001e6af0 connection.go:493] -> Error: "no data available" 2020/07/21 05:41:52.453581 fuse.DEBUG Op 0x001e6af2 connection.go:408] <- GetXattr (inode 1, name "system.posix_acl_access", name system.posix_acl_access) 2020/07/21 05:41:52.453804 fuse.DEBUG GetXattr 1 [system.posix_acl_access] 2020/07/21 05:41:52.453886 fuse.DEBUG Op 0x001e6af2 connection.go:493] -> Error: "no data available" 2020/07/21 05:41:52.454136 fuse.DEBUG Op 0x001e6af4 connection.go:408] <- GetXattr (inode 1, name "system.posix_acl_default", name system.posix_acl_default) 2020/07/21 05:41:52.454320 fuse.DEBUG GetXattr 1 [system.posix_acl_default] 2020/07/21 05:41:52.454407 fuse.DEBUG Op 0x001e6af4 connection.go:493] -> Error: "no data available" 2020/07/21 05:41:52.455663 fuse.DEBUG Op 0x001e6af6 connection.go:408] <- OpenDir (inode 1) 2020/07/21 05:41:52.455778 fuse.DEBUG OpenDir 1 [] 2020/07/21 05:41:52.455864 fuse.DEBUG Op 0x001e6af6 connection.go:491] -> OK () 2020/07/21 05:41:52.456109 fuse.DEBUG Op 0x001e6af8 connection.go:408] <- ReadDir (inode 1) 2020/07/21 05:41:52.456266 fuse.DEBUG ReadDir 1 [0] 2020/07/21 05:41:52.475704 fuse.DEBUG Op 0x001e6af8 connection.go:493] -> Error: "permission denied" 2020/07/21 05:41:52.475785 fuse.ERROR *fuseops.ReadDirOp error: permission denied 2020/07/21 05:41:52.476426 fuse.DEBUG Op 0x001e6afa connection.go:408] <- ReleaseDirHandle 2020/07/21 05:41:52.476556 fuse.DEBUG ReleaseDirHandle 2020/07/21 05:41:52.476603 fuse.DEBUG Op 0x001e6afa connection.go:491] -> OK ()
Hi @kahing
am doing some testing to use goofys to store backups using Proxmox Backup Server.
At the moment I could find similar error than @SolarisYan and @lazywhite
At the moment I had been testing to perform ls -l
and touch test.txt
and they are returning the no data available
This are my logs when using touch test2.txt
Feb 23 18:55:03 pbs /usr/bin/goofys[7339]: fuse.DEBUG Op 0x000000b0 connection.go:408] <- LookUpInode (parent 1, name "test2.txt")
Feb 23 18:55:03 pbs /usr/bin/goofys[7339]: fuse.DEBUG Inode.LookUp 1 [test2.txt]
Feb 23 18:55:03 pbs /usr/bin/goofys[7339]: s3.DEBUG DEBUG: Request s3/ListObjects Details:#012---[ REQUEST POST-SIGN ]-----------------------------#012GET /pbs-test?delimiter=%2F&max-keys=1&prefix=test2.txt%2F HTTP/1.1#015#012Host: storage.bhs.cloud.ovh.net#015#012User-Agent: aws-sdk-go/1.17.13 (go1.13.1; linux; amd64)#015#012Accept-Encoding: identity#015#012Authorization: AWS4-HMAC-SHA256 Credential=9fd61916c53e47a0902bc9b5dfd9df33/20210223/BHS/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=8ee3eed9e5f989258b29dfd9cc3b636d7be9e1b68ef27f06af25af57d1a8cbe5#015#012X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855#015#012X-Amz-Date: 20210223T175503Z#015#012#015#012#012-----------------------------------------------------
Feb 23 18:55:03 pbs /usr/bin/goofys[7339]: s3.DEBUG DEBUG: Request s3/HeadObject Details:#012---[ REQUEST POST-SIGN ]-----------------------------#012HEAD /pbs-test/test2.txt HTTP/1.1#015#012Host: storage.bhs.cloud.ovh.net#015#012User-Agent: aws-sdk-go/1.17.13 (go1.13.1; linux; amd64)#015#012Authorization: AWS4-HMAC-SHA256 Credential=9fd61916c53e47a0902bc9b5dfd9df33/20210223/BHS/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=5b6bd8805b644a6a97d23bd6b6f47ebf462adaa4b82c55960c8b43c12a760c7b#015#012X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855#015#012X-Amz-Date: 20210223T175503Z#015#012#015#012#012-----------------------------------------------------
Feb 23 18:55:03 pbs /usr/bin/goofys[7339]: s3.DEBUG DEBUG: Request s3/HeadObject Details:#012---[ REQUEST POST-SIGN ]-----------------------------#012HEAD /pbs-test/test2.txt/ HTTP/1.1#015#012Host: storage.bhs.cloud.ovh.net#015#012User-Agent: aws-sdk-go/1.17.13 (go1.13.1; linux; amd64)#015#012Authorization: AWS4-HMAC-SHA256 Credential=9fd61916c53e47a0902bc9b5dfd9df33/20210223/BHS/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=1e4a9480596eb909d469067e8fac86890a9fa5ae1832871e01d6027ba83340f3#015#012X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855#015#012X-Amz-Date: 20210223T175503Z#015#012#015#012#012-----------------------------------------------------
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: s3.DEBUG DEBUG: Response s3/ListObjects Details:#012---[ RESPONSE ]--------------------------------------#012HTTP/1.1 200 OK#015#012Content-Length: 257#015#012Content-Type: application/xml#015#012Date: Tue, 23 Feb 2021 17:55:04 GMT#015#012X-Amz-Id-2: txeb6bde0173a449c49db0f-0060354177#015#012X-Amz-Request-Id: txeb6bde0173a449c49db0f-0060354177#015#012X-Iplb-Instance: 28795#015#012X-Iplb-Request-Id: C632B521:59DC_8E2CE366:01BB_60354177_3194F79:16B81#015#012X-Openstack-Request-Id: txeb6bde0173a449c49db0f-0060354177#015#012X-Trans-Id: txeb6bde0173a449c49db0f-0060354177#015#012#015#012#012-----------------------------------------------------
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: s3.DEBUG &{[] [] <nil> false }
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: s3.DEBUG DEBUG: Response s3/HeadObject Details:#012---[ RESPONSE ]--------------------------------------#012HTTP/1.1 404 Not Found#015#012Content-Type: application/xml#015#012Date: Tue, 23 Feb 2021 17:55:04 GMT#015#012X-Amz-Id-2: tx9821be4f092349ecb0efc-0060354177#015#012X-Amz-Request-Id: tx9821be4f092349ecb0efc-0060354177#015#012X-Iplb-Instance: 28795#015#012X-Iplb-Request-Id: C632B521:3D55_8E2CE366:01BB_60354177_2EEDCEE:16B72#015#012X-Openstack-Request-Id: tx9821be4f092349ecb0efc-0060354177#015#012X-Trans-Id: tx9821be4f092349ecb0efc-0060354177#015#012Content-Length: 0#015#012#015#012#012-----------------------------------------------------
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: s3.DEBUG DEBUG: Validate Response s3/HeadObject failed, not retrying, error NotFound: Not Found#012#011status code: 404, request id: tx9821be4f092349ecb0efc-0060354177, host id: tx9821be4f092349ecb0efc-0060354177
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: s3.DEBUG HEAD test2.txt = no such file or directory
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: s3.DEBUG DEBUG: Response s3/HeadObject Details:#012---[ RESPONSE ]--------------------------------------#012HTTP/1.1 404 Not Found#015#012Content-Type: application/xml#015#012Date: Tue, 23 Feb 2021 17:55:04 GMT#015#012X-Amz-Id-2: tx45b2285565e34c3794e3d-0060354177#015#012X-Amz-Request-Id: tx45b2285565e34c3794e3d-0060354177#015#012X-Iplb-Instance: 28796#015#012X-Iplb-Request-Id: C632B521:D142_8E2CE366:01BB_60354177_3086E7C:1B75F#015#012X-Openstack-Request-Id: tx45b2285565e34c3794e3d-0060354177#015#012X-Trans-Id: tx45b2285565e34c3794e3d-0060354177#015#012Content-Length: 0#015#012#015#012#012-----------------------------------------------------
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: s3.DEBUG DEBUG: Validate Response s3/HeadObject failed, not retrying, error NotFound: Not Found#012#011status code: 404, request id: tx45b2285565e34c3794e3d-0060354177, host id: tx45b2285565e34c3794e3d-0060354177
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: s3.DEBUG HEAD test2.txt/ = no such file or directory
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: fuse.DEBUG <-- LookUpInode 1 test2.txt no such file or directory
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: fuse.DEBUG Op 0x000000b0 connection.go:493] -> Error: "no such file or directory"
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: fuse.DEBUG Op 0x000000b2 connection.go:408] <- CreateFile (parent 1, name "test2.txt")
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: fuse.DEBUG Create 1 [test2.txt]
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: fuse.DEBUG <-- CreateFile 7 test2.txt []
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: fuse.DEBUG Op 0x000000b2 connection.go:491] -> OK (inode 7)
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: fuse.DEBUG Op 0x000000b4 connection.go:408] <- FlushFile (inode 7)
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: fuse.DEBUG FlushFile 7 test2.txt []
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: s3.DEBUG DEBUG: Request s3/PutObject Details:#012---[ REQUEST POST-SIGN ]-----------------------------#012PUT /pbs-test/test2.txt HTTP/1.1#015#012Host: storage.bhs.cloud.ovh.net#015#012User-Agent: aws-sdk-go/1.17.13 (go1.13.1; linux; amd64)#015#012Content-Length: 0#015#012Authorization: AWS4-HMAC-SHA256 Credential=9fd61916c53e47a0902bc9b5dfd9df33/20210223/BHS/s3/aws4_request, SignedHeaders=content-md5;host;x-amz-content-sha256;x-amz-date;x-amz-storage-class, Signature=e0ed78d596b5adf39e2b65aecdab59a24170171a559f5ad497e1fb8864409a7d#015#012Content-Md5: 1B2M2Y8AsgTpgAmY7PhCfg==#015#012X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855#015#012X-Amz-Date: 20210223T175504Z#015#012X-Amz-Storage-Class: STANDARD#015#012Accept-Encoding: gzip#015#012#015#012#012-----------------------------------------------------
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: s3.DEBUG DEBUG: Response s3/PutObject Details:#012---[ RESPONSE ]--------------------------------------#012HTTP/1.1 200 OK#015#012Content-Length: 0#015#012Content-Type: text/html; charset=UTF-8#015#012Date: Tue, 23 Feb 2021 17:55:04 GMT#015#012Etag: "d41d8cd98f00b204e9800998ecf8427e"#015#012Last-Modified: Tue, 23 Feb 2021 17:55:05 GMT#015#012X-Amz-Id-2: txb598c158975f480f91bf0-0060354178#015#012X-Amz-Request-Id: txb598c158975f480f91bf0-0060354178#015#012X-Iplb-Instance: 28796#015#012X-Iplb-Request-Id: C632B521:D142_8E2CE366:01BB_60354178_3086E80:1B75F#015#012X-Openstack-Request-Id: txb598c158975f480f91bf0-0060354178#015#012X-Trans-Id: txb598c158975f480f91bf0-0060354178#015#012#015#012#012-----------------------------------------------------
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: fuse.DEBUG <-- FlushFile 7 test2.txt [<nil> 4 7]
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: fuse.DEBUG Op 0x000000b4 connection.go:491] -> OK ()
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: fuse.DEBUG Op 0x000000b6 connection.go:408] <- SetInodeAttributes (inode 7, atime 2021-02-23 18:55:04.131072464 +0100 CET, mtime 2021-02-23 18:55:04.131072464 +0100 CET)
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: fuse.DEBUG GetAttributes 7 test2.txt []
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: fuse.DEBUG Op 0x000000b6 connection.go:491] -> OK ()
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: fuse.DEBUG Op 0x000000b8 connection.go:408] <- FlushFile (inode 7)
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: fuse.DEBUG FlushFile 7 test2.txt []
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: fuse.DEBUG <-- FlushFile 7 test2.txt [<nil> 4 7]
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: fuse.DEBUG Op 0x000000b8 connection.go:491] -> OK ()
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: fuse.DEBUG Op 0x000000ba connection.go:408] <- ReleaseFileHandle
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: fuse.DEBUG ReleaseFileHandle test2.txt 4 7
Feb 23 18:55:04 pbs /usr/bin/goofys[7339]: fuse.DEBUG Op 0x000000ba connection.go:491] -> OK ()
I mounted the S3 using this command
goofys --http-timeout=5m --endpoint=https://storage.bhs.cloud.ovh.net --region="BHS" --uid=34 --gid=34 -o allow_other --debug_fuse --debug_s3 pbs-test /datastore01/
Goofys version: goofys version 0.24.0-45b8d78375af1b24604439d2e60c567654bcdf88 S3 Provider: OVH
if you need more info just let me know
thanks in advance
Hi,
just to add more information
When I use touch test2.txt
it creates the file locally, I can see it, but it does not appear in the S3 bucket.
Ok it seems the problem with the command ls -l
happens because -l
request extra attributes or information from the file, that can not be provide by S3. Using ls
without any argument does not show any error.
Hi @kahing i have use goofys to mount minio buckect in docker container with ubuntu 18.04, and it works very well. But about after one week, i fount that that there have lost fome files and directory under mount point, i have check in the minio server, all files and dir are there. Recently, this situation often appears, which has seriously affected my work. I use a program to handle a large number of files under the mount point. Once this happens, my program will exit and cannot be predicted. Has anyone else encountered this problem, what should I do?
I use the commad goofys --region us-east-1 --cheap --stat-cache-ttl 60s --type-cache-ttl 60s --http-timeout 300s --uid 1000 --gid 100 --endpoint https:/xxx default /data