juicedata / juicefs

JuiceFS is a distributed POSIX file system built on top of Redis and S3.
https://juicefs.com
Apache License 2.0

juicefs gc fails because of timeout at ListAll objects #4917

Closed: copecog closed this issue 2 weeks ago

copecog commented 3 weeks ago

What happened: juicefs gc fails after 30 seconds when it times out on ListAll

What you expected to happen: For it to succeed.

How to reproduce it (as minimally and precisely as possible): execute: juicefs gc 'mysql://user:pass@(fqdn:3306)/bucket'

Anything else we need to know? juicefs version 1.1.2+2024-02-04.8dbd89a on MinIO VERSION 2024-05-10T01:41:38Z and Percona XtraDB Ver 8.0.36-28.1 for Linux on x86_64 (Percona XtraDB Cluster (GPL), Release rel28, Revision bfb687f, WSREP version 26.1.4.3)

Looking at the MinIO S3 trace, the remote connection disconnects at the 30-second mark, before the ListAll completes. I cannot find a way to configure this timeout in juicefs.

SandyXSD commented 3 weeks ago

What's the exact error you got? JuiceFS doesn't enforce any timeout itself, but if the object storage returns an error, the gc fails as expected.

copecog commented 3 weeks ago

copec@pleskstorage1:~/build/juicefs$ META_PASSWORD=**** ./juicefs gc mysql://pleskbackup@(pleskstorage1.xmission.com:3306)/jfs_plesk_backup?tls=skip-verify --verbose
2024/06/04 16:26:38.092053 juicefs[8560] <DEBUG>: maxprocs: Leaving GOMAXPROCS=24: CPU quota undefined [maxprocs.go:47]
2024/06/04 16:26:38.096616 juicefs[8560] <DEBUG>: Debug agent listening on 127.0.0.1:6060 [main.go:321]
2024/06/04 16:26:38.096439 juicefs[8560] <INFO>: Meta address: mysql://pleskbackup:****@(pleskstorage1.xmission.com:3306)/jfs_plesk_backup?tls=skip-verify [interface.go:504]
[xorm] [info]  2024/06/04 16:26:38.099644 PING DATABASE mysql
2024/06/04 16:26:38.111544 juicefs[8560] <WARNING>: The latency to database is too high: 12.594193ms [sql.go:311]
2024/06/04 16:26:38.118213 juicefs[8560] <DEBUG>: Creating minio storage at endpoint https://pleskstorage1.xmission.com:9000/pleskbackup?tls=skip-verify [object_storage.go:167]
2024/06/04 16:26:38.121549 juicefs[8560] <INFO>: Data use minio://pleskstorage1.xmission.com:9000/pleskbackup/plesk-backup/ [gc.go:101]
2024/06/04 16:26:54.782544 juicefs[8560] <DEBUG>: Iterating objects from minio://pleskstorage1.xmission.com:9000/pleskbackup/plesk-backup/chunks/ with prefix  start "" [sync.go:81]
2024/06/04 16:26:54.785092 juicefs[8560] <DEBUG>: Listing objects from minio://pleskstorage1.xmission.com:9000/pleskbackup/plesk-backup/chunks/ marker "" [sync.go:110]
2024/06/04 16:28:55.206769 juicefs[8560] <ERROR>: Can't list minio://pleskstorage1.xmission.com:9000/pleskbackup/plesk-backup/chunks/: RequestError: send request failed
caused by: Get "https://pleskstorage1.xmission.com:9000/pleskbackup?encoding-type=url&marker=&max-keys=1000&prefix=plesk-backup%2Fchunks%2F": net/http: timeout awaiting response headers [sync.go:116]
2024/06/04 16:28:55.206936 juicefs[8560] <FATAL>: list all blocks: RequestError: send request failed
caused by: Get "https://pleskstorage1.xmission.com:9000/pleskbackup?encoding-type=url&marker=&max-keys=1000&prefix=plesk-backup%2Fchunks%2F": net/http: timeout awaiting response headers [gc.go:237]

The MinIO logs show the client ending the connection at the ~30-second mark.

SandyXSD commented 3 weeks ago

This error means you didn't get a response header within 30s of fully writing the request, which usually indicates a critical server or network issue. So I suggest checking the MinIO and network configuration first. If you really want to increase the timeout, try changing this value: https://github.com/juicedata/juicefs/blob/c8f48b875e0a70d42bf324c4184bfc9519c31fdc/pkg/object/restful.go#L42

copecog commented 3 weeks ago

Thank you very much @SandyXSD for taking the time to look at this! I appreciate it!

FWIW, it looks like this is a long-standing issue: MinIO can take a long time to complete ListAll, and I see the same question asked and many GitHub tickets filed for it.

SandyXSD commented 2 weeks ago

Thanks for the information! We can leave this issue open to see if anyone else is experiencing similar problems.