Open tcassaert opened 1 year ago
@tcassaert It seems we can switch between api versions for listing objects:
Important note: you should mount geesefs with --list-type 2 or --list-type 1 options if you use it with non-Yandex S3.
Can we try both options, to provide more complete info? (Or maybe it's resolved by using either one of them.)
Maybe also worth mentioning that this is probably a Ceph-based system.
I also note that the marker is missing a trailing '/'. If I add it to the failing request, it works, but I still fail to see why 'prefix' is not used when listing directory contents.
curl 'data.cloudferro.com/DIAS?marker=Sentinel-1/&prefix='
<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Name>DIAS</Name>
<Prefix></Prefix>
<MaxKeys>1000</MaxKeys>
<Marker>Sentinel-1/</Marker>
<NextMarker></NextMarker>
<IsTruncated>false</IsTruncated>
<Contents>
<Key>Sentinel-1/</Key>
<LastModified>2023-01-25T15:57:47.223Z</LastModified>
<Size>0</Size>
<StorageClass>STANDARD</StorageClass>
<Type>Normal</Type>
</Contents>
<Contents>
<Key>Sentinel-1-COG/</Key>
<LastModified>2023-01-25T15:57:47.223Z</LastModified>
<Size>0</Size>
<StorageClass>STANDARD</StorageClass>
<Type>Normal</Type>
</Contents>
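For comparing responses like the one above, the listing can be read programmatically. The following is only a minimal sketch using the Python standard library. The sample document is a shortened, hand-written stand-in for the curl output (it assumes the real response ends with a closing ListBucketResult tag), and parse_listing is a hypothetical helper name, not part of geesefs or any S3 SDK.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the ListBucketResult element in the response above.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}

# Shortened, well-formed sample modeled on the curl output (assumption:
# the real response is closed by </ListBucketResult>).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Name>DIAS</Name>
<Prefix></Prefix>
<Marker>Sentinel-1/</Marker>
<IsTruncated>false</IsTruncated>
<Contents><Key>Sentinel-1/</Key><Size>0</Size></Contents>
<Contents><Key>Sentinel-1-COG/</Key><Size>0</Size></Contents>
</ListBucketResult>"""


def parse_listing(xml_text):
    """Extract prefix, marker, truncation flag and keys from a v1 listing."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    truncated = root.findtext("s3:IsTruncated", default="false", namespaces=S3_NS)
    return {
        "prefix": root.findtext("s3:Prefix", default="", namespaces=S3_NS),
        "marker": root.findtext("s3:Marker", default="", namespaces=S3_NS),
        "truncated": truncated == "true",
        "keys": [
            item.findtext("s3:Key", namespaces=S3_NS)
            for item in root.findall("s3:Contents", S3_NS)
        ],
    }


listing = parse_listing(SAMPLE)
print(listing["marker"], listing["keys"])
```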
Hi guys.
+1, you can choose between list-type 1 and 2.
If your S3 doesn't support data.cloudferro.com/DIAS?marker=Sentinel-1&prefix=
then it's buggy :)
geesefs doesn't load directories one by one - it preloads multiple directories by utilizing flat listings of S3, to be faster.
I already explained it here: https://github.com/yandex-cloud/geesefs/issues/56#issuecomment-1303367445
If your S3 is Ceph-based then it should work - I've checked geesefs with Ceph many times. But I've never tested OpenStack Swift.
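To illustrate the idea of preloading several directories from one flat listing, here is a rough sketch over an in-memory key list. It is not geesefs's actual code: flat_list and group_by_directory are hypothetical helpers, the object keys are made up, and the model assumes the server returns keys in plain string order, strictly after the given start key.

```python
from collections import defaultdict


def flat_list(keys, start_after="", max_keys=1000):
    """Model of a listing without delimiter: keys sorting after start_after."""
    return [key for key in sorted(keys) if key > start_after][:max_keys]


def group_by_directory(keys):
    """Split flat keys into {directory: [entry names]} on the last '/'."""
    dirs = defaultdict(list)
    for key in keys:
        parent, _, name = key.rpartition("/")
        dirs[parent].append(name)
    return dict(dirs)


# Hypothetical bucket contents, for illustration only.
BUCKET = [
    "Sentinel-1/a.txt",
    "Sentinel-1/SAR/b.txt",
    "Sentinel-1/SAR/c.txt",
    "Sentinel-2/d.txt",
]

# One request starting after "Sentinel-1" (no trailing slash, and not an
# existing key) yields entries for several directories at once.
page = flat_list(BUCKET, start_after="Sentinel-1")
print(group_by_directory(page))
```

In this model a single request fills Sentinel-1, Sentinel-1/SAR and Sentinel-2 together, which is the saving compared with one delimiter-based request per directory.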
Thanks for the response!
I've tried with both listing options:
./geesefs -f --debug_s3 --endpoint http://data.cloudferro.com --list-type 1 DIAS /opt/eodata
2023/01/27 11:21:07.569501 s3.DEBUG HEAD http://data.cloudferro.com/DIAS = 200 []
2023/01/27 11:21:07.569596 s3.DEBUG Last-Modified = [Fri, 27 Jan 2023 00:00:01 GMT]
2023/01/27 11:21:07.569605 s3.DEBUG Date = [Fri, 27 Jan 2023 00:00:01 GMT]
2023/01/27 11:21:07.569611 s3.DEBUG Server = [nginx/1.21.3]
2023/01/27 11:21:07.569618 s3.DEBUG Content-Type = [binary/octet-stream]
2023/01/27 11:21:07.569623 s3.DEBUG Content-Length = [0]
2023/01/27 11:21:07.569627 s3.DEBUG Connection = [keep-alive]
2023/01/27 11:21:07.569631 s3.DEBUG Accept-Ranges = [bytes]
2023/01/27 11:21:07.569635 s3.DEBUG Etag = ["d41d8cd98f00b204e9800998ecf8427e"]
2023/01/27 11:21:07.569640 s3.INFO anonymous bucket detected
2023/01/27 11:21:07.569965 s3.DEBUG DEBUG: Request s3/HeadObject Details:
---[ REQUEST POST-SIGN ]-----------------------------
HEAD /DIAS/gvv0ql9cqxgvsznix8mz9gojhlyv3q58 HTTP/1.1
Host: data.cloudferro.com
User-Agent: GeeseFS/0.34.1 (go1.16.15; linux; amd64)
-----------------------------------------------------
2023/01/27 11:21:07.578331 s3.DEBUG DEBUG: Response s3/HeadObject Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 404 NOT FOUND
Connection: keep-alive
Content-Type: text/html; charset=utf-8
Date: Fri, 27 Jan 2023 10:21:07 GMT
Server: nginx/1.21.3
Content-Length: 0
-----------------------------------------------------
2023/01/27 11:21:07.578375 s3.DEBUG DEBUG: Validate Response s3/HeadObject failed, attempt 0/3, error NotFound: Not Found
status code: 404, request id: , host id:
2023/01/27 11:21:07.578597 s3.DEBUG DEBUG: Request s3/ListMultipartUploads Details:
---[ REQUEST POST-SIGN ]-----------------------------
GET /DIAS?uploads= HTTP/1.1
Host: data.cloudferro.com
User-Agent: GeeseFS/0.34.1 (go1.16.15; linux; amd64)
Accept-Encoding: identity
-----------------------------------------------------
2023/01/27 11:21:07.582407 s3.DEBUG DEBUG: Response s3/ListMultipartUploads Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 404 NOT FOUND
Content-Length: 39
Connection: keep-alive
Content-Type: text/html; charset=utf-8
Date: Fri, 27 Jan 2023 10:21:07 GMT
Server: nginx/1.21.3
-----------------------------------------------------
2023/01/27 11:21:07.582451 s3.DEBUG DEBUG: Validate Response s3/ListMultipartUploads failed, attempt 0/3, error NotFound: Not Found
status code: 404, request id: , host id:
2023/01/27 11:21:07.583040 main.INFO File system has been successfully mounted.
Then doing an ls /opt/eodata/Sentinel-1 gives
2023/01/27 11:23:43.346717 s3.DEBUG DEBUG: Request s3/ListObjects Details:
---[ REQUEST POST-SIGN ]-----------------------------
GET /DIAS?marker=Sentinel-1&prefix= HTTP/1.1
Host: data.cloudferro.com
User-Agent: GeeseFS/0.34.1 (go1.16.15; linux; amd64)
Accept-Encoding: identity
-----------------------------------------------------
2023/01/27 11:23:43.457584 s3.DEBUG DEBUG: Response s3/ListObjects Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 404 NOT FOUND
Content-Length: 27
Connection: keep-alive
Content-Type: text/html; charset=utf-8
Date: Fri, 27 Jan 2023 10:23:43 GMT
Server: nginx/1.21.3
-----------------------------------------------------
2023/01/27 11:23:43.457636 s3.DEBUG DEBUG: Validate Response s3/ListObjects failed, attempt 0/3, error NotFound: Not Found
status code: 404, request id: , host id:
2023/01/27 11:23:43.457675 s3.ERROR ListObjects &{0xc0002ca010 <nil> <nil> 0xc0002ca000 <nil>} = NotFound: Not Found
status code: 404, request id: , host id:
2023/01/27 11:23:43.458079 s3.DEBUG DEBUG: Request s3/ListObjects Details:
---[ REQUEST POST-SIGN ]-----------------------------
GET /DIAS?marker=Sentinel-1&prefix= HTTP/1.1
Host: data.cloudferro.com
User-Agent: GeeseFS/0.34.1 (go1.16.15; linux; amd64)
Accept-Encoding: identity
-----------------------------------------------------
2023/01/27 11:23:43.553826 s3.DEBUG DEBUG: Response s3/ListObjects Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 404 NOT FOUND
Content-Length: 27
Connection: keep-alive
Content-Type: text/html; charset=utf-8
Date: Fri, 27 Jan 2023 10:23:43 GMT
Server: nginx/1.21.3
-----------------------------------------------------
2023/01/27 11:23:43.553859 s3.DEBUG DEBUG: Validate Response s3/ListObjects failed, attempt 0/3, error NotFound: Not Found
status code: 404, request id: , host id:
2023/01/27 11:23:43.553879 s3.ERROR ListObjects &{0xc0002ca380 <nil> <nil> 0xc0002ca370 <nil>} = NotFound: Not Found
status code: 404, request id: , host id:
in the logs, and ls: cannot access '/opt/eodata/Sentinel-1': No such file or directory in the terminal.
First entering /opt/eodata and doing an ls does give the directory contents and shows
---[ REQUEST POST-SIGN ]-----------------------------
GET /DIAS?marker=Sentinel-1&prefix= HTTP/1.1
Host: data.cloudferro.com
User-Agent: GeeseFS/0.34.1 (go1.16.15; linux; amd64)
Accept-Encoding: identity
-----------------------------------------------------
2023/01/27 11:23:43.553826 s3.DEBUG DEBUG: Response s3/ListObjects Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 404 NOT FOUND
Content-Length: 27
Connection: keep-alive
Content-Type: text/html; charset=utf-8
Date: Fri, 27 Jan 2023 10:23:43 GMT
Server: nginx/1.21.3
-----------------------------------------------------
2023/01/27 11:23:43.553859 s3.DEBUG DEBUG: Validate Response s3/ListObjects failed, attempt 0/3, error NotFound: Not Found
status code: 404, request id: , host id:
2023/01/27 11:23:43.553879 s3.ERROR ListObjects &{0xc0002ca380 <nil> <nil> 0xc0002ca370 <nil>} = NotFound: Not Found
status code: 404, request id: , host id:
2023/01/27 11:25:21.632024 s3.DEBUG DEBUG: Request s3/ListObjects Details:
---[ REQUEST POST-SIGN ]-----------------------------
GET /DIAS?prefix= HTTP/1.1
Host: data.cloudferro.com
User-Agent: GeeseFS/0.34.1 (go1.16.15; linux; amd64)
Accept-Encoding: identity
-----------------------------------------------------
2023/01/27 11:25:21.747837 s3.DEBUG DEBUG: Response s3/ListObjects Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 200 OK
Content-Length: 4150
Connection: keep-alive
Content-Type: application/xml
Date: Fri, 27 Jan 2023 10:25:21 GMT
Server: nginx/1.21.3
-----------------------------------------------------
2023/01/27 11:25:21.747929 s3.DEBUG &{[] [: 0 C3S/: 0 CAMS/: 0 CEMS/: 0 CLMS/: 0 CMEMS/: 0 Envisat/: 0 Envisat-ASAR/: 0 Jason-3/: 0 Landsat-5/: 0 Landsat-7/: 0 Landsat-8/: 0 SMOS/: 0 Sentinel-1/: 0 Sentinel-1-COG/: 0 Sentinel-1-RTC/: 0 Sentinel-2/: 0 Sentinel-3/: 0 Sentinel-5P/: 0 Sentinel-6/: 0 auxdata/: 0] 0xc00058e748 false }
in the logs.
However, doing an ls /opt/eodata/Sentinel-1 doesn't return anything and shows nothing in the logs.
./geesefs -f --debug_s3 --endpoint http://data.cloudferro.com --list-type 2 DIAS /opt/eodata
2023/01/27 11:27:48.808693 s3.DEBUG HEAD http://data.cloudferro.com/DIAS = 200 []
2023/01/27 11:27:48.808803 s3.DEBUG Content-Type = [binary/octet-stream]
2023/01/27 11:27:48.808835 s3.DEBUG Content-Length = [0]
2023/01/27 11:27:48.808844 s3.DEBUG Connection = [keep-alive]
2023/01/27 11:27:48.808856 s3.DEBUG Accept-Ranges = [bytes]
2023/01/27 11:27:48.808875 s3.DEBUG Etag = ["d41d8cd98f00b204e9800998ecf8427e"]
2023/01/27 11:27:48.808884 s3.DEBUG Last-Modified = [Fri, 27 Jan 2023 00:00:01 GMT]
2023/01/27 11:27:48.808895 s3.DEBUG Date = [Fri, 27 Jan 2023 00:00:01 GMT]
2023/01/27 11:27:48.808902 s3.DEBUG Server = [nginx/1.21.3]
2023/01/27 11:27:48.808910 s3.INFO anonymous bucket detected
2023/01/27 11:27:48.809312 s3.DEBUG DEBUG: Request s3/HeadObject Details:
---[ REQUEST POST-SIGN ]-----------------------------
HEAD /DIAS/h6v7fkm80omj8lsgyepgvpvkua511giq HTTP/1.1
Host: data.cloudferro.com
User-Agent: GeeseFS/0.34.1 (go1.16.15; linux; amd64)
-----------------------------------------------------
2023/01/27 11:27:48.823983 s3.DEBUG DEBUG: Response s3/HeadObject Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 404 NOT FOUND
Connection: keep-alive
Content-Type: text/html; charset=utf-8
Date: Fri, 27 Jan 2023 10:27:48 GMT
Server: nginx/1.21.3
Content-Length: 0
-----------------------------------------------------
2023/01/27 11:27:48.824025 s3.DEBUG DEBUG: Validate Response s3/HeadObject failed, attempt 0/3, error NotFound: Not Found
status code: 404, request id: , host id:
2023/01/27 11:27:48.824210 s3.DEBUG DEBUG: Request s3/ListMultipartUploads Details:
---[ REQUEST POST-SIGN ]-----------------------------
GET /DIAS?uploads= HTTP/1.1
Host: data.cloudferro.com
User-Agent: GeeseFS/0.34.1 (go1.16.15; linux; amd64)
Accept-Encoding: identity
-----------------------------------------------------
2023/01/27 11:27:48.826980 s3.DEBUG DEBUG: Response s3/ListMultipartUploads Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 404 NOT FOUND
Content-Length: 39
Connection: keep-alive
Content-Type: text/html; charset=utf-8
Date: Fri, 27 Jan 2023 10:27:48 GMT
Server: nginx/1.21.3
-----------------------------------------------------
2023/01/27 11:27:48.827012 s3.DEBUG DEBUG: Validate Response s3/ListMultipartUploads failed, attempt 0/3, error NotFound: Not Found
status code: 404, request id: , host id:
2023/01/27 11:27:48.828061 main.INFO File system has been successfully mounted.
Then doing an ls /opt/eodata/Sentinel-1 gives
2023/01/27 11:28:30.609293 s3.DEBUG DEBUG: Request s3/ListObjectsV2 Details:
---[ REQUEST POST-SIGN ]-----------------------------
GET /DIAS?list-type=2&prefix=&start-after=Sentinel-1 HTTP/1.1
Host: data.cloudferro.com
User-Agent: GeeseFS/0.34.1 (go1.16.15; linux; amd64)
Accept-Encoding: identity
-----------------------------------------------------
2023/01/27 11:28:30.711958 s3.DEBUG DEBUG: Response s3/ListObjectsV2 Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 404 NOT FOUND
Content-Length: 27
Connection: keep-alive
Content-Type: text/html; charset=utf-8
Date: Fri, 27 Jan 2023 10:28:30 GMT
Server: nginx/1.21.3
-----------------------------------------------------
2023/01/27 11:28:30.712014 s3.DEBUG DEBUG: Validate Response s3/ListObjectsV2 failed, attempt 0/3, error NotFound: Not Found
status code: 404, request id: , host id:
2023/01/27 11:28:30.712062 s3.ERROR ListObjects &{0xc00006a9a0 <nil> <nil> 0xc00006a990 <nil>} = NotFound: Not Found
status code: 404, request id: , host id:
2023/01/27 11:28:30.712715 s3.DEBUG DEBUG: Request s3/ListObjectsV2 Details:
---[ REQUEST POST-SIGN ]-----------------------------
GET /DIAS?list-type=2&prefix=&start-after=Sentinel-1 HTTP/1.1
Host: data.cloudferro.com
User-Agent: GeeseFS/0.34.1 (go1.16.15; linux; amd64)
Accept-Encoding: identity
-----------------------------------------------------
2023/01/27 11:28:30.806160 s3.DEBUG DEBUG: Response s3/ListObjectsV2 Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 404 NOT FOUND
Content-Length: 27
Connection: keep-alive
Content-Type: text/html; charset=utf-8
Date: Fri, 27 Jan 2023 10:28:30 GMT
Server: nginx/1.21.3
-----------------------------------------------------
2023/01/27 11:28:30.806210 s3.DEBUG DEBUG: Validate Response s3/ListObjectsV2 failed, attempt 0/3, error NotFound: Not Found
status code: 404, request id: , host id:
2023/01/27 11:28:30.806236 s3.ERROR ListObjects &{0xc0000a77b0 <nil> <nil> 0xc0000a77a0 <nil>} = NotFound: Not Found
status code: 404, request id: , host id:
in the logs, with ls: cannot access '/opt/eodata/Sentinel-1': No such file or directory in the terminal.
First entering the directory and doing an ls gives
2023/01/27 11:28:30.712715 s3.DEBUG DEBUG: Request s3/ListObjectsV2 Details:
---[ REQUEST POST-SIGN ]-----------------------------
GET /DIAS?list-type=2&prefix=&start-after=Sentinel-1 HTTP/1.1
Host: data.cloudferro.com
User-Agent: GeeseFS/0.34.1 (go1.16.15; linux; amd64)
Accept-Encoding: identity
-----------------------------------------------------
2023/01/27 11:28:30.806160 s3.DEBUG DEBUG: Response s3/ListObjectsV2 Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 404 NOT FOUND
Content-Length: 27
Connection: keep-alive
Content-Type: text/html; charset=utf-8
Date: Fri, 27 Jan 2023 10:28:30 GMT
Server: nginx/1.21.3
-----------------------------------------------------
2023/01/27 11:28:30.806210 s3.DEBUG DEBUG: Validate Response s3/ListObjectsV2 failed, attempt 0/3, error NotFound: Not Found
status code: 404, request id: , host id:
2023/01/27 11:28:30.806236 s3.ERROR ListObjects &{0xc0000a77b0 <nil> <nil> 0xc0000a77a0 <nil>} = NotFound: Not Found
status code: 404, request id: , host id:
2023/01/27 11:30:06.941860 s3.DEBUG DEBUG: Request s3/ListObjectsV2 Details:
---[ REQUEST POST-SIGN ]-----------------------------
GET /DIAS?list-type=2&prefix= HTTP/1.1
Host: data.cloudferro.com
User-Agent: GeeseFS/0.34.1 (go1.16.15; linux; amd64)
Accept-Encoding: identity
-----------------------------------------------------
2023/01/27 11:30:07.038800 s3.DEBUG DEBUG: Response s3/ListObjectsV2 Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 200 OK
Content-Length: 4252
Connection: keep-alive
Content-Type: application/xml
Date: Fri, 27 Jan 2023 10:30:07 GMT
Server: nginx/1.21.3
-----------------------------------------------------
2023/01/27 11:30:07.038900 s3.DEBUG &{[] [: 0 C3S/: 0 CAMS/: 0 CEMS/: 0 CLMS/: 0 CMEMS/: 0 Envisat/: 0 Envisat-ASAR/: 0 Jason-3/: 0 Landsat-5/: 0 Landsat-7/: 0 Landsat-8/: 0 SMOS/: 0 Sentinel-1/: 0 Sentinel-1-COG/: 0 Sentinel-1-RTC/: 0 Sentinel-2/: 0 Sentinel-3/: 0 Sentinel-5P/: 0 Sentinel-6/: 0 auxdata/: 0] 0xc000406da8 false : }
But ls /opt/eodata/Sentinel-1 also doesn't show anything in either the logs or the terminal output.
Ok, so using the new list type comes down to the same issue. @vitalif can you perhaps confirm whether this is an issue with the S3?
These requests are not supported:
http://data.cloudferro.com/DIAS?marker=Sentinel-1&prefix=
http://data.cloudferro.com/DIAS?list-type=2&prefix=&start-after=Sentinel-1
While these do work:
http://data.cloudferro.com/DIAS?marker=Sentinel-1/&prefix=
http://data.cloudferro.com/DIAS?list-type=2&prefix=&start-after=Sentinel-1/
http://data.cloudferro.com/DIAS?prefix=Sentinel-1
I ask because the S3 API docs seem to suggest that the marker has to be a key in the bucket, and the key effectively has the '/' appended: 'Sentinel-1/'
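For anyone who wants to reproduce the comparison, the query strings for both listing versions can be generated rather than typed by hand. This is only a sketch: listing_query is a hypothetical helper, and the parameter names (marker, prefix, list-type, start-after) are taken from the requests in the debug logs above. It builds URLs only and does not contact the server.

```python
from urllib.parse import urlencode

BASE_URL = "http://data.cloudferro.com/DIAS"


def listing_query(list_type, start=None, prefix=""):
    """Build the query string for a v1 (marker) or v2 (start-after) listing."""
    if list_type == 1:
        params = {}
        if start is not None:
            params["marker"] = start
        params["prefix"] = prefix
    elif list_type == 2:
        params = {"list-type": "2", "prefix": prefix}
        if start is not None:
            params["start-after"] = start
    else:
        raise ValueError("list_type must be 1 or 2")
    # Keep '/' unescaped so the output matches the URLs in this thread.
    return urlencode(params, safe="/")


for list_type in (1, 2):
    for start in ("Sentinel-1", "Sentinel-1/"):
        print(f"{BASE_URL}?{listing_query(list_type, start)}")
```

Each printed URL can then be passed to curl to see which variants the endpoint answers with 200 and which with 404.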
Thanks!
Imagine that the list stops at "Key1", before "Key2". Obviously it should continue with marker=Key1 (not Key1/)...
Ok I see. You mean that start-after is a non-existing key in the case of GeeseFS. Then it's... kind of UB (undefined behaviour) :) The AWS docs say "start-after can be any key in the bucket" but don't say what happens if it isn't. In practice all major S3 providers support an arbitrary start-after. So I suspect it's not a good idea to reply with 404 in this case. Anyway, it's an important part of GeeseFS optimisations, so I don't want to drop it.
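The Key1/Key2 argument can be made concrete with a small in-memory model of marker-based paging. This is a sketch under assumptions, not the behaviour of any particular S3 server: list_page and list_all are hypothetical names, the keys are invented, and the model assumes keys are compared as plain strings and that the marker does not have to exist in the bucket.

```python
from bisect import bisect_right


def list_page(keys, marker="", max_keys=2):
    """Return (page, truncated): keys strictly after marker, at most max_keys."""
    ordered = sorted(keys)
    begin = bisect_right(ordered, marker)
    page = ordered[begin:begin + max_keys]
    truncated = begin + max_keys < len(ordered)
    return page, truncated


def list_all(keys, max_keys=2):
    """Page through the listing, continuing from the last key of each page."""
    result, marker = [], ""
    while True:
        page, truncated = list_page(keys, marker, max_keys)
        result.extend(page)
        if not truncated:
            return result
        marker = page[-1]  # continue with marker=Key1, not Key1/


KEYS = ["Key1", "Key1-extra", "Key2"]

print(list_all(KEYS, max_keys=1))
print(list_page(KEYS, marker="Key1", max_keys=10))
print(list_page(KEYS, marker="Key1/", max_keys=10))
```

In this model, continuing from "Key1/" instead of "Key1" skips "Key1-extra", because '-' sorts before '/'. A marker that is not an existing key still gives a well-defined position in the sorted order, so there is no need to reject it.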
geesefs version: v0.34.2
procedure
Mounting a bucket with:
expected result
Being able to access subdirectories like
actual result
log output
so that seems fine. Doing an ls /opt/eodata/Sentinel-1 gives
The problem seems to be the GET /DIAS?marker=Sentinel-1&prefix= HTTP/1.1 request. Doing a direct curl gives:
While this does work with, for example, goofys, which seems to be doing a GET /DIAS?delimiter=%2F&prefix=Sentinel-1%2F. This does give the correct output: