minio / mc

Unix like utilities for object store
https://min.io/download
GNU Affero General Public License v3.0

Adding tolerant reader pattern feature #4943

Closed janwesterkamp closed 4 months ago

janwesterkamp commented 5 months ago

Expected behavior

In some cases, object store implementations do not follow the AWS S3 bucket naming convention; for example, Ceph allows setting a flag to ignore it. As a user, I would still like to be able to access such a (misconfigured) object store, at least with read access - meaning the reader should be tolerant of this.

Actual behavior

When bucket names do not follow the AWS S3 bucket naming convention, listing the buckets is possible, but accessing their content is not.

Steps to reproduce the behavior

Try to access the contents of the Copernicus Dataspace Environment (CDSE) S3 Access service: https://documentation.dataspace.copernicus.eu/APIs/S3.html

Following the Open Data policy, access to this Ceph instance is public, but a user account and S3 access credentials need to be created first (self-service).

Any Ceph instance with rgw_relaxed_s3_bucket_names flag set and using uppercase letters for their bucket names should be affected too.
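To make the difference concrete, here is a minimal Go sketch of a strict check next to a relaxed one. The strict pattern is a simplified subset of the AWS S3 naming rules (3 to 63 characters, lowercase letters, digits, dots and hyphens, starting and ending with a letter or digit). The relaxed pattern is only an assumption about what a server with relaxed names might accept, and the function name `validBucketName` is illustrative, not the API of mc or minio-go.

```go
package main

import (
	"fmt"
	"regexp"
)

// strictName approximates the AWS S3 convention: 3-63 characters, lowercase
// letters, digits, dots and hyphens, starting and ending with a letter or digit.
var strictName = regexp.MustCompile(`^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$`)

// relaxedName additionally accepts uppercase letters and underscores.
// This is an assumption for illustration, not Ceph's exact rule set.
var relaxedName = regexp.MustCompile(`^[A-Za-z0-9][A-Za-z0-9._-]{1,61}[A-Za-z0-9]$`)

// validBucketName reports whether name passes the strict or the relaxed check.
func validBucketName(name string, relaxed bool) bool {
	if relaxed {
		return relaxedName.MatchString(name)
	}
	return strictName.MatchString(name)
}

func main() {
	for _, name := range []string{"eodata", "EODATA", "DIAS"} {
		fmt.Printf("%-7s strict=%v relaxed=%v\n", name,
			validBucketName(name, false), validBucketName(name, true))
	}
}
```

A tolerant reader in this sense would apply the relaxed check only when explicitly asked to, and keep the strict check as the default.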

mc --version

System information

macOS 14.5 (23F79) - amd64

Details

See corresponding issue on minio-java: https://github.com/minio/minio-java/issues/1564

Related issues and PRs:

Looks like this was implemented in minio-go already, but not in mc yet.

klauspost commented 5 months ago

"S3 compatible" software should be S3 compatible. I don't see any reason to add hacks and workarounds for non-compatible software.

janwesterkamp commented 5 months ago

"S3 compatible" software should be S3 compatible. I don't see any reason to add hacks and workarounds for non-compatible software.

I think this depends on the scope of being AWS S3 compatible: in the past, AWS allowed non-lowercase letters in bucket names in one region of their service (I don't know if this is still possible there, and I did not check it on local testing instances). Ceph allows this when the mentioned flag is set. So there were object store instances, including the most prominent one, that supported violating the bucket naming convention.

AWS tooling like the AWS CLI or s3cmd supports it in a case-insensitive way out of the box, too. AWS libraries like the AWS Java SDK support it in a case-sensitive way out of the box. It looks like minio-go already supports it too, with a flag set.

I am not suggesting changing the behaviour of object store instances like MinIO itself. Checking the naming convention there is perfect for me, and also as a default for tooling and libraries. I am only requesting a way to access instances outside my control, like the mentioned CDSE/CloudFerro Ceph instance, which is misconfigured from my perspective. It's unclear if and when they might change or fix it. Your help would be very welcome if you would like to support my request to fix the root cause there (only a free self-registration is required):

https://forum.dataspace.copernicus.eu/t/fixing-s3-access-bucket-naming-convention-violation/699/1

Since access with the official AWS tooling and libraries is possible (S3 compatible), it would be very nice to have at least read access to such instances by setting a flag that deactivates the validation. This would allow using mc as a more convenient (and maybe faster) tool to access such an instance, and mc would behave more like the official tooling (the differences would be the requirement to set a flag and allowing it only for read access, but I would tend to see this as a feature or improvement).

zveinn commented 4 months ago

MinIO itself is the one denying access; changing this would be a considerable effort, seeing as it's part of the global middleware stack:

https://github.com/minio/minio/blob/62e6dc950d9a5530fbb52249c6e7d569fb337aa0/cmd/generic-handlers.go#L425

janwesterkamp commented 4 months ago

@zveinn, I think there is a misunderstanding:

I am not requesting any changes to MinIO (the server) itself; it should keep behaving as it does. And yes, changing it would require handling issues on file systems that are not case-sensitive when they are used for persistence.

Instead, I am asking for support of the tolerant reader pattern, so that it is possible to use MinIO tooling/libs to access data stored in buckets that violate the naming convention:

[Image: minio-issue]

For the red extract and load part, I cannot use MinIO tooling or libs like mc yet, as they do not support the tolerant reader pattern - I am required to use the AWS CLI or s3cmd instead. The transform step is a no-op in my scenario; in detail, it looks like this for the CLI:

[Image: minio-CLI-issue]

Also, the file system is required as intermediate storage, because the alternative tooling does not support direct copies from bucket to bucket when the buckets are in different environments with different credentials. It looks like mc would support that part, but I cannot benefit from it because of its limitations on reading the data.

So there is a good reason for this request. The minio-go implementation already contains preparations for this feature (see the links shared above), and it could be addressed by introducing another CLI parameter to be tolerant on request. This should not be too much work, and it would make mc compatible with the AWS CLI tooling, as its clone s3cmd already is (by default, without additional configuration).

klauspost commented 4 months ago

@janwesterkamp You are welcome to fork in the changes you need. We are not interested in these changes.

zveinn commented 4 months ago

@janwesterkamp can you send me more information on the error you are getting? I am able to access buckets with uppercase names using mc. I'm only being stopped by MinIO itself.

2024-06-14T09:58:34.690 [400 Bad Request] handler.ValidRequest 127.0.0.1:9000/Bb2/?location= 127.0.0.1 113µs ⇣ 110.855µs ↑ 93 B ↓ 283 B

You can use the --debug flag to print details of your requests.

marktheunissen commented 4 months ago

Hi @janwesterkamp, it looks like the issue is not uppercase letters, but rather that this particular Copernicus Ceph server does not support ListObjectsV2.

For example, using mc I can successfully download from the server despite the uppercase:

mc get copernicus/DIAS/SMOS/L1B/MIR_SC_D1B/2010/01/14/SM_REPR_MIR_SC_D1B_20100114T161047_20100114T161746_724_100_1/SM_REPR_MIR_SC_D1B_20100114T161047_20100114T161746_724_100_1.HDR ./1.HDR
...161047_20100114T161746_724_100_1.HDR: 18.98 KiB / 18.98 KiB  ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓  10.81 KiB/s 1s

I did a debug trace on the ls using mc ls --debug copernicus/DIAS/ and saw this: 404 Not Found... Unsupported request arguments: ['fetch-owner']. That's the clue.

You have a few options here that I can think of off-hand.

  1. You could fork and patch mc here: https://github.com/minio/mc/blob/master/cmd/client-s3.go#L1610 and change this to if isGoogle(c.targetURL.Host) || c.targetURL.Host == "eodata.dataspace.copernicus.eu" {. This worked for me:
 ❯ ./mc ls copernicus/DIAS/
[2024-06-17 11:14:27 AEST]     0B C3S/
[2024-06-17 11:14:27 AEST]     0B CAMS/
.... 
  2. Implement a flag to use V1 and submit a PR, this has been suggested to another user already in: https://github.com/minio/mc/issues/3962

  3. Use another client like s3cmd to perform your ls operations and then use mc when you want to do other operations that are compatible, e.g. my mc get above worked just fine.
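At the HTTP level, the gap between a V2 listing and a V1 listing is mostly a matter of query parameters. The following Go sketch builds both query strings with the standard library; `listQuery` is a hypothetical helper for illustration and not code taken from mc or minio-go.

```go
package main

import (
	"fmt"
	"net/url"
)

// listQuery builds the query string for a bucket listing request.
// With v2=true it adds list-type=2 and the optional fetch-owner parameter
// that the server rejected; with v2=false it omits both, which corresponds
// to a ListObjects (V1) style request.
func listQuery(prefix, delimiter string, v2 bool) string {
	q := url.Values{}
	q.Set("prefix", prefix)
	q.Set("delimiter", delimiter)
	q.Set("encoding-type", "url")
	if v2 {
		q.Set("list-type", "2")
		q.Set("fetch-owner", "true")
	}
	return q.Encode() // keys are emitted in sorted order
}

func main() {
	fmt.Println("V2:", listQuery("", "/", true))
	fmt.Println("V1:", listQuery("", "/", false))
}
```

A flag that forces V1, or one that only drops fetch-owner from the V2 request, would both avoid sending the argument this server complains about.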

janwesterkamp commented 4 months ago

@janwesterkamp You are welcome to fork in the changes you need. We are not interested in these changes.

Fair enough, but first I need to improve my Go skills, so I will focus on a PR for minio-java. Meanwhile, I am hoping for some helping hands to find the root cause first, as it looks like it is a little bit different in mc than in minio-java.

janwesterkamp commented 4 months ago

Hi @zveinn,

here are my test calls with the debug option:

List buckets (successful):

mc ls --debug cdse/

mc: <DEBUG> GET / HTTP/1.1
Host: eodata.dataspace.copernicus.eu
User-Agent: MinIO (darwin; amd64) minio-go/v7.0.70 mc/RELEASE.2024-06-12T14-34-03Z
Accept-Encoding: zstd,gzip
Authorization: AWS4-HMAC-SHA256 Credential=**REDACTED**/20240617/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=**REDACTED**
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20240617T141923Z

mc: <DEBUG> HTTP/1.1 200 OK
Content-Length: 401
Content-Type: application/xml
Date: Mon, 17 Jun 2024 14:19:23 GMT
Server: envoy
X-Envoy-Upstream-Service-Time: 2
X-Ratelimit-Limit: 400, 400;w=60
X-Ratelimit-Remaining: 399
X-Ratelimit-Reset: 37

mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Let's Encrypt
mc: <DEBUG>  >> Expires: 2024-08-07 16:15:17 +0000 UTC
mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Internet Security Research Group
mc: <DEBUG>  >> Expires: 2025-09-15 16:00:00 +0000 UTC
mc: <DEBUG> Response Time: 279.936344ms

[2017-11-15 11:40:52 CET]     0B EODATA/
[2017-11-15 11:40:52 CET]     0B DIAS/

List bucket EODATA result (error):

mc ls --debug cdse/EODATA/
mc: <DEBUG> GET /EODATA/?location= HTTP/1.1
Host: eodata.dataspace.copernicus.eu
User-Agent: MinIO (darwin; amd64) minio-go/v7.0.70 mc/RELEASE.2024-06-12T14-34-03Z
Accept-Encoding: zstd,gzip
Authorization: AWS4-HMAC-SHA256 Credential=**REDACTED**/20240617/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=**REDACTED**
X-Amz-Content-Sha256: UNSIGNED-PAYLOAD
X-Amz-Date: 20240617T142245Z

mc: <DEBUG> HTTP/1.1 200 OK
Content-Length: 127
Accept-Ranges: bytes
Content-Type: text/html; charset=utf-8
Date: Mon, 17 Jun 2024 00:00:01 GMT
Server: envoy
X-Envoy-Upstream-Service-Time: 2
X-Ratelimit-Limit: 400, 400;w=60
X-Ratelimit-Remaining: 399
X-Ratelimit-Reset: 14

mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Let's Encrypt
mc: <DEBUG>  >> Expires: 2024-08-07 16:15:17 +0000 UTC
mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Internet Security Research Group
mc: <DEBUG>  >> Expires: 2025-09-15 16:00:00 +0000 UTC
mc: <DEBUG> Response Time: 148.963328ms

mc: <DEBUG> GET /EODATA/?delimiter=%2F&encoding-type=url&fetch-owner=true&list-type=2&prefix= HTTP/1.1
Host: eodata.dataspace.copernicus.eu
User-Agent: MinIO (darwin; amd64) minio-go/v7.0.70 mc/RELEASE.2024-06-12T14-34-03Z
Accept-Encoding: zstd,gzip
Authorization: AWS4-HMAC-SHA256 Credential=**REDACTED**/20240617/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=**REDACTED**
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20240617T142245Z

mc: <DEBUG> HTTP/1.1 404 Not Found
Content-Length: 46
Content-Type: text/html; charset=utf-8
Date: Mon, 17 Jun 2024 14:22:46 GMT
Server: envoy
X-Envoy-Upstream-Service-Time: 2
X-Ratelimit-Limit: 400, 400;w=60
X-Ratelimit-Remaining: 398
X-Ratelimit-Reset: 14

Unsupported request arguments: ['fetch-owner']mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Let's Encrypt
mc: <DEBUG>  >> Expires: 2024-08-07 16:15:17 +0000 UTC
mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Internet Security Research Group
mc: <DEBUG>  >> Expires: 2025-09-15 16:00:00 +0000 UTC
mc: <DEBUG> Response Time: 53.080758ms

mc: <ERROR> Unable to list folder. The specified bucket does not exist.
 (1) ls.go:239 cmd.doList(..) Tags: [https://eodata.dataspace.copernicus.eu/EODATA/]
 (0) client-s3.go:2371 cmd.(*S3Client).listInRoutine(..)
 Release-Tag:RELEASE.2024-06-12T14-34-03Z | Commit:e7c9a733c680 | Host:Jans-MBP.fritz.box | OS:darwin | Arch:amd64 | Lang:go1.22.4 | Mem:4.7 MiB/14 MiB | Heap:4.7 MiB/7.4 MiB

Get object (successful):

mc get --debug cdse/EODATA/Sentinel-2/MSI/L1C/2023/06/13/S2B_MSIL1C_20230613T102609_N0509_R108_T32UMA_20230613T141118.SAFE/manifest.safe ./
.../manifest.safe: 0 B / ?  ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░▓ mc: <DEBUG> GET /EODATA/?location= HTTP/1.1
Host: eodata.dataspace.copernicus.eu
User-Agent: MinIO (darwin; amd64) minio-go/v7.0.70 mc/RELEASE.2024-06-12T14-34-03Z
Accept-Encoding: zstd,gzip
Authorization: AWS4-HMAC-SHA256 Credential=**REDACTED**/20240617/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=**REDACTED**
X-Amz-Content-Sha256: UNSIGNED-PAYLOAD
X-Amz-Date: 20240617T143004Z

mc: <DEBUG> HTTP/1.1 200 OK
Content-Length: 127
Accept-Ranges: bytes
Content-Type: text/html; charset=utf-8
Date: Mon, 17 Jun 2024 00:00:01 GMT
Server: envoy
X-Envoy-Upstream-Service-Time: 2
X-Ratelimit-Limit: 400, 400;w=60
X-Ratelimit-Remaining: 399
X-Ratelimit-Reset: 55

mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Let's Encrypt
mc: <DEBUG>  >> Expires: 2024-08-07 16:15:17 +0000 UTC
mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Internet Security Research Group
mc: <DEBUG>  >> Expires: 2025-09-15 16:00:00 +0000 UTC
mc: <DEBUG> Response Time: 255.798024ms

mc: <DEBUG> GET /EODATA/Sentinel-2/MSI/L1C/2023/06/13/S2B_MSIL1C_20230613T102609_N0509_R108_T32UMA_20230613T141118.SAFE/manifest.safe HTTP/1.1
Host: eodata.dataspace.copernicus.eu
User-Agent: MinIO (darwin; amd64) minio-go/v7.0.70 mc/RELEASE.2024-06-12T14-34-03Z
Accept-Encoding: identity
Authorization: AWS4-HMAC-SHA256 Credential=**REDACTED**/20240617/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=**REDACTED**
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20240617T143005Z

mc: <DEBUG> HTTP/1.1 200 OK
Content-Length: 48684
Accept-Ranges: bytes
Content-Type: binary/octet-stream
Date: Mon, 17 Jun 2024 14:30:05 GMT
Etag: "2c42fc82bcfb904884b8c9b041fffa8a-1"
Expires: Wed, 14 Jun 2023 16:34:26 GMT
Last-Modified: Tue, 13 Jun 2023 16:34:47 GMT
Rgwx-Mtime: 1686674087.235764440
Rgwx-Obj-Pg-Ver: 828204741
Rgwx-Source-Zone-Short-Id: 2171516019
Server: envoy
X-Amz-Request-Id: tx00000000000002ca78650-006670486d-8ebc88ae-default
X-Envoy-Upstream-Service-Time: 28
X-Ratelimit-Limit: 400, 400;w=60
X-Ratelimit-Remaining: 398
X-Ratelimit-Reset: 55
X-Rgw-Object-Type: Normal

mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Let's Encrypt
mc: <DEBUG>  >> Expires: 2024-08-07 16:15:17 +0000 UTC
mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Internet Security Research Group
mc: <DEBUG>  >> Expires: 2025-09-15 16:00:00 +0000 UTC
mc: <DEBUG> Response Time: 67.576822ms

.../manifest.safe: 47.54 KiB / 47.54 KiB  ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓  123.17 KiB/s 0s

Copy object (successful):

mc cp --debug cdse/EODATA/Sentinel-2/MSI/L1C/2023/06/13/S2B_MSIL1C_20230613T102609_N0509_R108_T32UMA_20230613T141118.SAFE/manifest.safe ./
 0 B / ?  ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░▓ mc: <DEBUG> GET /EODATA/?location= HTTP/1.1
Host: eodata.dataspace.copernicus.eu
User-Agent: MinIO (darwin; amd64) minio-go/v7.0.70 mc/RELEASE.2024-06-12T14-34-03Z
Accept-Encoding: zstd,gzip
Authorization: AWS4-HMAC-SHA256 Credential=**REDACTED**/20240617/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=**REDACTED**
X-Amz-Content-Sha256: UNSIGNED-PAYLOAD
X-Amz-Date: 20240617T143315Z

mc: <DEBUG> HTTP/1.1 200 OK
Content-Length: 127
Accept-Ranges: bytes
Content-Type: text/html; charset=utf-8
Date: Mon, 17 Jun 2024 00:00:01 GMT
Server: envoy
X-Envoy-Upstream-Service-Time: 2
X-Ratelimit-Limit: 400, 400;w=60
X-Ratelimit-Remaining: 399
X-Ratelimit-Reset: 45

mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Let's Encrypt
mc: <DEBUG>  >> Expires: 2024-08-07 16:15:17 +0000 UTC
mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Internet Security Research Group
mc: <DEBUG>  >> Expires: 2025-09-15 16:00:00 +0000 UTC
mc: <DEBUG> Response Time: 211.706708ms

 0 B / ?  ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░▓ mc: <DEBUG> HEAD /EODATA/Sentinel-2/MSI/L1C/2023/06/13/S2B_MSIL1C_20230613T102609_N0509_R108_T32UMA_20230613T141118.SAFE/manifest.safe HTTP/1.1
Host: eodata.dataspace.copernicus.eu
User-Agent: MinIO (darwin; amd64) minio-go/v7.0.70 mc/RELEASE.2024-06-12T14-34-03Z
Authorization: AWS4-HMAC-SHA256 Credential=**REDACTED**/20240617/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=**REDACTED**
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20240617T143315Z

mc: <DEBUG> HTTP/1.1 200 OK
Content-Length: 48684
Accept-Ranges: bytes
Content-Type: binary/octet-stream
Date: Mon, 17 Jun 2024 14:33:15 GMT
Etag: "2c42fc82bcfb904884b8c9b041fffa8a-1"
Expires: Wed, 14 Jun 2023 16:34:26 GMT
Last-Modified: Tue, 13 Jun 2023 16:34:47 GMT
Rgwx-Mtime: 1686674087.235764440
Rgwx-Obj-Pg-Ver: 828204741
Rgwx-Source-Zone-Short-Id: 2171516019
Server: envoy
X-Amz-Request-Id: tx00000000000002ca29cb6-006670492b-8ebc8ad9-default
X-Envoy-Upstream-Service-Time: 32
X-Ratelimit-Limit: 400, 400;w=60
X-Ratelimit-Remaining: 398
X-Ratelimit-Reset: 45
X-Rgw-Object-Type: Normal

mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Let's Encrypt
mc: <DEBUG>  >> Expires: 2024-08-07 16:15:17 +0000 UTC
mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Internet Security Research Group
mc: <DEBUG>  >> Expires: 2025-09-15 16:00:00 +0000 UTC
mc: <DEBUG> Response Time: 73.030976ms

mc: <DEBUG> GET /EODATA/Sentinel-2/MSI/L1C/2023/06/13/S2B_MSIL1C_20230613T102609_N0509_R108_T32UMA_20230613T141118.SAFE/manifest.safe HTTP/1.1
Host: eodata.dataspace.copernicus.eu
User-Agent: MinIO (darwin; amd64) minio-go/v7.0.70 mc/RELEASE.2024-06-12T14-34-03Z
Accept-Encoding: identity
Authorization: AWS4-HMAC-SHA256 Credential=**REDACTED**/20240617/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=**REDACTED**
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20240617T143315Z

mc: <DEBUG> HTTP/1.1 200 OK
Content-Length: 48684
Accept-Ranges: bytes
Content-Type: binary/octet-stream
Date: Mon, 17 Jun 2024 14:33:15 GMT
Etag: "2c42fc82bcfb904884b8c9b041fffa8a-1"
Expires: Wed, 14 Jun 2023 16:34:26 GMT
Last-Modified: Tue, 13 Jun 2023 16:34:47 GMT
Rgwx-Mtime: 1686674087.235764440
Rgwx-Obj-Pg-Ver: 828204741
Rgwx-Source-Zone-Short-Id: 2171516019
Server: envoy
X-Amz-Request-Id: tx00000000000002ca29cb8-006670492b-8ebc8ad9-default
X-Envoy-Upstream-Service-Time: 32
X-Ratelimit-Limit: 400, 400;w=60
X-Ratelimit-Remaining: 397
X-Ratelimit-Reset: 45
X-Rgw-Object-Type: Normal

mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Let's Encrypt
mc: <DEBUG>  >> Expires: 2024-08-07 16:15:17 +0000 UTC
mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Internet Security Research Group
mc: <DEBUG>  >> Expires: 2025-09-15 16:00:00 +0000 UTC
mc: <DEBUG> Response Time: 70.689138ms

.../manifest.safe: 47.54 KiB / 47.54 KiB  ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓  113.30 KiB/s 0s

Copy objects recursive (error):

mc cp -r --debug cdse/EODATA/Sentinel-2/MSI/L1C/2023/06/13/S2B_MSIL1C_20230613T102609_N0509_R108_T32UMA_20230613T141118.SAFE/ ./
 0 B / ?  ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░▓ mc: <DEBUG> GET /EODATA/?location= HTTP/1.1
Host: eodata.dataspace.copernicus.eu
User-Agent: MinIO (darwin; amd64) minio-go/v7.0.70 mc/RELEASE.2024-06-12T14-34-03Z
Accept-Encoding: zstd,gzip
Authorization: AWS4-HMAC-SHA256 Credential=**REDACTED**/20240617/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=**REDACTED**
X-Amz-Content-Sha256: UNSIGNED-PAYLOAD
X-Amz-Date: 20240617T143748Z

mc: <DEBUG> HTTP/1.1 200 OK
Content-Length: 127
Accept-Ranges: bytes
Content-Type: text/html; charset=utf-8
Date: Mon, 17 Jun 2024 00:00:01 GMT
Server: envoy
X-Envoy-Upstream-Service-Time: 3
X-Ratelimit-Limit: 400, 400;w=60
X-Ratelimit-Remaining: 397
X-Ratelimit-Reset: 11

mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Let's Encrypt
mc: <DEBUG>  >> Expires: 2024-08-07 16:15:17 +0000 UTC
mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Internet Security Research Group
mc: <DEBUG>  >> Expires: 2025-09-15 16:00:00 +0000 UTC
mc: <DEBUG> Response Time: 142.668814ms

mc: <DEBUG> GET /EODATA/?delimiter=&encoding-type=url&fetch-owner=true&list-type=2&prefix=Sentinel-2%2FMSI%2FL1C%2F2023%2F06%2F13%2FS2B_MSIL1C_20230613T102609_N0509_R108_T32UMA_20230613T141118.SAFE%2F HTTP/1.1
Host: eodata.dataspace.copernicus.eu
User-Agent: MinIO (darwin; amd64) minio-go/v7.0.70 mc/RELEASE.2024-06-12T14-34-03Z
Accept-Encoding: zstd,gzip
Authorization: AWS4-HMAC-SHA256 Credential=**REDACTED**/20240617/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=**REDACTED**
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20240617T143749Z

mc: <DEBUG> HTTP/1.1 404 Not Found
Content-Length: 46
Content-Type: text/html; charset=utf-8
Date: Mon, 17 Jun 2024 14:37:49 GMT
Server: envoy
X-Envoy-Upstream-Service-Time: 3
X-Ratelimit-Limit: 400, 400;w=60
X-Ratelimit-Remaining: 396
X-Ratelimit-Reset: 11

Unsupported request arguments: ['fetch-owner']mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Let's Encrypt
mc: <DEBUG>  >> Expires: 2024-08-07 16:15:17 +0000 UTC
mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Internet Security Research Group
mc: <DEBUG>  >> Expires: 2025-09-15 16:00:00 +0000 UTC
mc: <DEBUG> Response Time: 41.654896ms

mc: <ERROR> Unable to prepare URL for copying. Unable to guess the type of copy operation.
 (2) cp-main.go:296 cmd.printCopyURLsError(..)
 (1) cp-url.go:339 cmd.prepareCopyURLs.func1(..) Tags: [cdse/EODATA/Sentinel-2/MSI/L1C/2023/06/13/S2B_MSIL1C_20230613T102609_N0509_R108_T32UMA_20230613T141118.SAFE/]
 (0) typed-errors.go:46 cmd.init.func18(..)
 Release-Tag:RELEASE.2024-06-12T14-34-03Z | Commit:e7c9a733c680 | Host:Jans-MBP.fritz.box | OS:darwin | Arch:amd64 | Lang:go1.22.4 | Mem:4.8 MiB/18 MiB | Heap:4.8 MiB/11 MiB

Head object (successful):

mc head --debug cdse/EODATA/Sentinel-2/MSI/L1C/2023/06/13/S2B_MSIL1C_20230613T102609_N0509_R108_T32UMA_20230613T141118.SAFE/manifest.safe
mc: <DEBUG> GET /EODATA/?location= HTTP/1.1
Host: eodata.dataspace.copernicus.eu
User-Agent: MinIO (darwin; amd64) minio-go/v7.0.70 mc/RELEASE.2024-06-12T14-34-03Z
Accept-Encoding: zstd,gzip
Authorization: AWS4-HMAC-SHA256 Credential=**REDACTED**/20240617/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=**REDACTED**
X-Amz-Content-Sha256: UNSIGNED-PAYLOAD
X-Amz-Date: 20240617T145915Z

mc: <DEBUG> HTTP/1.1 200 OK
Content-Length: 127
Accept-Ranges: bytes
Content-Type: text/html; charset=utf-8
Date: Mon, 17 Jun 2024 00:00:01 GMT
Server: envoy
X-Envoy-Upstream-Service-Time: 4
X-Ratelimit-Limit: 400, 400;w=60
X-Ratelimit-Remaining: 399
X-Ratelimit-Reset: 45

mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Let's Encrypt
mc: <DEBUG>  >> Expires: 2024-08-07 16:15:17 +0000 UTC
mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Internet Security Research Group
mc: <DEBUG>  >> Expires: 2025-09-15 16:00:00 +0000 UTC
mc: <DEBUG> Response Time: 412.852379ms

mc: <DEBUG> GET /EODATA/Sentinel-2/MSI/L1C/2023/06/13/S2B_MSIL1C_20230613T102609_N0509_R108_T32UMA_20230613T141118.SAFE/manifest.safe HTTP/1.1
Host: eodata.dataspace.copernicus.eu
User-Agent: MinIO (darwin; amd64) minio-go/v7.0.70 mc/RELEASE.2024-06-12T14-34-03Z
Accept-Encoding: identity
Authorization: AWS4-HMAC-SHA256 Credential=**REDACTED**/20240617/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=**REDACTED**
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20240617T145915Z

mc: <DEBUG> HTTP/1.1 200 OK
Content-Length: 48684
Accept-Ranges: bytes
Content-Type: binary/octet-stream
Date: Mon, 17 Jun 2024 14:59:15 GMT
Etag: "2c42fc82bcfb904884b8c9b041fffa8a-1"
Expires: Wed, 14 Jun 2023 16:34:26 GMT
Last-Modified: Tue, 13 Jun 2023 16:34:47 GMT
Rgwx-Mtime: 1686674087.235764440
Rgwx-Obj-Pg-Ver: 828204741
Rgwx-Source-Zone-Short-Id: 2171516019
Server: envoy
X-Amz-Request-Id: tx00000000000002cb1ecc2-0066704f43-8ebbfb26-default
X-Envoy-Upstream-Service-Time: 69
X-Ratelimit-Limit: 400, 400;w=60
X-Ratelimit-Remaining: 398
X-Ratelimit-Reset: 45
X-Rgw-Object-Type: Normal

mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Let's Encrypt
mc: <DEBUG>  >> Expires: 2024-08-07 16:15:17 +0000 UTC
mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Internet Security Research Group
mc: <DEBUG>  >> Expires: 2025-09-15 16:00:00 +0000 UTC
mc: <DEBUG> Response Time: 108.714013ms

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<xfdu:XFDU xmlns:gml="http://www.opengis.net/gml" xmlns:safe="http://www.esa.int/safe/sentinel/1.1" xmlns:xfdu="urn:ccsds:schema:xfdu:1" version="esa/safe/sentinel/1.1/sentinel-2/msi/archive_l1c_user_product">   <!-- ===================================================================      INFORMATION PACKAGE MAP SECTION      =================================================================== -->  <informationPackageMap>     <xfdu:contentUnit dmdID="acquisitionPeriod platform" pdiID="processing" textInfo="SENTINEL-2 MSI Level-1C User Product" unitType="Product_Level-1C">
            <xfdu:contentUnit ID="Product_Metadata_Level-1C" unitType="Metadata Unit">
                <dataObjectPointer dataObjectID="S2_Level-1C_Product_Metadata"/>
            </xfdu:contentUnit>
            <xfdu:contentUnit ID="INSPIRE_Metadata_Unit" unitType="Metadata Unit">
                <dataObjectPointer dataObjectID="INSPIRE_Metadata"/>
            </xfdu:contentUnit>
            <xfdu:contentUnit ID="HTML" textInfo="HTML container">
                <xfdu:contentUnit ID="HTML_Presentation_Unit" unitType="Data Unit">

Find object (error):

mc find --debug cdse/EODATA/Sentinel-2/MSI/L1C/2023/06/13/S2B_MSIL1C_20230613T102609_N0509_R108_T32UMA_20230613T141118.SAFE/manifest.safe
mc: <DEBUG> GET /EODATA/?location= HTTP/1.1
Host: eodata.dataspace.copernicus.eu
User-Agent: MinIO (darwin; amd64) minio-go/v7.0.70 mc/RELEASE.2024-06-12T14-34-03Z
Accept-Encoding: zstd,gzip
Authorization: AWS4-HMAC-SHA256 Credential=**REDACTED**/20240617/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=**REDACTED**
X-Amz-Content-Sha256: UNSIGNED-PAYLOAD
X-Amz-Date: 20240617T150133Z

mc: <DEBUG> HTTP/1.1 200 OK
Content-Length: 127
Accept-Ranges: bytes
Content-Type: text/html; charset=utf-8
Date: Mon, 17 Jun 2024 00:00:01 GMT
Server: envoy
X-Envoy-Upstream-Service-Time: 11
X-Ratelimit-Limit: 400, 400;w=60
X-Ratelimit-Remaining: 399
X-Ratelimit-Reset: 27

mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Let's Encrypt
mc: <DEBUG>  >> Expires: 2024-08-07 16:15:17 +0000 UTC
mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Internet Security Research Group
mc: <DEBUG>  >> Expires: 2025-09-15 16:00:00 +0000 UTC
mc: <DEBUG> Response Time: 278.534461ms

mc: <DEBUG> HEAD /EODATA/Sentinel-2/MSI/L1C/2023/06/13/S2B_MSIL1C_20230613T102609_N0509_R108_T32UMA_20230613T141118.SAFE/manifest.safe HTTP/1.1
Host: eodata.dataspace.copernicus.eu
User-Agent: MinIO (darwin; amd64) minio-go/v7.0.70 mc/RELEASE.2024-06-12T14-34-03Z
Authorization: AWS4-HMAC-SHA256 Credential=**REDACTED**/20240617/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=**REDACTED**
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20240617T150133Z

mc: <DEBUG> HTTP/1.1 200 OK
Content-Length: 48684
Accept-Ranges: bytes
Content-Type: binary/octet-stream
Date: Mon, 17 Jun 2024 15:01:33 GMT
Etag: "2c42fc82bcfb904884b8c9b041fffa8a-1"
Expires: Wed, 14 Jun 2023 16:34:26 GMT
Last-Modified: Tue, 13 Jun 2023 16:34:47 GMT
Rgwx-Mtime: 1686674087.235764440
Rgwx-Obj-Pg-Ver: 828204741
Rgwx-Source-Zone-Short-Id: 2171516019
Server: envoy
X-Amz-Request-Id: tx00000000000002cab11e7-0066704fcd-8ebc88ae-default
X-Envoy-Upstream-Service-Time: 15
X-Ratelimit-Limit: 400, 400;w=60
X-Ratelimit-Remaining: 398
X-Ratelimit-Reset: 27
X-Rgw-Object-Type: Normal

mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Let's Encrypt
mc: <DEBUG>  >> Expires: 2024-08-07 16:15:17 +0000 UTC
mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Internet Security Research Group
mc: <DEBUG>  >> Expires: 2025-09-15 16:00:00 +0000 UTC
mc: <DEBUG> Response Time: 54.503002ms

mc: <DEBUG> GET /EODATA/?delimiter=&encoding-type=url&fetch-owner=true&list-type=2&prefix=Sentinel-2%2FMSI%2FL1C%2F2023%2F06%2F13%2FS2B_MSIL1C_20230613T102609_N0509_R108_T32UMA_20230613T141118.SAFE%2Fmanifest.safe HTTP/1.1
Host: eodata.dataspace.copernicus.eu
User-Agent: MinIO (darwin; amd64) minio-go/v7.0.70 mc/RELEASE.2024-06-12T14-34-03Z
Accept-Encoding: zstd,gzip
Authorization: AWS4-HMAC-SHA256 Credential=**REDACTED**/20240617/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=**REDACTED**
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20240617T150133Z

mc: <DEBUG> HTTP/1.1 404 Not Found
Content-Length: 46
Content-Type: text/html; charset=utf-8
Date: Mon, 17 Jun 2024 15:01:33 GMT
Server: envoy
X-Envoy-Upstream-Service-Time: 29
X-Ratelimit-Limit: 400, 400;w=60
X-Ratelimit-Remaining: 397
X-Ratelimit-Reset: 27

Unsupported request arguments: ['fetch-owner']mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Let's Encrypt
mc: <DEBUG>  >> Expires: 2024-08-07 16:15:17 +0000 UTC
mc: <DEBUG> TLS Certificate found: 
mc: <DEBUG>  >> Country: US
mc: <DEBUG>  >> Organization: Internet Security Research Group
mc: <DEBUG>  >> Expires: 2025-09-15 16:00:00 +0000 UTC
mc: <DEBUG> Response Time: 70.369013ms

mc: <ERROR> Unable to list folder. The specified bucket does not exist.
 (1) find.go:300 cmd.doFind(..) Tags: [https://eodata.dataspace.copernicus.eu/EODATA/Sentinel-2/MSI/L1C/2023/06/13/S2B_MSIL1C_20230613T102609_N0509_R108_T32UMA_20230613T141118.SAFE/manifest.safe]
 (0) client-s3.go:2440 cmd.(*S3Client).listRecursiveInRoutine(..)
 Release-Tag:RELEASE.2024-06-12T14-34-03Z | Commit:e7c9a733c680 | Host:Jans-MBP.fritz.box | OS:darwin | Arch:amd64 | Lang:go1.22.4 | Mem:3.8 MiB/18 MiB | Heap:3.8 MiB/11 MiB.

So it looks like any call that involves the list-type=2 query parameter fails with that Ceph instance (a link earlier in this ticket lets you create an account and S3 credentials to play around with).

This means @marktheunissen points in the right direction: mc is already a tolerant reader regarding uppercase bucket names (at least for some commands). What remains is to make it tolerant for ListObjectsV2 requests with optional parameters (by not sending them), or to force it to use ListObjects (V1) instead somehow!

And error messages like this are misleading and might need to be changed here: Unable to list folder. The specified bucket does not exist. But returning a 404 (Not Found) HTTP Status Code on a missing request parameter seems wrong too - this should be a 400 (Bad Request) instead.
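To make the difference concrete, here is a minimal sketch of how the query strings of a V1 and a V2 listing request differ. The parameter names (`list-type`, `fetch-owner`, `delimiter`, `encoding-type`, `prefix`) come from the public S3 API; the helper `listQuery` is a hypothetical name for illustration and is not taken from mc or minio-go.

```go
package main

import (
	"fmt"
	"net/url"
)

// listQuery builds the query string of a bucket listing request.
// Hypothetical helper: it only illustrates which parameters are V2-specific.
func listQuery(prefix string, useV2, fetchOwner bool) string {
	q := url.Values{}
	q.Set("delimiter", "/")
	q.Set("encoding-type", "url")
	q.Set("prefix", prefix)
	if useV2 {
		// list-type=2 selects ListObjectsV2; without it the server sees a V1 request.
		q.Set("list-type", "2")
		if fetchOwner {
			// Optional V2 parameter, the one rejected in the debug trace above.
			q.Set("fetch-owner", "true")
		}
	}
	// Encode sorts the keys alphabetically and percent-encodes the values.
	return q.Encode()
}

func main() {
	fmt.Println("V2 with fetch-owner:", listQuery("Sentinel-2/", true, true))
	fmt.Println("V2 without it:      ", listQuery("Sentinel-2/", true, false))
	fmt.Println("V1:                 ", listQuery("Sentinel-2/", false, false))
}
```

Either of the last two variants avoids the parameter the server complains about, which matches the two options named above: drop the optional parameter, or fall back to V1.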

janwesterkamp commented 4 months ago

Hi @marktheunissen, this is very helpful! See comments inline:

Hi @janwesterkamp, it looks like the issue is not uppercase letters, but rather that this particular Copernicus Ceph server does not support ListObjectsV2.

For example, using mc I can successfully download from the server despite the uppercase:

mc get copernicus/DIAS/SMOS/L1B/MIR_SC_D1B/2010/01/14/SM_REPR_MIR_SC_D1B_20100114T161047_20100114T161746_724_100_1/SM_REPR_MIR_SC_D1B_20100114T161047_20100114T161746_724_100_1.HDR ./1.HDR
...161047_20100114T161746_724_100_1.HDR: 18.98 KiB / 18.98 KiB  ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓  10.81 KiB/s 1s

I did a debug trace on the ls using mc ls --debug copernicus/DIAS/ and saw this: 404 Not Found... Unsupported request arguments: ['fetch-owner']. That's the clue.

So this is an optional V2 request argument: mc could skip it via configuration, and/or Ceph needs to implement it.

You have a few options here that I can think of off-hand.

1. You could fork and patch `mc` here: https://github.com/minio/mc/blob/master/cmd/client-s3.go#L1610 and change this to `if isGoogle(c.targetURL.Host) || c.targetURL.Host == "eodata.dataspace.copernicus.eu" {`. This worked for me:

This might work, but it looks like a hack to me - the same goes for the Google check, as there are private Google Cloud instances with deviating URLs too.

Also, I might be capable of patching mc, but I am building a WebGIS and remote sensing platform that is also usable by non-experts, so patching the tooling is a bit out of scope for the target audience here... ;-)

 ❯ ./mc ls copernicus/DIAS/
[2024-06-17 11:14:27 AEST]     0B C3S/
[2024-06-17 11:14:27 AEST]     0B CAMS/
.... 
2. Implement a flag to use V1 and submit a PR, this has been suggested to another user already in: [mc ls doesn't work well with ListObjects (no V2) #3962](https://github.com/minio/mc/issues/3962)

As this affects not only obvious use cases like find but also cp -r, having a CLI flag on some commands (and not on others) is not very intuitive, though it might work.

What about an optional configuration setting for the ListObjects version instead? This would also cover the isGoogle use case. Once configured, it is transparent for the user and applies to one configured environment only. A single CLI command can involve more than one environment, in which case a single CLI flag could lead to issues.
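As a rough sketch of that idea, assuming a simplified alias layout: each configured environment carries its own listing version, so a command that touches two environments picks the right call for each. The `listObjects` field, the `aliasConfig` type and both helper functions are hypothetical; mc's real configuration has no such setting.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// aliasConfig mirrors a small part of an alias entry.
// The "listObjects" field is hypothetical and does not exist in mc.
type aliasConfig struct {
	URL         string `json:"url"`
	ListObjects string `json:"listObjects,omitempty"`
}

// useListV1 reports whether this alias asked for ListObjects (V1).
// An empty or unknown value keeps the default, ListObjectsV2.
func (a aliasConfig) useListV1() bool {
	return strings.EqualFold(a.ListObjects, "v1")
}

// parseAliases decodes a JSON object that maps alias names to their settings.
func parseAliases(raw []byte) (map[string]aliasConfig, error) {
	aliases := map[string]aliasConfig{}
	if err := json.Unmarshal(raw, &aliases); err != nil {
		return nil, err
	}
	return aliases, nil
}

func main() {
	raw := []byte(`{
		"copernicus": {"url": "https://eodata.dataspace.copernicus.eu", "listObjects": "v1"},
		"local": {"url": "http://localhost:9000"}
	}`)
	aliases, err := parseAliases(raw)
	if err != nil {
		panic(err)
	}
	for _, name := range []string{"copernicus", "local"} {
		fmt.Printf("%s uses V1 listing: %v\n", name, aliases[name].useListV1())
	}
}
```

With a setting like this, a copy between the two aliases would list the source with V1 and the target with V2, without any per-command flag.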

3. Use another client like s3cmd to perform your `ls` operations and then use `mc` when you want to do other operations that are compatible, e.g. my `mc get` above worked just fine.

* https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html#API_ListObjectsV2_RequestSyntax

* https://tracker.ceph.com/issues/23908

* [mc ls doesn't work well with ListObjects (no V2) #3962](https://github.com/minio/mc/issues/3962)

Yes, this is my current workaround. But I would prefer not to use s3cmd, as it has issues with multiple environment configurations. This could be handled with separate configuration files, but it requires an intermediate step via the local file system when copying between buckets in different environments.

So in conclusion, an optional configuration setting on the mc environment for ListObjects (V1) support would be my preferred solution here, and it could also make the existing Google workaround obsolete.

Getting the fetch-owner query parameter supported by Ceph, so that the optional ListObjectsV2 parameters that mc actually sends work, is another direction to pursue in parallel.

harshavardhana commented 4 months ago

ListObjectsV2 has been around for many years; I have no idea why ListObjectsV2 is not supported on the server side.

We added GCS support because we have always had it, and they will never implement ListObjectsV2 - their existing S3 implementation is pretty shambolic.

Ceph has already implemented ListObjectsV2 for some time now. As far as I know, instead of complicating mc, you should upgrade your server.

harshavardhana commented 4 months ago

If you are looking for content migration from Ceph to MinIO, where Ceph requires listObjectsV1 support, we can look at this feature request.

You can add

v := env.Get("_MC_S3_LIST_OBJECTSV2")
if v == "off" {
   // use v1
}

Send a PR. Since this workaround is temporary, we do not want to introduce an official flag. This allows us to remove this ENV in the future (add a comment that it can be removed at any time).

So rely on it until you upgrade to server software that supports ListObjectsV2.

marktheunissen commented 4 months ago

The server seems to be owned by the European Union. As a government agency, I would not expect a fast upgrade to the latest version... but worth a try to ask them.

harshavardhana commented 4 months ago

The server seems to be owned by the European Union. As a government agency, I would not expect a fast upgrade to the latest version... but worth a try to ask them.

Ceph fixed this about 5 years ago; they may never upgrade :-).