s3gw-tech / s3gw

Container able to run on a Kubernetes cluster, providing S3-compatible endpoints to applications.
https://s3gw.tech
Apache License 2.0
127 stars 20 forks source link

rgw/sfs: The response of the bucket creation endpoint returns invalid data #155

Closed votdev closed 1 year ago

votdev commented 1 year ago

I'm sorry to tell, but i think https://github.com/aquarist-labs/ceph/pull/27 is a regression.

I'm currently switching to the AWS-SDK-JS and realized a problem when creating buckets using a system user (e.g. our admin user that is created by default). The problem does not occur with 'normal' users.

The AWS-SDK call createBucket fails with an XML parsing error in that case.

code: "XMLParserError"
columnNumber: 43
fileName: "http://localhost:4200/vendor.js"
lineNumber: 17265
message: "XML-Verarbeitungsfehler: nicht wohlgeformt\nAdresse: http://localhost:4200/#/admin/buckets/create\nZeile Nr. 1, Spalte 1:{\"entry_point_object_ver\":{\"tag\":\"\",\"ver\":0},\"object_ver\":{\"tag\":\"\",\"ver\":0},\"bucket_info\":{\"bucket\":{\"name\":\"5555555555\",\"marker\":\"5555555555.1665593231763845095\",\"bucket_id\":\"5555555555.1665593231763845095\",\"tenant\":\"\",\"explicit_placement\":{\"data_pool\":\"\",\"data_extra_pool\":\"\",\"index_pool\":\"\"}},\"creation_time\":\"2022-10-12T16:47:11.763845Z\",\"owner\":\"testid\",\"flags\":0,\"zonegroup\":\"\",\"placement_rule\":\"default\",\"has_instance_obj\":false,\"quota\":{\"enabled\":false,\"check_on_raw\":false,\"max_size\":-1,\"max_size_kb\":0,\"max_objects\":-1},\"num_shards\":1,\"bi_shard_hash_type\":0,\"requester_pays\":false,\"has_website\":false,\"swift_versioning\":false,\"swift_ver_location\":\"\",\"index_type\":0,\"mdsearch_config\":[],\"reshard_status\":0,\"new_bucket_instance_id\":\"\"}}\n^"
name: "XMLParserError"
retryable: true
stack: "65640/DomXmlParser.prototype.parse@http://localhost:4200/vendor.js:22694:26\nextractData@http://localhost:4200/vendor.js:14639:23\ncallListeners@http:
...
//localhost:4200/vendor.js:56441:18\nfilter/</<@http://localhost:4200/vendor.js:57907:181\nOperatorSubscriber/this._next<@http://localhost:4200/vendor.js:57607:27\nnext@http://localhost:4200/vendor.js:56441:18\ndoInnerSub/<@http://localhost:4200/vendor.js:58136:28\nOperatorSubscriber/this._next<@http://localhost:4200/vendor.js:57607:27\nnext@http://localhost:4200/vendor.js:56441:18\n"
statusCode: 200

According to the documentation the response should look like

HTTP/1.1 200
Location: Location
<?xml version="1.0" encoding="UTF-8"?>
<[CreateBucketResult](https://docs.aws.amazon.com/AmazonS3/latest/API/API_control_CreateBucket.html#AmazonS3-control_CreateBucket-response-CreateBucketResult)>
   <[BucketArn](https://docs.aws.amazon.com/AmazonS3/latest/API/API_control_CreateBucket.html#AmazonS3-control_CreateBucket-response-BucketArn)>string</[BucketArn](https://docs.aws.amazon.com/AmazonS3/latest/API/API_control_CreateBucket.html#AmazonS3-control_CreateBucket-response-BucketArn)>
</[CreateBucketResult](https://docs.aws.amazon.com/AmazonS3/latest/API/API_control_CreateBucket.html#AmazonS3-control_CreateBucket-response-CreateBucketResult)>

I can not confirm that the response of the RGW looks like that because a successful PUT request for https://s3gw.local/5555555555sdfsdfsdfs (as normal user) does not show any response data in the browser dev tools.

Body

<CreateBucketConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><LocationConstraint></LocationConstraint></CreateBucketConfiguration>

Response:

content-length  0
giubacc commented 1 year ago

https://github.com/aquarist-labs/ceph/pull/27 was addressed because the need was that UI required the non standard response body of create bucket. Is this no more the case? It seems to me that the client changed the way it is interpreting the response, how can this be a regression?

giubacc commented 1 year ago

We should carefully evaluate if AWS-SDK-JS is flexible enough for our needs; let's always remember that we should need to call non standard S3 calls. An example is https://github.com/aquarist-labs/s3gw/issues/86 these calls: link/unlink/chown are all non standard.

jhmarina commented 1 year ago

@votdev will test it again to confirm whether this is still happening

votdev commented 1 year ago

Issue can be closed. PR https://github.com/aquarist-labs/s3gw-ui/pull/157 is using the createBucket method.

votdev commented 1 year ago

The issue has been reopened because i realized a strange behaviour while testing my PR https://github.com/aquarist-labs/s3gw-ui/pull/157. Using a s3gw instance running on Digital Ocean does work as expected. Running a local container causes a Bucket already exists error. This happens because the response of the createBucket AWS SDK function call causes the above mentioned XML parsing error which forces the createBucket function to be executed a second time (default behaviour of AWS SDK).

I don't know why i can not see that behaviour with the Digital Ocean instance although the containers are the same version (and installed from the Quay.io registry).

votdev commented 1 year ago

Creating a bucket using v0.13.0 running in a local container + Caddy2 as a reverse proxy. The response of the PUT is JSON instead of XML.

The PUT request:

PUT /test01 HTTP/1.1
Host: localhost:4300
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/111.0
Accept: */*
Accept-Language: de,en-US;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate, br
X-Amz-User-Agent: aws-sdk-js/2.1342.0 promise
x-amz-bucket-object-lock-enabled: false
Content-Type: application/octet-stream; charset=UTF-8
X-Amz-Date: Wed, 29 Mar 2023 15:50:09 GMT
Authorization: AWS test:Hf4XYwXomPxQyVKzxbWCUFVrPQA=
Content-Length: 144
Origin: http://localhost:4200
Connection: keep-alive
Referer: http://localhost:4200/
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-site
Pragma: no-cache
Cache-Control: no-cache

The response of the PUT

HTTP/1.1 200 OK
Access-Control-Allow-Headers: *
Access-Control-Allow-Methods: DELETE,GET,HEAD,POST,PUT,OPTIONS
Access-Control-Allow-Origin: *
Access-Control-Expose-Headers: *
Content-Length: 733
Date: Wed, 29 Mar 2023 15:50:09 GMT
Content-Type: text/plain; charset=utf-8
{
   "entry_point_object_ver":{
      "tag":"",
      "ver":0
   },
   "object_ver":{
      "tag":"",
      "ver":0
   },
   "bucket_info":{
      "bucket":{
         "name":"test01",
         "marker":"test01.1680101373936023780",
         "bucket_id":"test01.1680101373936023780",
         "tenant":"",
         "explicit_placement":{
            "data_pool":"",
            "data_extra_pool":"",
            "index_pool":""
         }
      },
      "creation_time":"2023-03-29T14:49:33.936023Z",
      "owner":"testid",
      "flags":0,
      "zonegroup":"",
      "placement_rule":"default",
      "has_instance_obj":false,
      "quota":{
         "enabled":false,
         "check_on_raw":false,
         "max_size":-1,
         "max_size_kb":0,
         "max_objects":-1
      },
      "num_shards":1,
      "bi_shard_hash_type":0,
      "requester_pays":false,
      "has_website":false,
      "swift_versioning":false,
      "swift_ver_location":"",
      "index_type":0,
      "mdsearch_config":[

      ],
      "reshard_status":0,
      "new_bucket_instance_id":""
   }
}

This is the log of the local container:

2023-03-29T15:32:15.217+0000 7f05d27d0700  1 ====== starting new request req=0x7f05f72c0740 =====
2023-03-29T15:32:15.217+0000 7f05d27d0700 10 zone_unique_id: TODO
2023-03-29T15:32:15.217+0000 7f05d27d0700 10 zone_unique_trans_id: TODO
2023-03-29T15:32:15.217+0000 7f05d27d0700 10 get_host_id: TODO
2023-03-29T15:32:15.217+0000 7f05d27d0700  2 req 0 0.000000000s initializing for trans_id = 
2023-03-29T15:32:15.217+0000 7f05d27d0700 10 req 0 0.000000000s rgw api priority: s3=8 s3website=7
2023-03-29T15:32:15.217+0000 7f05d27d0700 10 req 0 0.000000000s host=localhost
2023-03-29T15:32:15.217+0000 7f05d27d0700 10 req 0 0.000000000s meta>> HTTP_X_AMZ_BUCKET_OBJECT_LOCK_ENABLED
2023-03-29T15:32:15.217+0000 7f05d27d0700 10 req 0 0.000000000s meta>> HTTP_X_AMZ_DATE
2023-03-29T15:32:15.217+0000 7f05d27d0700 10 req 0 0.000000000s meta>> HTTP_X_AMZ_USER_AGENT
2023-03-29T15:32:15.217+0000 7f05d27d0700 10 req 0 0.000000000s x>> x-amz-bucket-object-lock-enabled:false
2023-03-29T15:32:15.217+0000 7f05d27d0700 10 req 0 0.000000000s x>> x-amz-date:Wed, 29 Mar 2023 15:32:15 GMT
2023-03-29T15:32:15.217+0000 7f05d27d0700 10 req 0 0.000000000s x>> x-amz-user-agent:aws-sdk-js/2.1342.0 promise
2023-03-29T15:32:15.217+0000 7f05d27d0700 10 req 0 0.000000000s handler=25RGWHandler_REST_Bucket_S3
2023-03-29T15:32:15.217+0000 7f05d27d0700  2 req 0 0.000000000s getting op 1
2023-03-29T15:32:15.217+0000 7f05d27d0700 10 req 0 0.000000000s s3:create_bucket scheduling with throttler client=3 cost=1
2023-03-29T15:32:15.217+0000 7f05d27d0700 10 req 0 0.000000000s s3:create_bucket op=27RGWCreateBucket_ObjStore_S3
2023-03-29T15:32:15.217+0000 7f05d27d0700  2 req 0 0.000000000s s3:create_bucket verifying requester
2023-03-29T15:32:15.217+0000 7f05d27d0700 10 req 0 0.000000000s get_canon_resource(): dest=/test01
2023-03-29T15:32:15.217+0000 7f05d27d0700 10 req 0 0.000000000s string_to_sign:
PUT

application/octet-stream; charset=UTF-8

x-amz-bucket-object-lock-enabled:false
x-amz-date:Wed, 29 Mar 2023 15:32:15 GMT
x-amz-user-agent:aws-sdk-js/2.1342.0 promise
/test01
2023-03-29T15:32:15.217+0000 7f05d27d0700 15 req 0 0.000000000s s3:create_bucket string_to_sign=PUT

application/octet-stream; charset=UTF-8

x-amz-bucket-object-lock-enabled:false
x-amz-date:Wed, 29 Mar 2023 15:32:15 GMT
x-amz-user-agent:aws-sdk-js/2.1342.0 promise
/test01
2023-03-29T15:32:15.217+0000 7f05d27d0700 15 req 0 0.000000000s s3:create_bucket server signature=cmtlxMJPqRQP7a/x6r1legL2pko=
2023-03-29T15:32:15.217+0000 7f05d27d0700 15 req 0 0.000000000s s3:create_bucket client signature=cmtlxMJPqRQP7a/x6r1legL2pko=
2023-03-29T15:32:15.217+0000 7f05d27d0700 15 req 0 0.000000000s s3:create_bucket compare=0
2023-03-29T15:32:15.217+0000 7f05d27d0700  2 req 0 0.000000000s s3:create_bucket normalizing buckets and tenants
2023-03-29T15:32:15.217+0000 7f05d27d0700 10 req 0 0.000000000s s->object=<NULL> s->bucket=test01
2023-03-29T15:32:15.217+0000 7f05d27d0700  2 req 0 0.000000000s s3:create_bucket init permissions
2023-03-29T15:32:15.221+0000 7f05d27d0700  2 req 0 0.004000062s s3:create_bucket recalculating target
2023-03-29T15:32:15.221+0000 7f05d27d0700  2 req 0 0.004000062s s3:create_bucket reading permissions
2023-03-29T15:32:15.221+0000 7f05d27d0700  2 req 0 0.004000062s s3:create_bucket init op
2023-03-29T15:32:15.221+0000 7f05d27d0700  2 req 0 0.004000062s s3:create_bucket verifying op mask
2023-03-29T15:32:15.221+0000 7f05d27d0700  2 req 0 0.004000062s s3:create_bucket verifying op permissions
2023-03-29T15:32:15.221+0000 7f05d27d0700 10 req 0 0.004000062s s3:create_bucket list_buckets: marker (, ), max=1000
2023-03-29T15:32:15.221+0000 7f05d27d0700 10 req 0 0.004000062s s3:create_bucket list_buckets: buckets={}
2023-03-29T15:32:15.221+0000 7f05d27d0700  2 req 0 0.004000062s s3:create_bucket verifying op params
2023-03-29T15:32:15.221+0000 7f05d27d0700  2 req 0 0.004000062s s3:create_bucket pre-executing
2023-03-29T15:32:15.221+0000 7f05d27d0700  2 req 0 0.004000062s s3:create_bucket check rate limiting
2023-03-29T15:32:15.221+0000 7f05d27d0700  2 req 0 0.004000062s s3:create_bucket executing
2023-03-29T15:32:15.221+0000 7f05d27d0700  5 req 0 0.004000062s s3:create_bucket NOTICE: call to do_aws4_auth_completion
2023-03-29T15:32:15.221+0000 7f05d27d0700 10 req 0 0.004000062s s3:create_bucket create bucket location constraint: 
2023-03-29T15:32:15.221+0000 7f05d27d0700 10 req 0 0.004000062s s3:create_bucket get_bucket: get_bucket by name: test01
2023-03-29T15:32:15.221+0000 7f05d27d0700 10 req 0 0.004000062s s3:create_bucket user=testid bucket=:test01[])
2023-03-29T15:32:15.221+0000 7f05d27d0700  2 req 0 0.004000062s s3:create_bucket completing
2023-03-29T15:32:15.221+0000 7f05d27d0700  2 req 0 0.004000062s s3:create_bucket op status=0
2023-03-29T15:32:15.221+0000 7f05d27d0700  2 req 0 0.004000062s s3:create_bucket http status=200
2023-03-29T15:32:15.221+0000 7f05d27d0700  1 ====== req done req=0x7f05f72c0740 op status=0 http_status=200 latency=0.004000062s ======
2023-03-29T15:32:15.221+0000 7f05d27d0700  1 beast: 0x7f05f72c0740: 10.0.2.100 - testid [29/Mar/2023:15:32:15.217 +0000] "PUT /test01 HTTP/1.1" 200 733 "http://localhost:4200/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/111.0" - latency=0.004000062s
votdev commented 1 year ago

This is the Digital Ocean instance. The strange behaviour here is that the client is the same as for the previous tests using the local running s3gw; but in this case the PUT request has no response, therefore the XML parsing error does not occur.

The PUT request:

PUT /test01 HTTP/2
Host: s3gw.46.101.227.64.sslip.io
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/111.0
Accept: */*
Accept-Language: de,en-US;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate, br
X-Amz-User-Agent: aws-sdk-js/2.1342.0 promise
x-amz-bucket-object-lock-enabled: false
Content-Type: application/octet-stream; charset=UTF-8
X-Amz-Date: Wed, 29 Mar 2023 15:45:07 GMT
Authorization: AWS test:Jqn0A1pWtelJL0QuYX+udtILAKc=
Content-Length: 144
Origin: http://localhost:4200
Connection: keep-alive
Referer: http://localhost:4200/
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: cross-site
Pragma: no-cache
Cache-Control: no-cache
TE: trailers

The response of the PUT request

HTTP/2 200 OK
access-control-allow-origin: *
access-control-expose-headers: *
date: Wed, 29 Mar 2023 15:45:07 GMT
content-length: 0
X-Firefox-Spdy: h2

The s3gw logs:

2023-03-29T15:42:59.803+0000 7f4448b40700  1 ====== starting new request req=0x7f434d148780 =====
2023-03-29T15:42:59.803+0000 7f4448b40700 10 zone_unique_id: TODO
2023-03-29T15:42:59.803+0000 7f4448b40700 10 zone_unique_trans_id: TODO
2023-03-29T15:42:59.807+0000 7f4448b40700 10 get_host_id: TODO
2023-03-29T15:42:59.807+0000 7f4448b40700  2 req 0 0.004000053s initializing for trans_id = 
2023-03-29T15:42:59.807+0000 7f4448b40700 10 req 0 0.004000053s rgw api priority: s3=8 s3website=7
2023-03-29T15:42:59.807+0000 7f4448b40700 10 req 0 0.004000053s host=s3gw.46.101.227.64.sslip.io
2023-03-29T15:42:59.807+0000 7f4448b40700 10 req 0 0.004000053s meta>> HTTP_X_AMZ_BUCKET_OBJECT_LOCK_ENABLED
2023-03-29T15:42:59.807+0000 7f4448b40700 10 req 0 0.004000053s meta>> HTTP_X_AMZ_DATE
2023-03-29T15:42:59.807+0000 7f4448b40700 10 req 0 0.004000053s meta>> HTTP_X_AMZ_USER_AGENT
2023-03-29T15:42:59.807+0000 7f4448b40700 10 req 0 0.004000053s x>> x-amz-bucket-object-lock-enabled:false
2023-03-29T15:42:59.807+0000 7f4448b40700 10 req 0 0.004000053s x>> x-amz-date:Wed, 29 Mar 2023 15:42:59 GMT
2023-03-29T15:42:59.807+0000 7f4448b40700 10 req 0 0.004000053s x>> x-amz-user-agent:aws-sdk-js/2.1342.0 promise
2023-03-29T15:42:59.807+0000 7f4448b40700 10 req 0 0.004000053s handler=25RGWHandler_REST_Bucket_S3
2023-03-29T15:42:59.807+0000 7f4448b40700  2 req 0 0.004000053s getting op 1
2023-03-29T15:42:59.807+0000 7f4448b40700 10 req 0 0.004000053s s3:create_bucket scheduling with throttler client=3 cost=1
2023-03-29T15:42:59.807+0000 7f4448b40700 10 req 0 0.004000053s s3:create_bucket op=27RGWCreateBucket_ObjStore_S3
2023-03-29T15:42:59.807+0000 7f4448b40700  2 req 0 0.004000053s s3:create_bucket verifying requester
2023-03-29T15:42:59.807+0000 7f4448b40700 10 req 0 0.004000053s get_canon_resource(): dest=/test01
2023-03-29T15:42:59.807+0000 7f4448b40700 10 req 0 0.004000053s string_to_sign:
PUT

application/octet-stream; charset=UTF-8

x-amz-bucket-object-lock-enabled:false
x-amz-date:Wed, 29 Mar 2023 15:42:59 GMT
x-amz-user-agent:aws-sdk-js/2.1342.0 promise
/test01
2023-03-29T15:42:59.807+0000 7f4448b40700 15 req 0 0.004000053s s3:create_bucket string_to_sign=PUT

application/octet-stream; charset=UTF-8

x-amz-bucket-object-lock-enabled:false
x-amz-date:Wed, 29 Mar 2023 15:42:59 GMT
x-amz-user-agent:aws-sdk-js/2.1342.0 promise
/test01
2023-03-29T15:42:59.807+0000 7f4448b40700 15 req 0 0.004000053s s3:create_bucket server signature=szbOUiVnzcdkozusMvM06AzxFgg=
2023-03-29T15:42:59.807+0000 7f4448b40700 15 req 0 0.004000053s s3:create_bucket client signature=szbOUiVnzcdkozusMvM06AzxFgg=
2023-03-29T15:42:59.807+0000 7f4448b40700 15 req 0 0.004000053s s3:create_bucket compare=0
2023-03-29T15:42:59.807+0000 7f4448b40700  2 req 0 0.004000053s s3:create_bucket normalizing buckets and tenants
2023-03-29T15:42:59.807+0000 7f4448b40700 10 req 0 0.004000053s s->object=<NULL> s->bucket=test01
2023-03-29T15:42:59.807+0000 7f4448b40700  2 req 0 0.004000053s s3:create_bucket init permissions
2023-03-29T15:42:59.807+0000 7f4448b40700  2 req 0 0.004000053s s3:create_bucket recalculating target
2023-03-29T15:42:59.807+0000 7f4448b40700  2 req 0 0.004000053s s3:create_bucket reading permissions
2023-03-29T15:42:59.807+0000 7f4448b40700  2 req 0 0.004000053s s3:create_bucket init op
2023-03-29T15:42:59.807+0000 7f4448b40700  2 req 0 0.004000053s s3:create_bucket verifying op mask
2023-03-29T15:42:59.807+0000 7f4448b40700  2 req 0 0.004000053s s3:create_bucket verifying op permissions
2023-03-29T15:42:59.807+0000 7f4448b40700 10 req 0 0.004000053s s3:create_bucket list_buckets: marker (, ), max=1000
2023-03-29T15:42:59.807+0000 7f4448b40700 10 req 0 0.004000053s s3:create_bucket list_buckets: buckets={}
2023-03-29T15:42:59.807+0000 7f4448b40700  2 req 0 0.004000053s s3:create_bucket verifying op params
2023-03-29T15:42:59.807+0000 7f4448b40700  2 req 0 0.004000053s s3:create_bucket pre-executing
2023-03-29T15:42:59.807+0000 7f4448b40700  2 req 0 0.004000053s s3:create_bucket check rate limiting
2023-03-29T15:42:59.807+0000 7f4448b40700  2 req 0 0.004000053s s3:create_bucket executing
2023-03-29T15:42:59.807+0000 7f4430b10700  5 req 0 0.004000053s s3:create_bucket NOTICE: call to do_aws4_auth_completion
2023-03-29T15:42:59.807+0000 7f4430b10700 10 req 0 0.004000053s s3:create_bucket create bucket location constraint: 
2023-03-29T15:42:59.807+0000 7f4430b10700 10 req 0 0.004000053s s3:create_bucket get_bucket: get_bucket by name: test01
2023-03-29T15:42:59.807+0000 7f4430b10700 10 req 0 0.004000053s s3:create_bucket user=testid bucket=:test01[])
2023-03-29T15:42:59.807+0000 7f4430b10700  2 req 0 0.004000053s s3:create_bucket completing
2023-03-29T15:42:59.807+0000 7f4430b10700  2 req 0 0.004000053s s3:create_bucket op status=0
2023-03-29T15:42:59.807+0000 7f4430b10700  2 req 0 0.004000053s s3:create_bucket http status=200
2023-03-29T15:42:59.807+0000 7f4430b10700  1 ====== req done req=0x7f434d148780 op status=0 http_status=200 latency=0.004000053s ======
2023-03-29T15:42:59.807+0000 7f4430b10700  1 beast: 0x7f434d148780: 10.42.0.8 - testid [29/Mar/2023:15:42:59.803 +0000] "PUT /test01 HTTP/1.1" 200 0 "http://localhost:4200/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/111.0" - latency=0.004000053s
votdev commented 1 year ago

The question: Why is the endpoint returning a JSON response for one request and an empty response for another request.

Empty response 200 0

beast: 0x7f434d148780: 10.42.0.8 - testid [29/Mar/2023:15:42:59.803 +0000] "PUT /test01 HTTP/1.1" 200 0 "http://localhost:4200/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/111.0" - latency=0.004000053s

Bildschirmfoto vom 2023-03-29 17-59-10

vs.

Response with JSON content (733 bytes) 200 733

beast: 0x7f05f72c0740: 10.0.2.100 - testid [29/Mar/2023:15:32:15.217 +0000] "PUT /test01 HTTP/1.1" 200 733 "http://localhost:4200/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/111.0" - latency=0.004000062s

Bildschirmfoto vom 2023-03-29 18-02-44

irq0 commented 1 year ago

Is DO proxying and possibly more the request/response?

The response header X-Firefox-Spdy: h2 suggests your firefox speaks HTTP/2. s3gw reports HTTP/1.1. There must be something in between.

The request coming in to s3gw would be interesting. Can you put sth like a mitmproxy (mitmproxy --mode reverse:http://localhost:7480 --listen-host localhost -p 9000) in between?

votdev commented 1 year ago

Is DO proxying and possibly more the request/response?

The response header X-Firefox-Spdy: h2 suggests your firefox speaks HTTP/2. s3gw reports HTTP/1.1. There must be something in between.

The request coming in to s3gw would be interesting. Can you put sth like a mitmproxy (mitmproxy --mode reverse:http://localhost:7480 --listen-host localhost -p 9000) in between?

UI <-- Traefik ingress --> s3gw@DigitalOcean uses HTTPS.

UI <-- caddy2 reverse proxy --> s3gw@localhost uses HTTP because for development i do not need HTTPS.

Does that mean that the s3gw response depends on the used protocol?

irq0 commented 1 year ago

My theory would be that something on the way to s3gw mangles the request / response. The 'X-Firefox-Spdy: h2' is a clue. Something seems to convert HTTP/2 to HTTP/1.1. Very different protocols. Searching for 'traefik http/2' gives some more clues, that there may be problems with HTTP/2 not supported on the backend.

To verify we would need to see the request just before it hits and the response when it leaves s3gw (e.g mitmproxy).

votdev commented 1 year ago

My theory would be that something on the way to s3gw mangles the request / response. The 'X-Firefox-Spdy: h2' is a clue. Something seems to convert HTTP/2 to HTTP/1.1. Very different protocols. Searching for 'traefik http/2' gives some more clues, that there may be problems with HTTP/2 not supported on the backend.

To verify we would need to see the request just before it hits and the response when it leaves s3gw (e.g mitmproxy).

Both setups always use HTTP/1.1 for the communication between reverse proxy <-> s3gw. You can see that both setups log PUT /test01 HTTP/1.1 when the request is done. So the question is why in the DO setup the response is empty and in my local setup the response contains JSON data. s3gw is doing some decision here which i do not understand why. And the problem is in s3gw IMO, not in the proxies (Traefik or caddy2).

irq0 commented 1 year ago

s3gw is doing some decision here which i do not understand why. And the problem is in s3gw IMO, not in the proxies (Traefik or caddy2).

s3gw computes the response based on the request (we can take config, state, etc. as fixed in this instance). Right now we have the request as it looks like leaving your client / entering the rev proxy. The response is likely already wrong leaving s3gw as it logs content length 0.

We can deduce that the request causes the empty response, not a modification to the response.

Same request, same s3gw, different proxy setup leads to the zero response. We need a dump of the request after the proxy, before s3gw to get to the bottom of this.

votdev commented 1 year ago

The PCAP (Wireshark) dump containing the PUT request before (localhost) and after (192.168.178.103) the proxy.

put-local-s3gw.zip

votdev commented 1 year ago

This is the tcpdump sniffed in the s3gw container (on the DO setup):

12:07:14.695515 IP 10-42-0-8.traefik.kube-system.svc.cluster.local.53870 > s3gw-79d5b9fd5f-g54g5.7480: Flags [P.], seq 2241:3087, ack 734, win 504, options [nop,nop,TS val 859729639 ecr 1218473367], length 846
        0x0000:  4500 0382 c6a9 4000 4006 5bee 0a2a 0008  E.....@.@.[..*..
        0x0010:  0a2a 0083 d26e 1d38 120f 2d95 e445 a278  .*...n.8..-..E.x
        0x0020:  8018 01f8 1853 0000 0101 080a 333e 6ee7  .....S......3>n.
        0x0030:  48a0 6d97 5055 5420 2f74 6573 7430 3120  H.m.PUT./test01.
        0x0040:  4854 5450 2f31 2e31 0d0a 486f 7374 3a20  HTTP/1.1..Host:.
        0x0050:  7333 6777 2e34 362e 3130 312e 3232 372e  s3gw.46.101.227.
        0x0060:  3634 2e73 736c 6970 2e69 6f0d 0a55 7365  64.sslip.io..Use
        0x0070:  722d 4167 656e 743a 204d 6f7a 696c 6c61  r-Agent:.Mozilla
        0x0080:  2f35 2e30 2028 5831 313b 2055 6275 6e74  /5.0.(X11;.Ubunt
        0x0090:  753b 204c 696e 7578 2078 3836 5f36 343b  u;.Linux.x86_64;
        0x00a0:  2072 763a 3130 392e 3029 2047 6563 6b6f  .rv:109.0).Gecko
        0x00b0:  2f32 3031 3030 3130 3120 4669 7265 666f  /20100101.Firefo
        0x00c0:  782f 3131 312e 300d 0a43 6f6e 7465 6e74  x/111.0..Content
        0x00d0:  2d4c 656e 6774 683a 2031 3434 0d0a 4163  -Length:.144..Ac
        0x00e0:  6365 7074 3a20 2a2f 2a0d 0a41 6363 6570  cept:.*/*..Accep
        0x00f0:  742d 456e 636f 6469 6e67 3a20 677a 6970  t-Encoding:.gzip
        0x0100:  2c20 6465 666c 6174 652c 2062 720d 0a41  ,.deflate,.br..A
        0x0110:  6363 6570 742d 4c61 6e67 7561 6765 3a20  ccept-Language:.
        0x0120:  6465 2c65 6e2d 5553 3b71 3d30 2e37 2c65  de,en-US;q=0.7,e
        0x0130:  6e3b 713d 302e 330d 0a41 7574 686f 7269  n;q=0.3..Authori
        0x0140:  7a61 7469 6f6e 3a20 4157 5320 7465 7374  zation:.AWS.test
        0x0150:  3a54 6263 385a 3646 4672 6242 7638 3963  :Tbc8Z6FFrbBv89c
        0x0160:  4559 6542 6978 6436 4442 7649 3d0d 0a43  EYeBixd6DBvI=..C
        0x0170:  6f6e 7465 6e74 2d54 7970 653a 2061 7070  ontent-Type:.app
        0x0180:  6c69 6361 7469 6f6e 2f6f 6374 6574 2d73  lication/octet-s
        0x0190:  7472 6561 6d3b 2063 6861 7273 6574 3d55  tream;.charset=U
        0x01a0:  5446 2d38 0d0a 4f72 6967 696e 3a20 6874  TF-8..Origin:.ht
        0x01b0:  7470 3a2f 2f6c 6f63 616c 686f 7374 3a34  tp://localhost:4
        0x01c0:  3230 300d 0a52 6566 6572 6572 3a20 6874  200..Referer:.ht
        0x01d0:  7470 3a2f 2f6c 6f63 616c 686f 7374 3a34  tp://localhost:4
        0x01e0:  3230 302f 0d0a 5365 632d 4665 7463 682d  200/..Sec-Fetch-
        0x01f0:  4465 7374 3a20 656d 7074 790d 0a53 6563  Dest:.empty..Sec
        0x0200:  2d46 6574 6368 2d4d 6f64 653a 2063 6f72  -Fetch-Mode:.cor
        0x0210:  730d 0a53 6563 2d46 6574 6368 2d53 6974  s..Sec-Fetch-Sit
        0x0220:  653a 2063 726f 7373 2d73 6974 650d 0a54  e:.cross-site..T
        0x0230:  653a 2074 7261 696c 6572 730d 0a58 2d41  e:.trailers..X-A
        0x0240:  6d7a 2d42 7563 6b65 742d 4f62 6a65 6374  mz-Bucket-Object
        0x0250:  2d4c 6f63 6b2d 456e 6162 6c65 643a 2066  -Lock-Enabled:.f
        0x0260:  616c 7365 0d0a 582d 416d 7a2d 4461 7465  alse..X-Amz-Date
        0x0270:  3a20 4d6f 6e2c 2030 3320 4170 7220 3230  :.Mon,.03.Apr.20
        0x0280:  3233 2031 323a 3037 3a31 3420 474d 540d  23.12:07:14.GMT.
        0x0290:  0a58 2d41 6d7a 2d55 7365 722d 4167 656e  .X-Amz-User-Agen
        0x02a0:  743a 2061 7773 2d73 646b 2d6a 732f 322e  t:.aws-sdk-js/2.
        0x02b0:  3133 3438 2e30 2070 726f 6d69 7365 0d0a  1348.0.promise..
        0x02c0:  582d 466f 7277 6172 6465 642d 466f 723a  X-Forwarded-For:
        0x02d0:  2031 302e 3432 2e30 2e31 0d0a 582d 466f  .10.42.0.1..X-Fo
        0x02e0:  7277 6172 6465 642d 486f 7374 3a20 7333  rwarded-Host:.s3
        0x02f0:  6777 2e34 362e 3130 312e 3232 372e 3634  gw.46.101.227.64
        0x0300:  2e73 736c 6970 2e69 6f0d 0a58 2d46 6f72  .sslip.io..X-For
        0x0310:  7761 7264 6564 2d50 6f72 743a 2034 3433  warded-Port:.443
        0x0320:  0d0a 582d 466f 7277 6172 6465 642d 5072  ..X-Forwarded-Pr
        0x0330:  6f74 6f3a 2068 7474 7073 0d0a 582d 466f  oto:.https..X-Fo
        0x0340:  7277 6172 6465 642d 5365 7276 6572 3a20  rwarded-Server:.
        0x0350:  7472 6165 6669 6b2d 3634 6239 3663 6362  traefik-64b96ccb
        0x0360:  6364 2d35 347a 746d 0d0a 582d 5265 616c  cd-54ztm..X-Real
        0x0370:  2d49 703a 2031 302e 3432 2e30 2e31 0d0a  -Ip:.10.42.0.1..
        0x0380:  0d0a                                     ..
12:07:14.695543 IP s3gw-79d5b9fd5f-g54g5.7480 > 10-42-0-8.traefik.kube-system.svc.cluster.local.53870: Flags [.], ack 3087, win 501, options [nop,nop,TS val 1218476227 ecr 859729639], length 0
        0x0000:  4500 0034 4fae 4000 4006 d637 0a2a 0083  E..4O.@.@..7.*..
        0x0010:  0a2a 0008 1d38 d26e e445 a278 120f 30e3  .*...8.n.E.x..0.
        0x0020:  8010 01f5 1505 0000 0101 080a 48a0 78c3  ............H.x.
        0x0030:  333e 6ee7                                3>n.
12:07:14.695746 IP 10-42-0-8.traefik.kube-system.svc.cluster.local.53870 > s3gw-79d5b9fd5f-g54g5.7480: Flags [P.], seq 3087:3231, ack 734, win 504, options [nop,nop,TS val 859729639 ecr 1218476227], length 144
        0x0000:  4500 00c4 c6aa 4000 4006 5eab 0a2a 0008  E.....@.@.^..*..
        0x0010:  0a2a 0083 d26e 1d38 120f 30e3 e445 a278  .*...n.8..0..E.x
        0x0020:  8018 01f8 1595 0000 0101 080a 333e 6ee7  ............3>n.
        0x0030:  48a0 78c3 3c43 7265 6174 6542 7563 6b65  H.x.<CreateBucke
        0x0040:  7443 6f6e 6669 6775 7261 7469 6f6e 2078  tConfiguration.x
        0x0050:  6d6c 6e73 3d22 6874 7470 3a2f 2f73 332e  mlns="http://s3.
        0x0060:  616d 617a 6f6e 6177 732e 636f 6d2f 646f  amazonaws.com/do
        0x0070:  632f 3230 3036 2d30 332d 3031 2f22 3e3c  c/2006-03-01/"><
        0x0080:  4c6f 6361 7469 6f6e 436f 6e73 7472 6169  LocationConstrai
        0x0090:  6e74 3e3c 2f4c 6f63 6174 696f 6e43 6f6e  nt></LocationCon
        0x00a0:  7374 7261 696e 743e 3c2f 4372 6561 7465  straint></Create
        0x00b0:  4275 636b 6574 436f 6e66 6967 7572 6174  BucketConfigurat
        0x00c0:  696f 6e3e                                ion>
12:07:14.695755 IP s3gw-79d5b9fd5f-g54g5.7480 > 10-42-0-8.traefik.kube-system.svc.cluster.local.53870: Flags [.], ack 3231, win 501, options [nop,nop,TS val 1218476227 ecr 859729639], length 0
        0x0000:  4500 0034 4faf 4000 4006 d636 0a2a 0083  E..4O.@.@..6.*..
        0x0010:  0a2a 0008 1d38 d26e e445 a278 120f 3173  .*...8.n.E.x..1s
        0x0020:  8010 01f5 1505 0000 0101 080a 48a0 78c3  ............H.x.
        0x0030:  333e 6ee7                                3>n.
12:07:14.699956 IP s3gw-79d5b9fd5f-g54g5.7480 > 10-42-0-8.traefik.kube-system.svc.cluster.local.53870: Flags [P.], seq 734:869, ack 3231, win 501, options [nop,nop,TS val 1218476231 ecr 859729639], length 135
        0x0000:  4500 00bb 4fb0 4000 4006 d5ae 0a2a 0083  E...O.@.@....*..
        0x0010:  0a2a 0008 1d38 d26e e445 a278 120f 3173  .*...8.n.E.x..1s
        0x0020:  8018 01f5 158c 0000 0101 080a 48a0 78c7  ............H.x.
        0x0030:  333e 6ee7 4854 5450 2f31 2e31 2032 3030  3>n.HTTP/1.1.200
        0x0040:  204f 4b0d 0a43 6f6e 7465 6e74 2d4c 656e  .OK..Content-Len
        0x0050:  6774 683a 2030 0d0a 4461 7465 3a20 4d6f  gth:.0..Date:.Mo
        0x0060:  6e2c 2030 3320 4170 7220 3230 3233 2031  n,.03.Apr.2023.1
        0x0070:  323a 3037 3a31 3420 474d 540d 0a53 6572  2:07:14.GMT..Ser
        0x0080:  7665 723a 2043 6570 6820 4f62 6a65 6374  ver:.Ceph.Object
        0x0090:  2047 6174 6577 6179 2028 7265 6566 290d  .Gateway.(reef).
        0x00a0:  0a43 6f6e 6e65 6374 696f 6e3a 204b 6565  .Connection:.Kee
        0x00b0:  702d 416c 6976 650d 0a0d 0a              p-Alive....
12:07:14.700022 IP 10-42-0-8.traefik.kube-system.svc.cluster.local.53870 > s3gw-79d5b9fd5f-g54g5.7480: Flags [.], ack 869, win 503, options [nop,nop,TS val 859729643 ecr 1218476231], length 0
        0x0000:  4500 0034 c6ab 4000 4006 5f3a 0a2a 0008  E..4..@.@._:.*..
        0x0010:  0a2a 0083 d26e 1d38 120f 3173 e445 a2ff  .*...n.8..1s.E..
        0x0020:  8010 01f7 1505 0000 0101 080a 333e 6eeb  ............3>n.
        0x0030:  48a0 78c7                                H.x.
irq0 commented 1 year ago

The request data doesn't have anything interesting except the usual proxy headers. From those RGW does look at X-Forwarded-Proto, which isn't relevant in our case.

Toying around with the requests in mitmproxy lead me to something. The standard reply to create bucket is an empty one with status 200.

This made me change the assumption that the empty response is bad. Time to look more closely at the JSON response. It is quite peculiar that 'Content-Type: text/plain; charset=utf-8' and JSON. It suggests a special code path.

The logs let us narrow this down a bit from the like with 'op=27RGWCreateBucket_ObjStore_S3'. It means we hit that operation, which likely created the response. Grepping for 'entry_point_object_ver' also helped.

Here's the code that generates the JSON:

void RGWCreateBucket_ObjStore_S3::send_response()
{
  if (op_ret == -ERR_BUCKET_EXISTS)
    op_ret = 0;
  if (op_ret)
    set_req_state_err(s, op_ret);
  dump_errno(s);
  end_header(s);

  if (op_ret < 0)
    return;

  if (s->system_request) {
    JSONFormatter f; /* use json formatter for system requests output */

    f.open_object_section("info");
    encode_json("entry_point_object_ver", ep_objv, &f);
    encode_json("object_ver", info.objv_tracker.read_version, &f);
    encode_json("bucket_info", info, &f);
    f.close_section();
    rgw_flush_formatter_and_reset(s, &f);
  }
}

What is a system request?

  /* Is the request made by an user marked as a system one?
   * Being system user means we also have the admin status. */

The default user we create has this flag set from env RGW_DEFAULT_USER_SYSTEM. It defaults to 0, but we set it to 1 in the helm charts.

Does your local test user happen to be a system user?

I can reproduce this by setting system in the users table:

sqlite> select user_id, system from users;
sqlite> UPDATE users set system=0 where user_id='testid';
sqlite> UPDATE users set system=1 where user_id='testid';
giubacc commented 1 year ago

Guys, I have a Déjà vu: https://github.com/aquarist-labs/ceph/pull/27 As I said at the beginning of this thread; the json response of this call is non-standard and it is returned only for system users. The S3 standard for create operation is an empty body.

irq0 commented 1 year ago

One more thing. The response doesn't set a content type.

HTTP/1.1 200 OK
Content-Length: 755
Date: Tue, 04 Apr 2023 09:50:43 GMT
Server: Ceph Object Gateway (reef)
Connection: Keep-Alive

{"entry_point_object_ver":{"tag":"","ver":0},"object_ver":{"tag":"","ver":0},"bucket_info":{"bucket":{"name":"testbucketasdf","marker":"testbucketasdf.168060184376149816","bucket_id":"testbucketasdf.168060184376149816","tenant":"","explicit_placement":{"data_pool":"","data_extra_pool":"","index_pool":""}},"creation_time":"2023-04-04T09:50:43.076149Z","owner":"testid","flags":0,"zonegroup":"","placement_rule":"default","has_instance_obj":false,"quota":{"enabled":false,"check_on_raw":false,"max_size":-1,"max_size_kb":0,"max_objects":-1},"num_shards":1,"bi_shard_hash_type":0,"requester_pays":false,"has_website":false,"swift_versioning":false,"swift_ver_location":"","index_type":0,"mdsearch_config":[],"reshard_status":0,"new_bucket_instance_id":""}}

Can we get the frontend s3 library to ignore response bodys that doesn't have a content type?

For reference, a S3 response has application/xml:

HTTP/1.1 409 Conflict
Content-Length: 141
Accept-Ranges: bytes
Content-Type: application/xml
Date: Tue, 04 Apr 2023 09:45:58 GMT
Server: Ceph Object Gateway (reef)
Connection: Keep-Alive

<?xml version="1.0" encoding="UTF-8"?><Error><Code>BucketAlreadyExists</Code><BucketName>testbucketasdf</BucketName><HostId></HostId></Error>
giubacc commented 1 year ago

S3 responses in the standard are XML, but radosgw can, when you put format=json in the query string, return a json instead (not standard obviously).

irq0 commented 1 year ago

S3 responses in the standard are XML, but radosgw can, when you put format=json in the query string, return a json instead (not standard obviously).

Yes. That would, however, set the content type header to application/json unlike the system user responses

votdev commented 1 year ago

Can we get the frontend s3 library to ignore response bodys that doesn't have a content type?

I have nothing found to tell the AWS SDK to ignore such response.

giubacc commented 1 year ago

If the client cannot ignore those, we could implement a parameter in the server telling to shut-up system responses...

votdev commented 1 year ago

With https://github.com/aquarist-labs/s3gw-charts/commit/0d23f792 the use of the s3gw-config ConfigMap in the s3gw deployment has been removed. I don't know if this was by intention because the commit message does not mention that. This results in a knowledge gap that we do not set RGW_DEFAULT_USER_SYSTEM=1 in the s3gw container anymore. I'm still running my local dev containers with podman run --replace --name=s3gw -it -p 7480:7480 -e RGW_DEFAULT_USER_SYSTEM=1 quay.io/s3gw/s3gw:latest --rgw-backend-store sfs --debug-rgw 15.

I'm now testing the UI if it is still fully functional without test user being a system user. Due the fact that i'm mostly using the DO installation for UI development and i didn't see any regressions except this issue i assume we are fine to continue without a system user flag for the default test user.

votdev commented 1 year ago

The issue is closed as the root cause has been found and can be fixed by running the s3gw container without RGW_DEFAULT_USER_SYSTEM=1, which will make the s3gw S3 API behave as expected.

giubacc commented 1 year ago

With aquarist-labs/s3gw-charts@0d23f792 the use of the s3gw-config ConfigMap in the s3gw deployment has been removed. I don't know if this was by intention because the commit message does not mention that. This results in a knowledge gap that we do not set RGW_DEFAULT_USER_SYSTEM=1 in the s3gw container anymore. I'm still running my local dev containers with podman run --replace --name=s3gw -it -p 7480:7480 -e RGW_DEFAULT_USER_SYSTEM=1 quay.io/s3gw/s3gw:latest --rgw-backend-store sfs --debug-rgw 15.

I'm now testing the UI if it is still fully functional without test user being a system user. Due the fact that i'm mostly using the DO installation for UI development and i didn't see any regressions except this issue i assume we are fine to continue without a system user flag for the default test user.

Correct, that PR has had the unintended side effect to also remove that env-var from the s3gw pod. I guess we are preferring the new mode because the user does not need to be really a system, correct?

votdev commented 1 year ago

I guess we are preferring the new mode because the user does not need to be really a system, correct?

Yes, I currently assume this based on my tests with the UI. That's also the reason why i closed this issue and opened https://github.com/aquarist-labs/s3gw/issues/426.