Closed ashangit closed 3 months ago
I am taking this up
Here's my understanding of what this means. You want to enable subdomain style s3 APIs in RGW.
However, how are you proposing to get the DNS to work in Kubernetes?
How are we going to have endpoints with bucketname.endpoint.com inside Kubernetes?
Am I misunderstanding the intent?
The intent is in fact to enable subdomain-style S3, at least by having Rook apply the appropriate Ceph configuration for RGW. It will then require a specific DNS entry to be created, such as *.endpoint.com pointing to the RGW/LoadBalancer.
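To make the two addressing styles concrete, here is a minimal sketch of how the same object would be addressed in each. The function names are hypothetical, and the endpoint, bucket, and key values are placeholders; the subdomain form only resolves if a wildcard DNS record like the one described above exists.

```python
from urllib.parse import quote


def path_style_url(endpoint: str, bucket: str, key: str, scheme: str = "https") -> str:
    """Path style: the bucket is the first path segment."""
    return f"{scheme}://{endpoint}/{bucket}/{quote(key)}"


def subdomain_style_url(endpoint: str, bucket: str, key: str, scheme: str = "https") -> str:
    """Subdomain (virtual-hosted) style: the bucket is a DNS label in front of the endpoint."""
    return f"{scheme}://{bucket}.{endpoint}/{quote(key)}"
```

With endpoint.com as the endpoint, a bucket named bucketname is reached at bucketname.endpoint.com in the second form, which is why the wildcard entry is needed.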
I'm looking for this feature as well. I'm trying to configure it using the additional ceph.conf configmap, but so far it hasn't worked.
Our setup is as follows:
Now I would really like the option to have URLs like https://bucket.files.example.com/someFile.ext. According to the Ceph documentation, I need to add rgw dns name = files.example.com to the radosgw configuration in ceph.conf. I can't find it.
It would be great to have this available natively in Rook, without me trying to override the config map with all potential problems involved.
UPDATE: Got it to work with the following:
apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-config-override
  namespace: rook-ceph
data:
  config: |
    [global]
    rgw dns name = files.example.com
This of course only works because we have only one object store and it's not the most user friendly way to achieve this, I'd say.
@bartlaarhoven you can configure it with the ceph CLI through the toolbox. You will have to do it for each rgw:
kubectl exec -ti $(kubectl get po|grep tools|awk '{print $1}') -- ceph config set client.<id_rgw> rgw_dns_name files.example.com
id_rgw => is the id arg of each rgw command line
@ashangit Oh thanks, I was very close to finding the correct CLI command earlier then. Okay. That works, and I deleted the configmap. But this is still exactly what's asked in this issue, right? I still agree that it would be very useful to be able to configure this in the Object Store CRD.
Yes, this is exactly what I'd like Rook to configure automatically.
This should probably be done through zone group hostnames: https://docs.ceph.com/docs/luminous/radosgw/multisite/#set-a-zone-group
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.
Worth noting that https://aws.amazon.com/blogs/aws/amazon-s3-path-deprecation-plan-the-rest-of-the-story/ now opens with an announcement that the plan to deprecate paths has been postponed.
I am using s3cmd to try to create a bucket policy, but I find that it connects to Ceph in DNS subdomain style. I deployed Ceph with the Rook operator and want to know how to find the "id_rgw => is the id arg of each rgw command line".
Did I understand correctly that
kubectl get CephObjectStore --all-namespaces
So I got into the rook-ceph-tools container:
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash
Set rgw_dns_name for my CephObjectStore:
ceph config set client.ceph.objectstore-2.a rgw_dns_name ceph-s3.domain-name.com
Make sure the value is applied:
ceph config get client.ceph.objectstore-2.a
WHO MASK LEVEL OPTION VALUE RO
...
client.ceph.objectstore-2.a advanced rgw_dns_name ceph-s3.domain-name.com *
For testing, I mapped the pod's IP to the ceph-s3.domain-name.com hostname in /etc/hosts on the K8s server, and got these responses:
curl test.ceph-s3.domain-name.com:8080
<?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>
curl test.ceph-s3.domain-name.com:8080/test
<?xml version="1.0" encoding="UTF-8"?><ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>test</Name><Prefix></Prefix><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated><Contents><Key>file.jpg</Key><LastModified>2022-08-22T11:58:44.665Z</LastModified><ETag>"018e55e54f5ff95ce095b91ec15b8b6e"</ETag><Size>40532</Size><StorageClass>STANDARD</StorageClass><Owner><ID>ceph-objectstore-2-2-user</ID><DisplayName>ceph-objectstore-2-2-user</DisplayName></Owner><Type>Normal</Type></Contents><Marker></Marker></ListBucketResult>
It was suggested that, if the subdomain DNS style is enabled, I should see the bucket listing in the first case and not in the second. Please advise what I am doing wrong.
Also, this example:
curl test.ceph-s3.domain-name.com:8080/file.jpg
<?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchBucket</Code><BucketName>file.jpg</BucketName><RequestId>tx00000746aaaa33c28bd6b-00630371fe-43785-ceph-objectstore-2</RequestId><HostId>43785-ceph-objectstore-2-ceph-objectstore-2</HostId></Error>
Error in the rook-ceph-rgw-ceph-objectstore-2 pod logs:
debug 2022-08-22T12:09:33.702+0000 7f4657dac700 1 ====== starting new request req=0x7f47880ea650 =====
debug 2022-08-22T12:09:33.702+0000 7f4657dac700 1 req 10719001441878942354 0.000000000s op->ERRORHANDLER: err_no=-2002 new_err_no=-2002
debug 2022-08-22T12:09:33.702+0000 7f4657dac700 1 ====== req done req=0x7f47880ea650 op status=0 http_status=404 latency=0.000000000s ======
debug 2022-08-22T12:09:33.702+0000 7f4657dac700 1 beast: 0x7f47880ea650: 10.233.64.116 - anonymous [22/Aug/2022:12:09:33.702 +0000] "GET /file.jpg HTTP/1.1" 404 253 - "curl/7.61.1" - latency=0.000000000s
Path style:
curl test.ceph-s3.domain-name.com:8080/test/file.jpg
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.
Logs:
debug 2022-08-22T12:13:42.429+0000 7f472df58700 1 ====== starting new request req=0x7f478816b650 =====
debug 2022-08-22T12:13:42.433+0000 7f472df58700 1 ====== req done req=0x7f478816b650 op status=0 http_status=200 latency=0.004000189s ======
debug 2022-08-22T12:13:42.433+0000 7f472df58700 1 beast: 0x7f478816b650: 10.233.64.116 - anonymous [22/Aug/2022:12:13:42.428 +0000] "GET /test/file.jpg HTTP/1.1" 200 40532 - "curl/7.61.1" - latency=0.004000189s
Another workaround was suggested in the Slack channel. It works in my cluster! We need to modify the zonegroup in the toolbox container.
[rook@rook-ceph-tools]$ radosgw-admin zonegroup get > zonegroup.json
[rook@rook-ceph-tools]$ vi zonegroup.json
...
"hostnames": ["SERVICE_NAME.rook-ceph.svc","YOUR DOMAIN FOR SUB DOMAIN STYLE"],
"hostnames_s3website": ["..."],
...
[rook@rook-ceph-tools-6c656456bd-qjkdv tmp]$ radosgw-admin zonegroup set --infile zonegroup.json
[rook@rook-ceph-tools-6c656456bd-qjkdv tmp]$ radosgw-admin period update --commit
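The manual edit in vi could also be scripted. The following is only a sketch: it assumes the zonegroup JSON has a top-level "hostnames" list as shown above, and the function name add_hostnames and the sample values are hypothetical. The result would still have to be written back to zonegroup.json and applied with the radosgw-admin zonegroup set and period update --commit commands above.

```python
import json


def add_hostnames(zonegroup: dict, new_hostnames: list[str]) -> dict:
    """Return a copy of the zonegroup with new_hostnames appended to "hostnames".

    Existing entries are kept in order and duplicates are skipped.
    """
    updated = dict(zonegroup)
    hostnames = list(zonegroup.get("hostnames", []))
    for name in new_hostnames:
        if name not in hostnames:
            hostnames.append(name)
    updated["hostnames"] = hostnames
    return updated


# Stand-in for the output of `radosgw-admin zonegroup get` (most fields omitted).
sample = json.loads('{"name": "my-store", "hostnames": [], "hostnames_s3website": []}')
patched = add_hostnames(
    sample,
    ["rook-ceph-rgw-my-store.rook-ceph.svc", "s3.example.com"],  # placeholder values
)
patched_json = json.dumps(patched, indent=4)  # text to write back to zonegroup.json
```

Keeping the internal service name in the list alongside the external domain follows the example above, so that in-cluster clients can still reach the gateway by its service name.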
@bartlaarhoven that gives me an error on the latest ceph, namely:
global_init: error reading config file. parse error: expected '<empty_line>' in line 2 at position 50
Have you figured out what the new syntax is supposed to be for that?
@travisn @crombus A possible solution seems to be adding the ability to set the domain name or domain names for each CephObjectStore (RGW) instance, so that Rook can then add them to the config as described in this comment.
@thotz thoughts on this one?
@thotz @travisn what do you think of this solution? https://github.com/rook/rook/issues/4780#issuecomment-1400279624 Virtual-hosted-style URLs (DNS subdomain style) for S3 buckets is an important feature for S3 object storage. Please take a look.
@hohoqq So you're suggesting Rook could do something like this?
subdomains setting on the CephZoneGroup CR
@travisn As far as I understand, Rook creates and configures the Zonegroup for each CephObjectStore (RGW). Perhaps it is possible to add the domain and service names to "hostnames" in the Zonegroup settings? The domain name value can be set in the CephObjectStore CR.
@thotz @BlaineEXE thoughts?
@hohoqq Did you both enable rgw_dns_name and define it in the zonegroup hostname list? At least https://docs.ceph.com/en/quincy/radosgw/multisite/#setting-a-zonegroup mentions that rgw_dns_name will be added automatically to the hostname list, but do you need to restart the RGW server?
If we are going to define hostname in zonegroup CR, then all the hostnames trying to access the rgw server need to list it there AFAIK. We can always add the internal rgw service endpoint to hostname list of zonegroup. But say we define openshift-route using the rgw-service-endpoint(same goes with loadbalancer). Then rgw endpoint won't be accessible unless the route hostname is defined in zonegroup hostname list.
If we have the option at the Zonegroup level, users with a simple RGW server configuration won't be able to use this feature, right?
Did you both enable rgw_dns_name and define it in the zonegroup hostname list? At least https://docs.ceph.com/en/quincy/radosgw/multisite/#setting-a-zonegroup mentions that rgw_dns_name will be added automatically to the hostname list, but do you need to restart the RGW server?
No, I defined the domain name in the Zonegroup hostname list as mentioned in this comment https://github.com/rook/rook/issues/4780#issuecomment-1230492811
Now I tried setting rgw_dns_name on the RGW instance and that works too. This can be done separately for each ObjectStore this way:
ceph config set client.rgw.ceph.objectstore rgw_dns_name <domain-name>
Both of these options need a restart of the RGW pod for the changes to take effect.
If we are going to define hostname in zonegroup CR, then all the hostnames trying to access the rgw server need to list it there AFAIK. We can always add the internal rgw service endpoint to hostname list of zonegroup. But say we define openshift-route using the rgw-service-endpoint(same goes with loadbalancer). Then rgw endpoint won't be accessible unless the route hostname is defined in zonegroup hostname list.
If configuration is done with the ceph config command, then the list of Zonegroup hostnames is left blank.
If we have the option at the Zonegroup level, users with a simple RGW server configuration won't be able to use this feature, right?
I did not set up multizone at that time, so I only have the automatically created Zonegroups. By default, they are created with names equal to the names of the ObjectStores, if I understood the question correctly.
P.S. Setting rgw_dns_name could in principle be done through the Ceph API, but the "can_update_at_runtime": false flag on the option does not allow changing it that way. Maybe someone knows how to get around this?
Now I tried setting rgw_dns_name on the RGW instance and that works too. This can be done separately for each ObjectStore this way: ceph config set client.rgw.ceph.objectstore rgw_dns_name <domain-name>
After some testing, I realized that this option does not work. When I set rgw_dns_name with ceph config, the Rook operator pod couldn't connect to RGW and couldn't control it. Example before and after setup:
[rook@rook-ceph-operator-6c5bfc7d9f-mdx2n /]$ curl http://rook-ceph-rgw-ceph-objectstore-2.rook-ceph.svc:80
<?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>
[rook@rook-ceph-operator-6c5bfc7d9f-mdx2n /]$ curl http://rook-ceph-rgw-ceph-objectstore-2.rook-ceph.svc:80
<?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchBucket</Code><BucketName>rook-ceph-rgw-ceph-objectstore-2.rook-ceph.svc</BucketName><RequestId>tx00000873dbe3db034cc8a-006425a83b-b587-ceph-objectstore-2</RequestId><HostId>b587-ceph-objectstore-2-ceph-objectstore-2</HostId></Error>
For example, the Rook operator cannot create a user, or perform any other operation on RGW.
So the ceph config setting does not work. If anyone knows how to fix this issue, please tell.
I can suggest the following possible solution.
When using multisite and placing many RGW (CephObjectStores) in one zone, the setup is as follows:
[rook@rook-ceph-tools]$ radosgw-admin zonegroup get --rgw-zone=<zone-name> > zonegroup.json
[rook@rook-ceph-tools]$ vi zonegroup.json
...
"hostnames": ["SERVICE_NAME.rook-ceph.svc","DOMAIN-NAME.COM",
"SERVICE_NAME-2.rook-ceph.svc","DOMAIN-NAME-2.COM",
...
"SERVICE_NAME-n.rook-ceph.svc","DOMAIN-NAME-N.COM"],
"hostnames_s3website": [ ],
...
[rook@rook-ceph-tools]$ radosgw-admin zonegroup set --rgw-zone=<zone-name> --infile zonegroup.json
[rook@rook-ceph-tools]$ REALM_ID=$(cat zonegroup.json | jq -r ".realm_id")
[rook@rook-ceph-tools]$ radosgw-admin period update --commit --realm-id="${REALM_ID}"
Then restart all RGWs in that zone. Also these settings can be made before creating the RGW. The zone for placing the RGW is configured in its manifest. The ability to specify a domain name can be added in it too.
Without multisite and without specifying a zone for the RGW (CephObjectStore), the zone has the same name as the RGW (CephObjectStore). The rest of the setup is the same as above. In hostnames, we need to specify the service name and the domain name for each RGW (CephObjectStore).
I have tested these solutions.
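As a rough illustration of how the "hostnames" list above grows with the number of RGWs, the sketch below builds it from a mapping of object store name to external domain. It assumes the service naming pattern rook-ceph-rgw-<store>.<namespace>.svc seen elsewhere in this thread; the function name and the sample stores and domains are hypothetical.

```python
def zonegroup_hostnames(stores: dict[str, str], namespace: str = "rook-ceph") -> list[str]:
    """Build the zonegroup "hostnames" list: service name plus domain for each store.

    `stores` maps an object store name to its external domain name.
    """
    hostnames: list[str] = []
    for store, domain in stores.items():
        # Internal service name, so in-cluster clients keep working.
        hostnames.append(f"rook-ceph-rgw-{store}.{namespace}.svc")
        # External domain under which subdomain-style buckets are served.
        hostnames.append(domain)
    return hostnames


example = zonegroup_hostnames(
    {"store-a": "s3-a.example.com", "store-b": "s3-b.example.com"}  # placeholders
)
```

Every RGW in the zone contributes two entries, so the list has to be regenerated and the period committed again whenever an object store is added to the zone.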
@travisn @thotz Is it possible to remove the wontfix label and put this feature request in the To Do list?
Removed keepalive so the stale bot will remind us to reprioritize work on this periodically
s3cmd setcors cors.xml s3://$BUCKET_NAME returns ERROR: S3 error: 405 (MethodNotAllowed). It took days to find out that the problem is this issue. Any progress on implementing subdomain config in Rook?
@travisn where does this stand?
@thotz Can you take a look at this, or shall we find a new assignee? thanks
If I understand correctly, there are two options that need to be created:
1.) For a normal object store, the rgw_dns_name option in the CephObjectStore CR
2.) For multisite, the option needs to be set on the zonegroup in the hostname field
One drawback of setting this option is that the service endpoint or route which we create is no longer valid. Only rgw_dns_name will work:
$ curl minikube:31347
<?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>
bash-4.4$ curl http://rook-ceph-rgw-my-store.rook-ceph.svc:80
<?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>
$ ceph config set client.rgw.my.store.a rgw_dns_name minikube
bash-4.4$ ceph config get client.rgw.my.store.a rgw_dns_name
minikube
bash-4.4$ curl http://rook-ceph-rgw-my-store.rook-ceph.svc:80
curl: (7) Failed to connect to rook-ceph-rgw-my-store.rook-ceph.svc port 80: Connection refused
bash-4.4$ curl minikube:31347
<?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>
So TLS certificate issues, such as mismatched hostnames, will also come up.
Given the complexity Jiffin explained, I added the needs-design-document label. We will need to spend effort to understand how this affects existing multisite, TLS, and endpoint behaviors. We have been making concerted efforts to ensure feature stability and enterprise-readiness of all three of those points in the past couple years, so we should not take this feature lightly as far as side effects may go. I think this will take quite some work to realize.
The older rgw_dns_name property only supports a single FQDN, so if you wanted to provide virtual-host-style support via both a route/ingress FQDN and some other FQDN (maybe if k8s were to add support for service wildcards), then you'd need to set a list of two hostnames in the zonegroup configuration. The SSL certificate would need 1 common name and 3 subject alternative names (two being wildcard subdomains).
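The certificate arithmetic can be spelled out with a small sketch. It assumes each hostname needs to be covered both as itself (path style) and as a wildcard subdomain (virtual host style), and that the first name is used as the common name; the function name and the two FQDNs are placeholders.

```python
def certificate_names(hostnames: list[str]) -> tuple[str, list[str]]:
    """Return (common_name, subject_alternative_names) for the given RGW hostnames.

    Each hostname contributes itself and its wildcard subdomain, in order.
    """
    if not hostnames:
        raise ValueError("at least one hostname is required")
    names: list[str] = []
    for host in hostnames:
        for name in (host, f"*.{host}"):
            if name not in names:
                names.append(name)
    # First name becomes the common name; the rest go into the SAN list.
    return names[0], names[1:]


common_name, sans = certificate_names(["rgw.apps.example.com", "s3.example.com"])
```

For two hostnames this yields four names in total: one common name and three alternative names, two of which are wildcards, matching the count above. Some tooling also expects the common name to be repeated among the alternative names; that is left out here to mirror the count.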
The RGWs need the list of hostnames so they can discern whether a request is virtual host style or path style.
A request with
GET /foo/bar
host: baz.example.com
could be interpreted as s3://foo/bar or s3://baz/foo/bar, depending on whether the configured hostname is baz.example.com or example.com.
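The ambiguity can be illustrated with a simplified sketch. This is not RGW's actual code: the function name resolve_request and the matching rule (an exact match on a configured hostname means path style, a subdomain of one means virtual host style) are assumptions made to mirror the example above. What RGW really does with a Host that matches nothing is not modelled, and IPv6 literals in the Host header are not handled.

```python
from typing import Optional


def resolve_request(host_header: str, path: str, hostnames: list[str]) -> Optional[tuple[str, str]]:
    """Map a request to (bucket, key) given the configured hostnames.

    Returns None when the Host header matches no configured hostname.
    """
    host = host_header.split(":", 1)[0].lower().rstrip(".")  # drop port and trailing dot
    rest = path.lstrip("/")
    # Check longer hostnames first so the most specific configured name wins.
    for name in sorted((h.lower() for h in hostnames), key=len, reverse=True):
        if host == name:
            # Path style: the first path segment is the bucket.
            bucket, _, key = rest.partition("/")
            return bucket, key
        if host.endswith("." + name):
            # Virtual host style: the labels in front of the hostname are the bucket.
            bucket = host[: -(len(name) + 1)]
            return bucket, rest
    return None
```

With hostnames set to ["baz.example.com"], the request above resolves to bucket foo and key bar; with ["example.com"], it resolves to bucket baz and key foo/bar.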
Sure so I just need to focus on hostnames in the zonegroup config
Yes, at least as it pertains to RGW-side configuration. I see multiple comments with references to rook-ceph-rgw-my-store.rook-ceph.svc, and generally speaking, Kubernetes doesn't support wildcards for services [1]. You can, however, do this for an ingress [2].
[1] https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
[2] https://kubernetes.io/docs/concepts/services-networking/ingress/#hostname-wildcards
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.
It would still be great to have some support within rook or the helm chart. Please don't close.
Yes! #13022 and #13326 are in progress, the bot just isn't smart enough to notice.
Oh, that's great. I did not notice the two PRs. Thank you for pointing at them.
Is this a bug report or feature request?
What should the feature do: Currently only the bucket path style is supported. The feature should provide the capability to configure CephObjectStore for S3 buckets in DNS subdomain style (the same feature as https://github.com/rook/rook/pull/2685, but for Ceph). To do so, the RADOS gateways should be started with the --rgw-dns-name parameter set to the DNS endpoint.
What is the use case behind this feature: Path-style access seems to be deprecated and soon to be removed by AWS (https://aws.amazon.com/blogs/aws/amazon-s3-path-deprecation-plan-the-rest-of-the-story/), so it seems worthwhile to enable this feature.
Environment: