kubernetes-sigs / cluster-api-provider-openstack

Cluster API implementation for OpenStack
https://cluster-api-openstack.sigs.k8s.io/
Apache License 2.0

Cannot create cluster by Application Credentials without role admin #2131

Open nguyenhuukhoi opened 3 months ago

nguyenhuukhoi commented 3 months ago

/kind bug

What steps did you take and what happened:

Creating a cluster with Application Credentials that lack the admin role causes routers and networks to be created over and over until the quota is exceeded. It works fine when using the password method.

What did you expect to happen:

The cluster is created properly with Application Credentials that do not have the admin role.

Anything else you would like to add:

Reconciler error err=< failed to reconcile network: Expected HTTP response code [201 202] when accessing [POST https://x.x.net:9696/v2.0/networks], but got 409 instead {"NeutronError": {"type": "OverQuota", "message": "Quota exceeded for resources: ['network'].", "detail": ""}} controller="openstackcluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="OpenStackCluster" OpenStackCluster="default/capi-quickstartdck" namespace="default" name="capi-quickstartdck" reconcileID="24ecea61-905e-40e5-8266-6cc0b4d95918"

Environment:

nguyenhuukhoi commented 3 months ago

I have an update: if I use Application Credentials with the roles load-balancer_member, member, and admin (or reader), it works. With Application Credentials that have only load-balancer_member and member, it does not.

mdbooth commented 3 months ago

@nguyenhuukhoi From the error you've pasted, the problem is that you're over quota. This is presumably why admin can do this, because quotas don't apply to admin.

I'm going to close this because it looks like it's working as intended. I think you need to increase your networks quota.

nguyenhuukhoi commented 3 months ago

Hello. If I have a quota of 10 networks, it uses all of them, and with a quota of 100 networks it is the same.

nguyenhuukhoi commented 3 months ago

OK, I get what you mean. Should I not create the cluster with the admin role? Please correct me if I am wrong.

mdbooth commented 3 months ago

"Hello. If I have a quota of 10 networks, it uses all of them, and with a quota of 100 networks it is the same."

Can you paste some logs from the first network creation failure? The one you posted is just because it's out of quota. If you're saying the controller is looping creating networks until it runs out of whatever quota you gave it, that would be a bug.
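To make the suspected failure mode concrete, here is a minimal sketch of how a get-or-create reconcile step can end up consuming an entire quota. It is not CAPO's code: `FakeCloud`, `reconcile`, and `run_reconciles` are hypothetical names, and the assumption that lookups return nothing (for example because a policy hides existing resources from the credential) is only one possible trigger.

```python
from dataclasses import dataclass, field


class OverQuota(Exception):
    """Raised by the fake cloud when the network quota is used up."""


@dataclass
class FakeCloud:
    """Hypothetical in-memory stand-in for a networking API with a quota."""
    quota: int
    can_list: bool = True  # False models lookups that never see existing resources
    networks: list = field(default_factory=list)

    def list_networks(self, name):
        if not self.can_list:
            return []  # assumption: the lookup silently returns nothing
        return [n for n in self.networks if n == name]

    def create_network(self, name):
        if len(self.networks) >= self.quota:
            raise OverQuota("Quota exceeded for resources: ['network'].")
        self.networks.append(name)
        return name


def reconcile(cloud, name):
    """Get-or-create: reuse an existing network, create one only if none is found."""
    existing = cloud.list_networks(name)
    if existing:
        return existing[0]
    return cloud.create_network(name)


def run_reconciles(cloud, name, attempts):
    """Retry reconcile as a controller loop would; return the first quota error, if any."""
    for _ in range(attempts):
        try:
            reconcile(cloud, name)
        except OverQuota as exc:
            return exc
    return None
```

With `FakeCloud(quota=10)` repeated reconciles leave exactly one network behind. With `FakeCloud(quota=10, can_list=False)` every retry creates another network until the quota error appears, whatever the quota is set to, which matches the behaviour reported for quotas of 10 and 100.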

nguyenhuukhoi commented 3 months ago

"If you're saying the controller is looping creating networks until it runs out of whatever quota you gave it". Yes, that's what I mean. I will collect the logs and post them as you asked.

yankcrime commented 2 months ago

The bug (CAPO retrying network / subnet / router creation until you hit quota limits) might be triggered by a number of things, but specifically I've hit this recently and it was caused by changes to Neutron RBAC policies.

The original error which causes CAPO to get stuck in a reconciliation loop until resources are exhausted in my case was:

  "err": "failed to reconcile router: unable to create router interface: Resource not found: [PUT https://xxx.xxx:9696/v2.0/routers/cf143c1c-96c9-4467-b61b-d5e9be704163/add_router_interface], error message: {\"NeutronError\": {\"type\": \"HTTPNotFound\", \"message\": \"The resource could not be found.\", \"detail\": \"\"}}"

The root cause was related to how the application credential had been created and new Neutron API RBAC policies that were introduced and made the default as of 2023.2:

https://docs.openstack.org/releasenotes/neutron/2023.2.html#upgrade-notes

From the Neutron side, you'll see something like this corresponding with the CAPO router creation request:

2024-07-23 13:29:01.713 26 DEBUG neutron.policy [None req-71805b6e-b6ab-4b34-94b5-dc37a2a2cd25 5a3bb6dcdbe142aeb20df4743e6a0dd0 c3ea7eb8de0c4ff8bfd98ab6aabeefce - - 14b8ac337152470eae38c67237eb59be 14b8ac337152470eae38c67237eb59be] Enforcing rules: ['get_router'] log_rule_list /var/lib/kolla/venv/lib/python3.10/site-packages/neutron/policy.py:457
2024-07-23 13:29:01.714 26 DEBUG neutron.policy [None req-71805b6e-b6ab-4b34-94b5-dc37a2a2cd25 5a3bb6dcdbe142aeb20df4743e6a0dd0 c3ea7eb8de0c4ff8bfd98ab6aabeefce - - 14b8ac337152470eae38c67237eb59be 14b8ac337152470eae38c67237eb59be] Failed policy enforce for 'get_router' enforce /var/lib/kolla/venv/lib/python3.10/site-packages/neutron/policy.py:530
2024-07-23 13:29:01.714 26 INFO neutron.api.v2.resource [None req-71805b6e-b6ab-4b34-94b5-dc37a2a2cd25 5a3bb6dcdbe142aeb20df4743e6a0dd0 c3ea7eb8de0c4ff8bfd98ab6aabeefce - - 14b8ac337152470eae38c67237eb59be 14b8ac337152470eae38c67237eb59be] add_router_interface failed (client error): The resource could not be found.
2024-07-23 13:29:01.715 26 INFO neutron.wsgi [None req-71805b6e-b6ab-4b34-94b5-dc37a2a2cd25 5a3bb6dcdbe142aeb20df4743e6a0dd0 c3ea7eb8de0c4ff8bfd98ab6aabeefce - - 14b8ac337152470eae38c67237eb59be 14b8ac337152470eae38c67237eb59be] 10.20.1.75,10.20.3.5 "PUT /v2.0/routers/26f86b86-5f5f-4e36-930d-877a075987b2/add_router_interface HTTP/1.1" status: 404  len: 285 time: 0.0809276
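If you want to check your own Neutron server log for the same pattern, a small script along these lines may help. It is a sketch that assumes the log format shown above (debug logging enabled, one entry per line); `failed_policy_rules` is a hypothetical helper name, and the sample lines are copied from the excerpt.

```python
import re

# Assumed line layout, based on the excerpt above:
#   <date> <time> <pid> <LEVEL> <logger> [None req-<id> ...] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\S+ \S+) \d+ (?P<level>[A-Z]+) (?P<logger>\S+) "
    r"\[\S+ (?P<req>req-[0-9a-f-]+)[^\]]*\] (?P<msg>.*)$"
)
FAILED_RE = re.compile(r"Failed policy enforce for '(?P<rule>[^']+)'")


def failed_policy_rules(lines):
    """Return {request_id: [rule, ...]} for every failed policy check found."""
    failures = {}
    for line in lines:
        entry = LINE_RE.match(line.strip())
        if entry is None:
            continue  # not a log line in the assumed format
        failed = FAILED_RE.search(entry.group("msg"))
        if failed:
            failures.setdefault(entry.group("req"), []).append(failed.group("rule"))
    return failures


SAMPLE = """\
2024-07-23 13:29:01.714 26 DEBUG neutron.policy [None req-71805b6e-b6ab-4b34-94b5-dc37a2a2cd25 5a3bb6dcdbe142aeb20df4743e6a0dd0 c3ea7eb8de0c4ff8bfd98ab6aabeefce - - 14b8ac337152470eae38c67237eb59be 14b8ac337152470eae38c67237eb59be] Failed policy enforce for 'get_router' enforce /var/lib/kolla/venv/lib/python3.10/site-packages/neutron/policy.py:530
2024-07-23 13:29:01.714 26 INFO neutron.api.v2.resource [None req-71805b6e-b6ab-4b34-94b5-dc37a2a2cd25 5a3bb6dcdbe142aeb20df4743e6a0dd0 c3ea7eb8de0c4ff8bfd98ab6aabeefce - - 14b8ac337152470eae38c67237eb59be 14b8ac337152470eae38c67237eb59be] add_router_interface failed (client error): The resource could not be found.
"""

print(failed_policy_rules(SAMPLE.splitlines()))
```

On the sample this reports the `get_router` rule for request `req-71805b6e-b6ab-4b34-94b5-dc37a2a2cd25`, the same request that then fails `add_router_interface` with a 404.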

Updating the Neutron server configuration as recommended in the release notes solved the problem.
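For anyone checking their own deployment: the upgrade notes linked above describe the `[oslo_policy]` options `enforce_new_defaults` and `enforce_scope`, whose defaults changed in 2023.2. Assuming those are the settings in question (this comment does not name them), a rough way to see what a given neutron.conf sets is sketched below. `policy_settings` is a hypothetical helper, and it only reads values that are written explicitly in the file, not the effective defaults.

```python
import configparser


def policy_settings(conf_text):
    """Return the explicitly configured [oslo_policy] values, or None when unset."""
    # interpolation=None keeps literal '%' characters in the file from raising errors.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    return {
        option: parser.get("oslo_policy", option, fallback=None)
        for option in ("enforce_new_defaults", "enforce_scope")
    }


example = """\
[DEFAULT]
debug = true

[oslo_policy]
enforce_new_defaults = False
enforce_scope = False
"""

print(policy_settings(example))
```

A value of `None` means the option is not set in the file, so the release default applies. Which values are right for a given cloud depends on how its roles and application credentials are set up, so treat the example values as placeholders rather than a recommendation.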