Open nguyenhuukhoi opened 4 months ago
I have an update: if I use Application Credentials with the roles load-balancer_member, member, and admin (or reader), it works. With only load-balancer_member and member, it does not.
@nguyenhuukhoi From the error you've pasted, the problem is that you're over quota. This is presumably why admin can do this, because quotas don't apply to admin.
I'm going to close this because it looks like it's working as intended. I think you need to increase your networks quota.
Hello. Whether I have a quota of 10 networks or 100 networks, it uses all of them; the result is the same.
OK, I get what you mean: don't create the cluster with the admin role? Please correct me if I'm wrong.
Can you paste some logs from the first network creation failure? The one you posted is just because it's out of quota. If you're saying the controller is looping creating networks until it runs out of whatever quota you gave it, that would be a bug.
"If you're saying the controller is looping creating networks until it runs out of whatever quota you gave it". Yes, that is what I mean. I will collect the logs and post them as you asked.
The bug (CAPO retrying network / subnet / router creation until you hit quota limits) might be triggered by a number of things. I hit it recently, and in my case it was caused by changes to Neutron RBAC policies.
In my case, the original error that caused CAPO to get stuck in a reconciliation loop until resources were exhausted was:
"err": "failed to reconcile router: unable to create router interface: Resource not found: [PUT https://xxx.xxx:9696/v2.0/routers/cf143c1c-96c9-4467-b61b-d5e9be704163/add_router_interface], error message: {\"NeutronError\": {\"type\": \"HTTPNotFound\", \"message\": \"The resource could not be found.\", \"detail\": \"\"}}"
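The failure mode described above (a late step keeps failing, and every retry creates fresh resources until the quota is gone) can be sketched with a toy model. This is not CAPO's actual code: `FakeCloud`, `reconcile_naive`, `reconcile_idempotent`, and the cluster name are hypothetical names made up for illustration, and the model only assumes that the router-interface step always returns a 404, as in the error above.

```python
"""Toy model of a non-idempotent reconcile loop; NOT CAPO's actual code."""
from dataclasses import dataclass, field


class QuotaExceeded(Exception):
    """Stands in for Neutron's 409 OverQuota response."""


class NotFound(Exception):
    """Stands in for Neutron's 404 on add_router_interface."""


@dataclass
class FakeCloud:
    """Hypothetical in-memory stand-in for Neutron with a network quota."""
    network_quota: int
    networks: list = field(default_factory=list)

    def create_network(self, name):
        if len(self.networks) >= self.network_quota:
            raise QuotaExceeded("Quota exceeded for resources: ['network'].")
        self.networks.append(name)
        return name

    def add_router_interface(self, network):
        # Assumption: the policy check always hides the router, so this always fails.
        raise NotFound("The resource could not be found.")


def reconcile_naive(cloud, name):
    """Creates a new network on every pass, then fails on the router step."""
    network = cloud.create_network(name)
    cloud.add_router_interface(network)


def reconcile_idempotent(cloud, name):
    """Reuses an existing network with the same name before creating one."""
    network = name if name in cloud.networks else cloud.create_network(name)
    cloud.add_router_interface(network)


def run(reconcile, quota, passes):
    """Retry `reconcile` for `passes` passes; return (network count, last error)."""
    cloud = FakeCloud(network_quota=quota)
    last_error = None
    for _ in range(passes):
        try:
            reconcile(cloud, "k8s-clusterapi-cluster-demo")
        except (NotFound, QuotaExceeded) as exc:
            last_error = type(exc).__name__
    return len(cloud.networks), last_error
```

In this model the naive loop fills any quota, whether it is 10 or 100, and the last error seen is the quota error rather than the original 404, which matches what was reported in this issue. The idempotent variant still fails, but it stops at one network and keeps surfacing the real error.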
The root cause was the combination of how the application credential had been created and the new Neutron API RBAC policies, which became the default as of 2023.2:
https://docs.openstack.org/releasenotes/neutron/2023.2.html#upgrade-notes
From the Neutron side, you'll see something like this corresponding with the CAPO router creation request:
2024-07-23 13:29:01.713 26 DEBUG neutron.policy [None req-71805b6e-b6ab-4b34-94b5-dc37a2a2cd25 5a3bb6dcdbe142aeb20df4743e6a0dd0 c3ea7eb8de0c4ff8bfd98ab6aabeefce - - 14b8ac337152470eae38c67237eb59be 14b8ac337152470eae38c67237eb59be] Enforcing rules: ['get_router'] log_rule_list /var/lib/kolla/venv/lib/python3.10/site-packages/neutron/policy.py:457
2024-07-23 13:29:01.714 26 DEBUG neutron.policy [None req-71805b6e-b6ab-4b34-94b5-dc37a2a2cd25 5a3bb6dcdbe142aeb20df4743e6a0dd0 c3ea7eb8de0c4ff8bfd98ab6aabeefce - - 14b8ac337152470eae38c67237eb59be 14b8ac337152470eae38c67237eb59be] Failed policy enforce for 'get_router' enforce /var/lib/kolla/venv/lib/python3.10/site-packages/neutron/policy.py:530
2024-07-23 13:29:01.714 26 INFO neutron.api.v2.resource [None req-71805b6e-b6ab-4b34-94b5-dc37a2a2cd25 5a3bb6dcdbe142aeb20df4743e6a0dd0 c3ea7eb8de0c4ff8bfd98ab6aabeefce - - 14b8ac337152470eae38c67237eb59be 14b8ac337152470eae38c67237eb59be] add_router_interface failed (client error): The resource could not be found.
2024-07-23 13:29:01.715 26 INFO neutron.wsgi [None req-71805b6e-b6ab-4b34-94b5-dc37a2a2cd25 5a3bb6dcdbe142aeb20df4743e6a0dd0 c3ea7eb8de0c4ff8bfd98ab6aabeefce - - 14b8ac337152470eae38c67237eb59be 14b8ac337152470eae38c67237eb59be] 10.20.1.75,10.20.3.5 "PUT /v2.0/routers/26f86b86-5f5f-4e36-930d-877a075987b2/add_router_interface HTTP/1.1" status: 404 len: 285 time: 0.0809276
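To spot this kind of failure quickly in a large neutron-server debug log, the "Failed policy enforce" lines can be grouped by request ID. The following is a minimal sketch that assumes the log format shown above. The function name `failed_policy_rules` is hypothetical, and the sample lines are abbreviated copies of the ones in this comment (the context fields after the request ID are replaced with dashes).

```python
import re
from collections import defaultdict

# Matches e.g. "[None req-<uuid> ...] Failed policy enforce for 'get_router'".
FAILED_RULE = re.compile(
    r"\[None (req-[0-9a-f-]+) .*?\] Failed policy enforce for '([^']+)'"
)


def failed_policy_rules(lines):
    """Map each request ID to the policy rules that failed enforcement."""
    failures = defaultdict(list)
    for line in lines:
        match = FAILED_RULE.search(line)
        if match:
            request_id, rule = match.groups()
            failures[request_id].append(rule)
    return dict(failures)


# Abbreviated sample based on the log lines above.
sample = [
    "2024-07-23 13:29:01.713 26 DEBUG neutron.policy "
    "[None req-71805b6e-b6ab-4b34-94b5-dc37a2a2cd25 - - - - - -] "
    "Enforcing rules: ['get_router'] log_rule_list",
    "2024-07-23 13:29:01.714 26 DEBUG neutron.policy "
    "[None req-71805b6e-b6ab-4b34-94b5-dc37a2a2cd25 - - - - - -] "
    "Failed policy enforce for 'get_router' enforce",
    "2024-07-23 13:29:01.714 26 INFO neutron.api.v2.resource "
    "[None req-71805b6e-b6ab-4b34-94b5-dc37a2a2cd25 - - - - - -] "
    "add_router_interface failed (client error): The resource could not be found.",
]

print(failed_policy_rules(sample))
```

Note that the policy lines are logged at DEBUG level, so this only helps if debug logging is enabled on the Neutron server.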
Updating the Neutron server configuration as recommended in the release notes solved the problem.
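The comment above does not say which settings were changed. As far as I understand the linked release notes, the relevant options are `enforce_new_defaults` and `enforce_scope` in the `[oslo_policy]` section of neutron.conf, both of which default to true from 2023.2; treat that as an assumption and check the notes for your release. A minimal sketch for checking what a given neutron.conf would resolve to, where `policy_enforcement` is a hypothetical helper name:

```python
import configparser


def policy_enforcement(conf_text):
    """Return the effective [oslo_policy] enforcement flags for a neutron.conf body."""
    # strict=False tolerates repeated keys, which oslo.config files may contain.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    flags = {}
    for option in ("enforce_new_defaults", "enforce_scope"):
        # Assumed default when unset: True (new RBAC defaults as of 2023.2).
        flags[option] = parser.getboolean("oslo_policy", option, fallback=True)
    return flags


# Hypothetical config body with the new defaults switched off.
legacy_conf = """
[DEFAULT]
debug = true

[oslo_policy]
enforce_new_defaults = False
enforce_scope = False
"""

print(policy_enforcement(legacy_conf))
```

Switching the new defaults off is only one way to resolve this. Since the root cause also involved how the application credential was created, recreating the credential with roles that satisfy the new policies may be the better long-term fix.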
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
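After a period of inactivity, lifecycle/stale is applied
After further inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After further inactivity since lifecycle/rotten was applied, the issue is closed
You can:
Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close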
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/kind bug
What steps did you take and what happened:
Creating a cluster with Application Credentials that lack the admin role causes routers and networks to be created over and over until the quota is exceeded. It works fine when using the password method.
What did you expect to happen:
The cluster is created properly with Application Credentials that lack the admin role.
Anything else you would like to add:
Reconciler error err=< failed to reconcile network: Expected HTTP response code [201 202] when accessing [POST https://x.x.net:9696/v2.0/networks], but got 409 instead {"NeutronError": {"type": "OverQuota", "message": "Quota exceeded for resources: ['network'].", "detail": ""}} controller="openstackcluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="OpenStackCluster" OpenStackCluster="default/capi-quickstartdck" namespace="default" name="capi-quickstartdck" reconcileID="24ecea61-905e-40e5-8266-6cc0b4d95918"
Environment:
Cluster API Provider OpenStack version (or git rev-parse HEAD if manually built): v0.10.3
Kubernetes version (use kubectl version): 1.27.4
OS (e.g. from /etc/os-release): 22.04