mauhau opened this issue 1 year ago
Hi @mauhau,
thank you for sharing your experience and pointing out the weak points of the documentation.
I will add some more docs there to show which settings depend on the OpenStack installation.
You are right: with the default Helm values, yawol needs a VerticalPodAutoscaler. You can disable VPA in the Helm values with vpa.enabled: false.
I think we should disable it by default or somehow check whether VPA is available in the k8s cluster (what do you think, @breuerfelix @einfachnuralex?).
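A minimal sketch of the "check whether VPA is available" idea, assuming the check runs outside the chart before installing it: it asks kubectl api-resources whether the autoscaling.k8s.io API group serves verticalpodautoscalers and derives the vpa.enabled value from the answer. The helper names are hypothetical and not part of yawol. Inside the chart itself, Helm's .Capabilities.APIVersions.Has check could serve the same purpose without an external step.

```python
import subprocess
from typing import List

# Fully qualified resource name as printed by "kubectl api-resources -o name".
VPA_RESOURCE = "verticalpodautoscalers.autoscaling.k8s.io"


def list_autoscaling_resources() -> str:
    """Query the cluster (requires kubectl and a valid kubeconfig)."""
    result = subprocess.run(
        ["kubectl", "api-resources", "--api-group=autoscaling.k8s.io", "-o", "name"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def vpa_available(api_resources_output: str) -> bool:
    """True if the VerticalPodAutoscaler resource appears in the kubectl output."""
    return any(line.strip() == VPA_RESOURCE for line in api_resources_output.splitlines())


def helm_vpa_args(api_resources_output: str) -> List[str]:
    """Extra helm arguments: keep VPA enabled only when the cluster serves it."""
    value = "true" if vpa_available(api_resources_output) else "false"
    return ["--set", f"vpa.enabled={value}"]
```

For example, the list returned by helm_vpa_args(list_autoscaling_resources()) could be appended to the helm install command line.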
You are right, tenant-name is not used in OpenStack. I looked it up, and currently it has to be set to the ProjectName (so for now, please set tenant-name to the name of your OpenStack project). Also, the project-name shown in the docs is not used. I will "fix" that and also add more info about it to the docs.
@breuerfelix, should we also build latest or pin a fixed image in the Helm values? I think pinned images make more sense, because latest would result in an update on the cluster without the CRDs getting updated.
The example service is used for the development environment (docs). The yawol.stackit.cloud/className setting is needed to be able to run multiple yawol-cloud-controllers. By default, every k8s service of type LoadBalancer without that setting is handled by yawol.
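The selection rule described here can be sketched as a small predicate. This is an assumption about the behavior, not yawol's actual code: it treats yawol.stackit.cloud/className as a Service annotation (as in the example service) and assumes that a controller started without a class name only picks up LoadBalancer Services without that annotation, while a controller with a class name picks up only matching Services. is_handled is a hypothetical helper.

```python
from typing import Mapping, Optional

CLASS_NAME_ANNOTATION = "yawol.stackit.cloud/className"


def is_handled(service_type: str, annotations: Mapping[str, str],
               controller_class: Optional[str] = None) -> bool:
    """Hypothetical selection rule for one yawol-cloud-controller instance."""
    if service_type != "LoadBalancer":
        return False  # only Services of type LoadBalancer are considered
    # A missing annotation yields None, which matches a controller without a class name.
    return annotations.get(CLASS_NAME_ANNOTATION) == controller_class
```

Under this rule, the example service with className "test" is ignored by a default controller, which matches the report that it only got handled after that line was commented out.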
Your last error looks like an error with the login to OpenStack. I think the domain-name is missing in your secret.
I hope this helps you to get it running 😃
Great, thank you.
For the documentation of the secret I would propose something like this:
apiVersion: v1
kind: Secret
metadata:
  name: cloud-provider-config
type: Opaque
stringData:
  cloudprovider.conf: |-
    [Global]
    auth-url="<OS_AUTH_URL>"
    domain-name="<OS_USER_DOMAIN_NAME>"
    tenant-name="<OS_PROJECT_NAME>"
    project-name="<OS_PROJECT_NAME>"
    username="<OS_USERNAME>"
    password="<OS_PASSWORD>"
    region="<OS_REGION_NAME>"
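Since the placeholders map one-to-one to the variables of an OpenStack openrc file, the [Global] section can be generated from the environment. A minimal sketch, assuming the OS_* variables are exported; following the workaround described above, tenant-name is filled with the project name. render_cloudprovider_conf is a hypothetical helper, and values containing double quotes are not escaped.

```python
import os
from typing import Mapping

# cloudprovider.conf key -> openrc variable, mirroring the placeholders in the Secret.
KEY_TO_ENV = {
    "auth-url": "OS_AUTH_URL",
    "domain-name": "OS_USER_DOMAIN_NAME",
    "tenant-name": "OS_PROJECT_NAME",  # workaround: tenant-name carries the project name
    "project-name": "OS_PROJECT_NAME",
    "username": "OS_USERNAME",
    "password": "OS_PASSWORD",
    "region": "OS_REGION_NAME",
}


def render_cloudprovider_conf(env: Mapping[str, str] = os.environ) -> str:
    """Build the [Global] section; fails early if a variable is unset or empty."""
    missing = sorted({var for var in KEY_TO_ENV.values() if not env.get(var)})
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    lines = ["[Global]"]
    lines += [f'{key}="{env[var]}"' for key, var in KEY_TO_ENV.items()]
    return "\n".join(lines)
```

The result could be written to a file and turned into the Secret with kubectl create secret generic cloud-provider-config -n kube-system --from-file=cloudprovider.conf=<file>.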
After setting the secret correctly, yawol creates the SecGroup, FloatingIP and Port, but no Server:
1.6744837509860816e+09 INFO controller.LoadBalancer Reconcile Openstack {"lb": "yawol-test--lb-test3"}
1.6744837509861178e+09 INFO controller.LoadBalancer Reconcile SecGroup {"lb": "yawol-test--lb-test3"}
1.6744837514294589e+09 INFO controller.LoadBalancer Create SecGroup {"lb": "yawol-test--lb-test3"}
1.6744837517297585e+09 INFO controller.LoadBalancer Reconcile SecGroupRules {"lb": "yawol-test--lb-test3"}
1.6744837517299604e+09 DEBUG events Warning {"object": {"kind":"LoadBalancer","namespace":"kube-system","name":"yawol-test--lb-test3","uid":"d3f44e80-0747-4eef-98dd-d0f66e23fcd5","apiVersion":"yawol.stackit.cloud/v1beta1","resourceVersion":"1016080"}, "reason": "Warning", "message": "DebugSettings are enabled, Port 22 is open to all IP ranges."}
1.6744837533923779e+09 INFO controller.LoadBalancer Reconcile FloatingIP {"lb": "yawol-test--lb-test3"}
1.6744837534356053e+09 INFO controller.LoadBalancer Create FloatingIP {"lb": "yawol-test--lb-test3"}
1.6744837538124757e+09 INFO controller.LoadBalancer Update ExternalIP {"lb": "yawol-test--lb-test3"}
1.6744837538338878e+09 INFO controller.LoadBalancer Reconcile Port {"lb": "yawol-test--lb-test3"}
1.6744837539900713e+09 INFO controller.LoadBalancer Create Port {"lb": "yawol-test--lb-test3"}
1.6744837544405751e+09 INFO controller.LoadBalancer successfully created port {"id": "5a673b2a-6225-453e-bd73-c798f1f36d88", "lb": "kube-system/yawol-test--lb-test3"}
1.6744837549955077e+09 INFO controller.LoadBalancer Reconcile FIPAssociate {"lb": "yawol-test--lb-test3"}
1.674483755012449e+09 INFO controller.LoadBalancer Bind FloatingIP to Port {"lb": "yawol-test--lb-test3"}
1.6744837557283204e+09 INFO controller.LoadBalancer Reconcile Openstack {"lb": "yawol-test--lb-test3"}
1.6744837557283442e+09 INFO controller.LoadBalancer Reconcile SecGroup {"lb": "yawol-test--lb-test3"}
1.6744837559698987e+09 INFO controller.LoadBalancer Reconcile SecGroupRules {"lb": "yawol-test--lb-test3"}
1.6744837559700172e+09 DEBUG events Warning {"object": {"kind":"LoadBalancer","namespace":"kube-system","name":"yawol-test--lb-test3","uid":"d3f44e80-0747-4eef-98dd-d0f66e23fcd5","apiVersion":"yawol.stackit.cloud/v1beta1","resourceVersion":"1016099"}, "reason": "Warning", "message": "DebugSettings are enabled, Port 22 is open to all IP ranges."}
1.6744837564755807e+09 INFO controller.LoadBalancer Reconcile FloatingIP {"lb": "yawol-test--lb-test3"}
1.6744837564999213e+09 INFO controller.LoadBalancer Reconcile Port {"lb": "yawol-test--lb-test3"}
1.674483756517218e+09 INFO controller.LoadBalancer Reconcile FIPAssociate {"lb": "yawol-test--lb-test3"}
1.6744839279968724e+09 INFO controller.LoadBalancer LoadBalancer not found {"lb": "kube-system/yawol-test--lb-test"}
1.6744840405738192e+09 INFO controller.LoadBalancer LoadBalancer not found {"lb": "kube-system/yawol-test--lb-test2"}
Any hints how to debug/fix this?
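When reading logs like these, it helps to turn the epoch timestamps into wall-clock time and to pull out the step each reconcile reached. A rough sketch, assuming the whitespace-separated layout shown above (timestamp, level, logger, message, JSON context); parse_line is a hypothetical helper, and lines in any other layout (such as stack-trace lines) simply return None.

```python
import json
import re
from datetime import datetime, timezone
from typing import Optional

# timestamp (epoch seconds in scientific notation), level, logger, message, optional JSON context
LOG_LINE = re.compile(
    r"^(?P<ts>\d+\.\d+e[+-]\d+)\s+(?P<level>[A-Z]+)\s+(?P<logger>\S+)\s+"
    r"(?P<msg>.*?)\s*(?P<ctx>\{.*\})?$"
)


def parse_line(line: str) -> Optional[dict]:
    """Split one controller log line; returns None if it does not match the layout."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    when = datetime.fromtimestamp(float(match["ts"]), tz=timezone.utc)
    return {
        "time": when.isoformat(timespec="seconds"),
        "level": match["level"],
        "logger": match["logger"],
        "msg": match["msg"],
        "ctx": json.loads(match["ctx"]) if match["ctx"] else {},
    }
```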
In the log of yawol-controller-loadbalancermachine I can find this error:
1.6744890245824962e+09 INFO controller.LoadBalancerMachine Check SA {"loadBalancerMachineName": "default--lb-test-zizxkwex25rbtquw-cdbe6"}
1.674489024582533e+09 INFO controller.LoadBalancerMachine Check role {"loadBalancerMachineName": "default--lb-test-zizxkwex25rbtquw-cdbe6"}
1.6744890245825515e+09 INFO controller.LoadBalancerMachine Check rolebinding {"loadBalancerMachineName": "default--lb-test-zizxkwex25rbtquw-cdbe6"}
1.6744890250379155e+09 ERROR Reconciler error {"controller": "loadbalancermachine", "controllerGroup": "yawol.stackit.cloud", "controllerKind": "LoadBalancerMachine", "loadBalancerMachine": {"name":"default--lb-test-zizxkwex25rbtquw-cdbe6","namespace":"kube-system"}, "namespace": "kube-system", "name": "default--lb-test-zizxkwex25rbtquw-cdbe6", "reconcileID": "5d85e00a-3f04-4ee5-ba88-504a1b268c0c", "error": "secret not found for serviceAccount default--lb-test-zizxkwex25rbtquw-cdbe6"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:234
Yes, I know the service name changed :smile:
I think we found the problem.
We are using Kubernetes 1.25.5. After manually creating the secret for the service account reconciled by yawol-controller, following this documentation, and adding the secret to the service account, we finally got a working loadbalancer machine :space_invader:
kubectl -n kube-system apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: default--lb-test-secret
  annotations:
    kubernetes.io/service-account.name: default--loadbalancer-rfeqiyrg7dcknhf3-1b890
type: kubernetes.io/service-account-token
EOF
root@garden-base-k8s-master-nf-1:~# kubectl -n kube-system get serviceaccounts default--loadbalancer-rfeqiyrg7dcknhf3-1b890 -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  creationTimestamp: "2023-01-23T16:50:12Z"
  name: default--loadbalancer-rfeqiyrg7dcknhf3-1b890
  namespace: kube-system
  resourceVersion: "1041342"
  uid: 64f00b42-32e4-42ae-885b-380a8edab571
secrets:
- name: default--lb-test-secret
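The same workaround can be scripted for whichever service account yawol-controller creates. A minimal sketch that builds the Secret shown above as JSON (kubectl apply -f - accepts JSON as well as YAML); the default "<service-account>-token" naming is a hypothetical choice, and the service account's secrets list still has to reference the Secret, as in the kubectl output above.

```python
import json
from typing import Optional


def token_secret_manifest(service_account: str, namespace: str = "kube-system",
                          secret_name: Optional[str] = None) -> dict:
    """Secret of type service-account-token; Kubernetes fills in the token for the annotated account."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {
            # Hypothetical naming scheme; any valid Secret name works.
            "name": secret_name or f"{service_account}-token",
            "namespace": namespace,
            "annotations": {"kubernetes.io/service-account.name": service_account},
        },
        "type": "kubernetes.io/service-account-token",
    }


def to_json(manifest: dict) -> str:
    """Serialize for piping into: kubectl apply -f -"""
    return json.dumps(manifest, indent=2)
```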
So maybe you should mention which Kubernetes versions are currently supported by yawol.
Yes, that is correct: currently it only works with 1.23 or lower. We will add support for newer versions soon :)
I created some issues to track these topics: #98 #99 #100 #101
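This cutoff lines up with the Kubernetes 1.24 change (LegacyServiceAccountTokenNoAutoGeneration enabled by default), after which token Secrets are no longer created automatically for service accounts. A small sketch for deciding whether the manual workaround is needed, given the gitVersion string a cluster reports (for example via kubectl version); needs_manual_token_secret is a hypothetical helper.

```python
import re

# Kubernetes 1.24 stopped auto-generating service account token Secrets.
FIRST_VERSION_WITHOUT_AUTO_TOKENS = (1, 24)


def needs_manual_token_secret(git_version: str) -> bool:
    """Parse versions like 'v1.25.5' or '1.23.16-gke.100' and compare major/minor."""
    match = re.match(r"v?(\d+)\.(\d+)", git_version.strip())
    if match is None:
        raise ValueError(f"unrecognised Kubernetes version: {git_version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))
    return major_minor >= FIRST_VERSION_WITHOUT_AUTO_TOKENS
```

Comparing integer tuples rather than strings or floats keeps 1.9 correctly below 1.24.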
Hi @mauhau,
thanks again for your feedback. We solved all your points in version v0.12.0:
project-id is now also supported in the secret.
We would be happy if you could look over the changes and give us feedback again.
Hey @dergeberl,
thank you for the update! I will check that soon and will update the issue. Sounds great!
Another question, not related to this issue: if we want to use the yawol load balancer for Gardener Shoot clusters, is there any documentation on how to do that?
If you use Gardener too, then you would most likely write an extension for yawol. At least that is what we did; unfortunately, that extension is not public yet.
I just tested the updates. It works almost perfectly now.
The only problem that remained is the Secret cloud-provider-config. The namespace kube-system is missing, and it is not allowed to set both domain-name and domain-id:
ERROR Reconciler error {"controller": "loadbalancer", "controllerGroup": "yawol.stackit.cloud", "controllerKind": "LoadBalancer", "loadBalancer": {"name":"default--nginx","namespace":"kube-system"}, "namespace": "kube-system", "name": "default--nginx", "reconcileID": "d78086e0-e213-483d-92d1-cb2d11c6c936", "error": "You must provide exactly one of DomainID or DomainName to authenticate by Username"}
Hence I would suggest documenting it like this:
apiVersion: v1
kind: Secret
metadata:
  name: cloud-provider-config
  namespace: kube-system
type: Opaque
stringData:
  cloudprovider.conf: |-
    [Global]
    auth-url="<OS_AUTH_URL>"
    domain-name="<OS_USER_DOMAIN_NAME>"
    # specify either domain-name or domain-id (mutually exclusive)
    # domain-id="<OS_DOMAIN_ID>"
    # Deprecated (tenant-name): Please use project-name, only used if project-name is not set.
    # tenant-name="<OS_PROJECT_NAME>"
    project-name="<OS_PROJECT_NAME>"
    # specify either project-name or project-id (mutually exclusive)
    # project-id="<OS_PROJECT_ID>"
    username="<OS_USERNAME>"
    password="<OS_PASSWORD>"
    region="<OS_REGION_NAME>"
Everything else is working. Great job :+1:
You could improve the build documentation with a hint that earthly +build-packer-environment... will show the required variables for building the yawol OpenStack image.
And installing/using yawol could be easier if the Helm charts were available as a tgz artifact in the release, as well as the ready-built OpenStack image. But you probably know that already.
And I just found another one.
The CRD does not accept the values for serverGroupPolicy as documented:
2023-02-03T16:35:33.600761470Z 1.6754421336006477e+09 ERROR Reconciler error {"controller": "loadbalancer", "controllerGroup": "yawol.stackit.cloud", "controllerKind": "LoadBalancer", "loadBalancer": {"name":"default--nginx","namespace":"kube-system"}, "namespace": "kube-system", "name": "default--nginx", "reconcileID": "f5e3dac9-c35b-4137-b979-e182837ca9f0", "error": "Bad request with: [POST https://***:32443/v2.1/os-server-groups], error message: {\"badRequest\": {\"code\": 400, \"message\": \"Invalid input for field/attribute 0. Value: soft-anti-affinity. 'soft-anti-affinity' is not one of ['anti-affinity', 'affinity']\"}}"
Regarding domain-name and domain-id: this is already documented, just above the example :)
Note: At most one of domain-id or domain-name and project-id or project-name must be provided.
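The rule in this note is easy to check before creating the Secret. A minimal sketch, assuming username/password authentication as in the examples (where the error above demands exactly one domain field) and the INI-style [Global] section with its indentation already removed (as it is once YAML has parsed the stringData block); parse_global and validate_global are hypothetical helpers, not part of yawol.

```python
import configparser
from typing import List, Mapping


def parse_global(conf_text: str) -> dict:
    """Read the [Global] section; interpolation is off so '%' in passwords is kept."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # Values are written as key="value"; drop the surrounding quotes.
    return {key: value.strip('"') for key, value in parser["Global"].items()}


def validate_global(cfg: Mapping[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the combination is allowed."""
    problems = []
    domains = [k for k in ("domain-id", "domain-name") if cfg.get(k)]
    if len(domains) != 1:
        problems.append("set exactly one of domain-id or domain-name")
    projects = [k for k in ("project-id", "project-name") if cfg.get(k)]
    if len(projects) > 1:
        problems.append("set at most one of project-id or project-name")
    return problems
```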
The serverGroupPolicy error is coming from your OpenStack API. It seems like your environment doesn't support it.
Ah. Now I see the Note above the example :smile:
The other point is weird. My OpenStack is fine with soft-anti-affinity:
(openstack) $ openstack server group list
+--------------------------------------+-------------------+--------------------+
| ID | Name | Policy |
+--------------------------------------+-------------------+--------------------+
| fa8aa365-0070-4e1e-a9dd-501933653b63 | k8s-node-srvgrp | soft-anti-affinity |
| d9177be6-6718-4516-8b7c-a65f0d5e03e8 | k8s-master-srvgrp | soft-anti-affinity |
+--------------------------------------+-------------------+--------------------+
But I also see in the log that it was returned by the OpenStack API.
If I had to guess, I would say that you are hitting an old API. Sometimes there are multiple API versions for different OpenStack identity providers. But I'm not an OpenStack expert 🤷, sorry.
Good point. The Nova API of my OpenStack Yoga returns version 2.90, and according to the OpenStack API documentation this should be fine. Still weird :confused:
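One possible explanation, an assumption not confirmed in this thread: Nova treats a request without a microversion header as version 2.1, where a server group policy must be affinity or anti-affinity. The soft-* policies need compute microversion 2.15 or newer, so a client that does not send the header would get exactly this 400 even though the server supports 2.90. The sketch below builds the POST /os-server-groups request with an explicit OpenStack-API-Version header; server_group_request is a hypothetical helper, and the switch from a policies list to a single policy field at 2.64 follows the Nova API reference.

```python
from typing import Dict, Tuple

SOFT_POLICIES = {"soft-affinity", "soft-anti-affinity"}
SOFT_POLICY_MIN = (2, 15)  # soft-* policies accepted from this compute microversion on
SINGLE_POLICY_FIELD = (2, 64)  # "policies" list replaced by a "policy" string


def parse_microversion(value: str) -> Tuple[int, int]:
    """'2.15' -> (2, 15); as floats, 2.9 would wrongly sort above 2.15."""
    major, minor = value.split(".")
    return int(major), int(minor)


def server_group_request(name: str, policy: str,
                         microversion: str = "2.15") -> Tuple[Dict[str, str], dict]:
    """Headers and JSON body for POST /os-server-groups at the given microversion."""
    version = parse_microversion(microversion)
    if policy in SOFT_POLICIES and version < SOFT_POLICY_MIN:
        raise ValueError(f"{policy} requires compute microversion >= 2.15")
    headers = {"OpenStack-API-Version": f"compute {microversion}"}
    if version >= SINGLE_POLICY_FIELD:
        body = {"server_group": {"name": name, "policy": policy}}
    else:
        body = {"server_group": {"name": name, "policies": [policy]}}
    return headers, body
```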
I tried to follow the installation as described in the README and failed at several points.
Image Build
First, I failed at the terraform apply for the build infrastructure and had to adapt the name of the floating IP network:
And set a DNS server for the created network:
The next failure was at the earthly build. To make that work, I had to adapt the flavor type and volume type:
After that, I got the yawol-alpine-v0.11.0-1 image.
Cluster Installation
For the cluster installation, it would be nice to have a link on how to install the VerticalPodAutoscaler (which is not that hard to find, but would make life easier :smile_cat:).
I had to create the Secret in namespace kube-system, otherwise the controller did not find it.
In the Secret there is a tenant-name, which is no longer used in OpenStack, so it's not clear what needs to be provided here.
The images referenced in the Helm chart do not exist; I adapted the values.yaml to make that work:
The provided example service does not work out of the box because of the set yawol.stackit.cloud/className: "test". After commenting this out, the service gets handled by the yawol controller.
Finally, I had to give up with this error:
Any hints are welcome as I would like to see it running :space_invader: