Open rumart opened 8 months ago
Howdy 🖐 rumart ! Thank you for your interest in this project. We value your feedback and will respond soon.
Here's a screenshot of kubectl get pods -A before re-running the setup file
Hi @rumart, the VEBA_BOM_FILE variable is already set in setup-04-kubernetes.sh for the first time - HERE.
I can see on your screenshot that the installation didn't finish successfully. For example, the vmware-sources namespace is missing. We've faced this issue before, and it should actually have been fixed with #1170.
We have to dig into it.
Yeah, so when I comment out setup-04 it doesn't pick up the BOM variable. Nevertheless, since it gets defined in setup-05, could the definition just be moved up a bit? Or should it be removed altogether?
Thanks for looking into it
I don't think that the issue is caused by not setting the VEBA_BOM_FILE variable. We have the suspicion that it is timing-related. Have you tried deploying it again? To what kind of environment are you deploying VEBA?
I agree, the VEBA_BOM_FILE issue is because I've re-run the script without running setup-04, which sets it the first time. I was thinking more of fixing that setup-05 file separately.
Anyways, I'm running it on a small home lab vSAN cluster. Have tried redeploying a few times, all stopping on the same error message.
I'll try to run it on a different env later tonight to see if that changes anything
I've tried on a single ESXi host not running anything else, storage on NVME. I've added more CPU and RAM to the appliance. Still errors out on the same step
I ssh'd to the appliance as soon as it was available and tailed the bootstrap-debug.log. The error failed calling webhook "defaulting.webhook.rabbitmq.eventing.knative.dev" happens after just a couple of minutes. As far as I understand there's a 10 minute timeout on most of the commands?
IIRC, the 10 minutes are the default for the kubectl wait command if you don't specify --timeout separately. I really wonder about this issue. I deployed it in my homelab (2-node vSAN cluster) as well and it worked like a charm. Anyway, like I said, William had this issue before as well, but reordering the command executions did the trick. When I have time, I'll try to add another wait condition to the script(s) (if necessary!). Thanks @rumart
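For reference, a minimal sketch of passing the timeout explicitly instead of relying on a default. kubectl's own help text lists 30s as the default for --timeout, so a 10-minute wait would have to come from a value the scripts pass themselves. The helper name wait_for_deployment and its 600s fallback are assumptions for illustration, not taken from the VEBA scripts.

```shell
# Hypothetical helper: wait until a Deployment reports the Available condition.
# Arguments: <deployment> <namespace> [timeout, falls back to an assumed 600s]
wait_for_deployment() {
  deploy="$1"
  ns="$2"
  timeout="${3:-600s}"
  # The timeout is always passed explicitly, so kubectl's default never applies.
  kubectl wait --for=condition=available "deploy/${deploy}" \
    --timeout="${timeout}" -n "${ns}"
}
```

It could be called as wait_for_deployment rabbitmq-broker-webhook knative-eventing 10m.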
I suspect that the current "wait" conditions are actually passing, unless you log in and it looks to be waiting for the default 10m as mentioned by Robert. If it truly is a timing issue, we can always enhance the OVF properties to make the timeout customizable, but I'm not sure that's actually the case, and we may need some other wait condition. If we can debug this further, Robert, then we can spin up a custom build to verify for @rumart
Just as @rumart, first error I got:
Error from server (InternalError): error when creating "/root/config/knative/rabbit.yaml": Internal error occurred: failed calling webhook "defaulting.webhook.rabbitmq.eventing.knative.dev": failed to call webhook: Post "https://rabbitmq-broker-webhook.knative-eventing.svc:443/defaulting?timeout=2s": dial tcp 10.109.26.244:443: connect: connection refused
Second try, I increased the timeout value and kept going.
Third try stumbled upon the following:
/root/setup/setup-05-knative.sh: line 44: VEBA_BOM_FILE: unbound variable
Which I had to work around to keep the installation going.
@rumart I owe you a deep apology for not getting back to you earlier. Would you be open to troubleshooting your issue further? I've just added another wait condition to the setup-05-knative.sh script and have built a new appliance (test) version. I'd love to follow the deployment in your test environment. Maybe we could run a Zoom session?
What really helps to get started is the following approach:
tmux
ssh to it - on both windows!
tail -f /var/log/bootstrap-debug.log on the one window and watch kubectl get pods -A on the other window
From there you can perfectly follow the progress.
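The two-window approach above could be scripted roughly like this. It is only a sketch: the helper name veba_watch, the session name veba, and the root login are assumptions, while the two remote commands are the ones named in the comment.

```shell
# Hypothetical helper: open a tmux session with two side-by-side panes,
# each running one of the two commands over ssh.
# Arguments: <appliance host>
veba_watch() {
  host="$1"
  # Left pane: follow the bootstrap log (-t allocates a terminal for the remote command).
  tmux new-session -d -s veba "ssh -t root@${host} 'tail -f /var/log/bootstrap-debug.log'"
  # Right pane: refresh the pod list; watch needs a terminal, hence -t.
  tmux split-window -h -t veba "ssh -t root@${host} 'watch kubectl get pods -A'"
  # Attach so both panes are visible at once.
  tmux attach-session -t veba
}
```

Running veba_watch with the appliance's hostname or IP as soon as it answers on ssh shows the log and the pod list next to each other.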
The new build can be downloaded for testing purposes here: DOWNLOAD
Thanks @rguske. I've been busy with other things so haven't had the time myself. I'm very interested in troubleshooting further and get this up and running.
Sure, just let me know when you have the time and ping me on Discord or Slack (CNCF Workspace). Looking forward to finding the root cause.
It seems I cannot download the test version.
I've authorized you now 👍🏻
Just to add in, yesterday, we were on vCenter 7.0.3 and I was able to deploy. Today, after an update to vCenter 8.0.2, I get the same error as @rumart.
rabbitmqcluster.rabbitmq.com/veba-rabbit created
Error from server (InternalError): error when creating "/root/config/knative/rabbit.yaml": Internal error occurred: failed calling webhook "defaulting.webhook.rabbitmq.eventing.knative.dev": failed to call webhook: Post "https://rabbitmq-broker-webhook.knative-eventing.svc:443/defaulting?timeout=2s": dial tcp 10.98.98.40:443: connect: connection refused
Thanks a lot for your input @benwa. I don't think this issue is related to the vSphere version, since the first "real" interaction with the vCenter Server is at line 22 in script 06, when the VSphereSource gets created. It really seems to be a timing issue. I'm still trying to find out which component needs a dedicated wait condition.
Welp, I redownloaded the OVA from the Flings site and ran a checksum. It was different. Redeployed and I'm all good now.
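A corrupted download like this can be caught before deploying by comparing digests. A minimal sketch, assuming GNU sha256sum is available and that a known-good SHA-256 digest for the OVA is at hand; the helper name verify_sha256 is hypothetical.

```shell
# Hypothetical helper: compare a file's SHA-256 digest with an expected value.
# Arguments: <file> <expected hex digest>; returns 0 on match, 1 otherwise.
verify_sha256() {
  file="$1"
  expected="$2"
  # sha256sum prints "<digest>  <file>"; keep only the digest.
  actual=$(sha256sum "$file" | cut -d' ' -f1)
  if [ "$actual" = "$expected" ]; then
    echo "OK: ${file}"
  else
    echo "MISMATCH: ${file} (got ${actual})" >&2
    return 1
  fi
}
```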
Eh… I still can't deploy it, even with the new test version provided by @rguske.
On 26 Jun 2024, at 18:29, William Lam wrote: Closed #1176 as completed.
Issue still exists.
@rumart I've now added a sleep 30 to setup-05-knative.sh. I haven't found the problematic part yet. Could you give this version a try? DOWNLOAD.
Thy
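As an alternative to a fixed sleep 30, one option is to poll until the webhook Service actually has a ready endpoint. This is a sketch only: the helper name wait_for_endpoints and its defaults are assumptions, and a populated endpoint list is a heuristic rather than proof that the webhook already answers requests.

```shell
# Hypothetical helper: poll until a Service has at least one ready endpoint address.
# Arguments: <service> <namespace> [max attempts, default 30] [delay seconds, default 2]
wait_for_endpoints() {
  svc="$1"
  ns="$2"
  attempts="${3:-30}"
  delay="${4:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Ready pod IPs are listed under .subsets[*].addresses of the Endpoints object.
    ips=$(kubectl get endpoints "$svc" -n "$ns" \
      -o jsonpath='{.subsets[*].addresses[*].ip}' 2>/dev/null)
    if [ -n "$ips" ]; then
      echo "endpoints ready: ${ips}"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timed out waiting for endpoints of ${svc}" >&2
  return 1
}
```

For the failing step it could be called as wait_for_endpoints rabbitmq-broker-webhook knative-eventing before the RabbitMQ cluster is applied.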
Now I'm able to deploy successfully. Tested several times without issues
Interesting! Thanks a lot for verifying, Rudi. However, I will try to narrow it down. There must be a different way. We'd really appreciate it if you'd be open to testing further builds. Thy :)
First-time VEBA user eager to get this working, but I'm also experiencing this issue
VEBA 0.8.0
vCenter 8.0.3
/var/log/bootstrap-debug.log
Error from server (InternalError): error when creating "/root/config/knative/rabbit.yaml": Internal error occurred: failed calling webhook "defaulting.webhook.rabbitmq.eventing.knative.dev": failed to call webhook: Post "https://rabbitmq-broker-webhook.knative-eventing.svc:443/defaulting?timeout=2s": dial tcp 10.105.248.31:443: connect: connection refused
Thanks for reporting it. Could you please try the version provided in this comment HERE? Thy
That link doesn't work anymore. Google Drive says:
Sorry, the file you have requested does not exist.
Make sure that you have the correct URL and the file exists.
I will provide a new link in a bit. I was on vacation and back on the issue now. The issue looks similar to what is described here: https://cert-manager.io/docs/troubleshooting/webhook/
So, it looks to me like the Kubernetes API server is trying to call the rabbitmq-broker-webhook when we are installing the RabbitMQ cluster via kubectl apply -f ${RABBITMQ_CONFIG}.
Even though the following is included in our script, which should ensure that everything is in READY state:
kubectl wait --for=condition=available deploy/rabbitmq-broker-webhook --timeout=${KUBECTL_WAIT} -n knative-eventing
@royiversen78 use this LINK temporarily.
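Since the deployment can report Available while the webhook still refuses connections, another option is to retry the apply itself. The following is a sketch of a generic retry wrapper; the name retry and its arguments are hypothetical and not part of the VEBA scripts.

```shell
# Hypothetical helper: run a command until it succeeds or the attempts run out.
# Arguments: <max attempts> <delay seconds> <command> [args...]
retry() {
  max="$1"
  delay="$2"
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$max" ]; then
      echo "giving up after ${n} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${n} failed, retrying in ${delay}s" >&2
    n=$((n + 1))
    sleep "$delay"
  done
}
```

For example, retry 10 5 kubectl apply -f "${RABBITMQ_CONFIG}" would re-run the apply every 5 seconds for up to 10 attempts. This relies on kubectl apply being safe to repeat for objects that were already created.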
I'm getting the same issue with this version
Error from server (InternalError): error when creating "/root/config/knative/rabbit.yaml": Internal error occurred: failed calling webhook "defaulting.webhook.rabbitmq.eventing.knative.dev": failed to call webhook: Post "https://rabbitmq-broker-webhook.knative-eventing.svc:443/defaulting?timeout=2s": dial tcp 10.108.11.231:443: connect: connection refused
@rumart @royiversen78 we added a pause to the installation to make sure the dependent services are available before continuing. The changes just got merged: https://github.com/vmware-samples/vcenter-event-broker-appliance/pull/1268
If you'd like to test its functionality, please DM me (preferred on CNCF Slack) and I will provide you a download link to the OVA. Thanks
Describe the bug
The VEBA deployment doesn't finish and throws an error when deploying the RabbitMQ cluster
To Reproduce
Steps to reproduce the behavior:
I've deployed the OVA as described in the docs
Waited for around 20 minutes, but none of the web endpoints work (Connection refused)
Expected behavior
The deployment to finish and the endpoints to work
Screenshots
Screenshot of bootstrap-debug.log
Version (please complete the following information):
Additional context When troubleshooting I saw that the deployment stopped in what seems to be setup-05-knative.sh script.
I commented out scripts 1 through 4 in setup.sh and reran setup.sh
After a short while the script stopped with this message:
Checked the setup-05-knative.sh script and found that the VEBA_BOM_FILE variable is defined after it is used in the file
The ytt command on line 44 uses $VEBA_BOM_FILE, but the variable is first defined on line 51.
I moved that line above line 44 and reran setup.sh
Now the deployment could finish and I can access the web endpoints
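Independent of where the variable gets defined, a guard at the top of a script turns the "unbound variable" abort into an explicit message. A minimal sketch; the helper name require_var is hypothetical and not part of the VEBA scripts.

```shell
# Hypothetical helper: fail early with a clear message if a required variable
# is unset or empty, instead of aborting mid-script under `set -u`.
# Arguments: <variable name>
require_var() {
  name="$1"
  # eval gives a POSIX-compatible indirect expansion of the named variable;
  # the ":-" form keeps the expansion safe even when `set -u` is active.
  eval "value=\${${name}:-}"
  if [ -z "$value" ]; then
    echo "required variable ${name} is not set" >&2
    return 1
  fi
}
```

Placed before the first use, require_var VEBA_BOM_FILE || exit 1 would stop a re-run that skipped setup-04 with a readable error.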