syntasso / kratix-marketplace

Apache License 2.0

Keeping Dapr up to date with version 1.13.2 #6

Closed salaboy closed 3 months ago

salaboy commented 3 months ago

Updating Dapr version based on latest release.

abangser commented 3 months ago

Hey @salaboy thanks for the update!

I have run the fetch-deps script which actually utilises this change and pushed this as an additional commit to this PR.

I then attempted to build and use the new workflow image locally using the pipeline-image build load script. Note: Merging this PR will automatically build and push a new version of the image for public consumption; the local build is just for testing the PR before merge.

Unfortunately I ran into a CrashLoopBackOff on two Dapr components:

  1. dapr-operator has the following error:
    time="2024-04-09T09:48:52.649568222Z" level=fatal msg="error running operator: failed to wait for cache sync\nAPI server did not become ready in time: timeout waiting for api server to be ready\nfailed to wait for cache sync\nfailed to retrieve the initial identity certificate: error establishing connection to sentry: context canceled: connection error: desc = \"transport: Error while dialing: dial tcp: lookup dapr-sentry.default.svc.cluster.local on 10.43.0.10:53: no such host\"" instance=dapr-operator-747557fd66-h7sg5 scope=dapr.operator type=log ver=1.13.2
  2. dapr-sidecar-injector has the following error:
    time="2024-04-09T09:50:01.40640517Z" level=fatal msg="Error running injector: timed out waiting for injector to become ready\nfailed to retrieve the initial identity certificate: error establishing connection to sentry: context canceled: connection error: desc = \"transport: Error while dialing: dial tcp: lookup dapr-sentry.default.svc.cluster.local on 10.43.0.10:53: no such host\"" instance=dapr-sidecar-injector-7685fc4856-578mv scope=dapr.injector type=log ver=1.13.2

These appear to be certificate-driven, and I do have cert-manager installed, but I wonder if there is something else going on?

We have a few options here...

  1. We can merge this, as the new version is clearly being installed, and the failure may be down to the user environment, which is not specifically a concern for a public marketplace Promise.
  2. Before merging, you can take a look and see whether there is something else we need to do in the workflow for the new version's installation. Happy to pair with you on this if it helps!

salaboy commented 3 months ago

@abangser, that is interesting. Did this happen in a new cluster, or as part of an upgrade from a previous version? If it is an upgrade, you need to make sure that the CRDs are replaced.

Can you give me more context about how and where (which Kubernetes cluster) you ran the installation? Was it a KinD cluster or a real Kubernetes cluster? Did you have other things installed there (you mentioned cert-manager)? This makes me think it is not a brand-new cluster.

salaboy commented 3 months ago

@abangser oh, I think I know what is happening: sentry shouldn't be installed in the default namespace. Why is that happening? All Dapr components should be installed in the dapr-system namespace, which you usually create by installing the helm chart with the --create-namespace option.
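
For readers following along, a minimal sketch of the kind of Helm command referred to above. It assumes the chart is `dapr/dapr` from the Dapr Helm repository and uses a release name of `dapr`; the function name `dapr_install_cmd` is hypothetical. This is not necessarily how the Promise workflow installs Dapr. The function only prints the command, so nothing is changed on a cluster.

```shell
# Sketch only: prints a Helm command that installs Dapr into its own namespace.
# Assumes the Dapr chart repo has been added first, e.g.:
#   helm repo add dapr https://dapr.github.io/helm-charts/ && helm repo update
dapr_install_cmd() {
  version="${1:-1.13.2}"        # Dapr version from this PR
  namespace="${2:-dapr-system}" # namespace mentioned in the comment above
  printf 'helm upgrade --install dapr dapr/dapr --version %s --namespace %s --create-namespace --wait\n' \
    "$version" "$namespace"
}

# Print the command for review; it is not executed.
dapr_install_cmd
```

Printing rather than running keeps the sketch safe to try; copy the printed line into a terminal pointed at the intended cluster to actually install.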

abangser commented 3 months ago

Weird, @salaboy. I see why you think it was deployed to default given the failure log. But it isn't in default, it is correctly in dapr-system:

image: https://github.com/syntasso/kratix-marketplace/assets/1557346/350ee365-388f-449b-a267-68212d82f471

Why would it be trying to connect in default? 🤔
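
For context, the `default` in question comes from the hostname in the failed DNS lookup, which follows the `<service>.<namespace>.svc.cluster.local` pattern. A minimal sketch that pulls the namespace out of such a log line (the function name `lookup_namespace` is hypothetical, and the sample is a fragment of the operator log above):

```shell
# Reads log text on stdin and prints the namespace part of any
# "lookup <service>.<namespace>.svc.cluster.local" hostname it finds.
lookup_namespace() {
  sed -n 's/.*lookup [a-z0-9-]*\.\([a-z0-9-]*\)\.svc\.cluster\.local.*/\1/p'
}

# Fragment of the dapr-operator error quoted earlier in this thread.
log='transport: Error while dialing: dial tcp: lookup dapr-sentry.default.svc.cluster.local on 10.43.0.10:53: no such host'

printf '%s\n' "$log" | lookup_namespace   # prints: default
```

This only shows which namespace the component was trying to reach sentry in; it does not explain why that namespace was chosen.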

salaboy commented 3 months ago

Can you share the steps that you used, so I can try to reproduce the issue in a new kind cluster?

That is odd

abangser commented 3 months ago

Hmmmmmmm this is weird now...

So I think we should merge this as I tested with the following steps and it all worked fine:

2024-04-10 14:01  cd kratix
2024-04-10 14:02  g pull
2024-04-10 14:02  make quick-start
2024-04-10 14:12  cd ../kratix-marketplace/dapr
2024-04-10 14:13  g log -1
2024-04-10 14:13  kaf promise.yaml
2024-04-10 14:13  k get ns --context kind-worker -w
2024-04-10 14:14  kx kind-worker
2024-04-10 14:14  k get deploy -n dapr-system -w

The failure happened on an existing k3d cluster, but one that had never had Dapr installed before, so it is still a bit odd. But as I said before, I think this passes our test of running against the documented quick start. These Promises are inherently meant to be customised for organisations and their environments, so if someone runs into something in their environment they are welcome to reach out in the community Slack to work through the customisation!
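
The final check in the steps above could be scripted roughly as follows. This is a sketch under assumptions: that `k` in the history is an alias for `kubectl`, that the worker context is `kind-worker` as shown, and that waiting for all deployments in `dapr-system` to become Available is an acceptable stand-in for watching them by hand. The `run` and `verify_dapr` helpers are hypothetical. By default the script only prints the commands (dry run); set `DRY_RUN=0` to execute them.

```shell
# Dry run by default: print commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

verify_dapr() {
  ctx="${1:-kind-worker}"   # worker cluster context from the steps above
  ns="${2:-dapr-system}"    # namespace the Dapr control plane should land in
  # The namespace should exist once the Promise has been applied.
  run kubectl --context "$ctx" get namespace "$ns"
  # Block until every deployment in the namespace reports Available.
  run kubectl --context "$ctx" -n "$ns" wait deployment --all \
    --for=condition=Available --timeout=300s
}

verify_dapr
```

The 300 second timeout is an arbitrary choice; a crash-looping operator or sidecar injector like the one reported earlier would cause the wait to time out instead of succeeding.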