kcp-dev / kcp

Kubernetes-like control planes for form-factors and use-cases beyond Kubernetes and container workloads.
https://kcp.io
Apache License 2.0

Redirect issues with Workspaces and standalone virtual-workspace server #1806

Closed · ncdc closed this 4 months ago

ncdc commented 2 years ago

When the partial metadata informer tries to request /clusters/*/apis/tenancy.kcp.dev/v1beta1/workspaces from the kcp process, the request gets redirected to the front proxy, and we see this error:

W0819 11:41:35.676289       1 reflector.go:324] k8s.io/client-go@v0.0.0-20220803191238-b6a732dbd013/tools/cache/reflector.go:167: failed to list *unstructured.Unstructured: Get "https://<external url>/services/workspaces/%2A/apis/tenancy.kcp.dev/v1beta1/workspaces?limit=500&resourceVersion=0": x509: certificate is valid for [REDACTED], not apiserver-loopback-client

We need to figure out how to handle situations where the loopback client might get redirected to another URL where it can't validate the certificate.

Originally posted by @ncdc in https://github.com/kcp-dev/kcp/issues/1654#issuecomment-1220579059
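To make the failure mode concrete: the apiserver's loopback client config pins the TLS server name apiserver-loopback-client and trusts only the in-memory loopback serving cert, so any redirect to a server presenting a different cert fails hostname verification. The sketch below reproduces just the x509 hostname check with a self-signed throwaway cert (the "localhost" SAN mimics the cert in the logs); it is an illustration, not kcp code.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newLocalhostCert self-signs a throwaway certificate whose only SAN is
// "localhost", mimicking the serving cert the redirected request ends up
// being validated against.
func newLocalhostCert() *x509.Certificate {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "kcp"},
		DNSNames:     []string{"localhost"},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		panic(err)
	}
	cert, err := x509.ParseCertificate(der)
	if err != nil {
		panic(err)
	}
	return cert
}

func main() {
	cert := newLocalhostCert()
	// Matches the cert's SAN: verification succeeds.
	fmt.Println(cert.VerifyHostname("localhost")) // <nil>
	// The pinned loopback server name does not: this is the error in the logs.
	fmt.Println(cert.VerifyHostname("apiserver-loopback-client"))
	// x509: certificate is valid for localhost, not apiserver-loopback-client
}
```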

ncdc commented 2 years ago

FYI @sttts @p0lyn0mial @stevekuznetsov @csams from our discussion today

p0lyn0mial commented 2 years ago

Unfortunately, this issue also hits any controller that needs to talk to an external vw server, like the apibinding_deletion_controller (controllers use c.identityConfig = rest.CopyConfig(c.GenericConfig.LoopbackClientConfig)):

I0824 13:15:01.340989   13424 apibinding_deletion_controller.go:328] "patching APIBinding" reconciler="kcp-apibindingdeletion" key="root:e2e-org-hchc2|tenancy.kcp.dev" apibinding.workspace="root:e2e-org-hchc2" apibinding.namespace="" apibinding.name="tenancy.kcp.dev" apibinding.apiVersion="" patch="{\"metadata\":{\"resourceVersion\":\"1573\",\"uid\":\"b6ee73d3-806c-4a65-818c-90db80ba406a\"},\"status\":{\"conditions\":[{\"lastTransitionTime\":\"2022-08-24T11:14:54Z\",\"status\":\"True\",\"type\":\"Ready\"},{\"lastTransitionTime\":\"2022-08-24T11:14:54Z\",\"status\":\"True\",\"type\":\"APIExportValid\"},{\"lastTransitionTime\":\"2022-08-24T11:15:01Z\",\"message\":\"Get \\\"https://127.0.0.1:6444/services/workspaces/root:e2e-org-hchc2/apis/tenancy.kcp.dev/v1beta1/workspaces\\\": x509: certificate is valid for localhost, not apiserver-loopback-client\",\"reason\":\"ResourceDeletionFailed\",\"severity\":\"Error\",\"status\":\"False\",\"type\":\"BindingResourceDeleteSuccess\"},{\"lastTransitionTime\":\"2022-08-24T11:14:54Z\",\"status\":\"True\",\"type\":\"BindingUpToDate\"},{\"lastTransitionTime\":\"2022-08-24T11:14:54Z\",\"status\":\"True\",\"type\":\"InitialBindingCompleted\"},{\"lastTransitionTime\":\"2022-08-24T11:14:54Z\",\"status\":\"True\",\"type\":\"PermissionClaimsApplied\"},{\"lastTransitionTime\":\"2022-08-24T11:14:54Z\",\"status\":\"True\",\"type\":\"PermissionClaimsValid\"}]}}"
E0824 13:15:01.360752   13424 apibinding_deletion_controller.go:178] deletion of apibinding root:e2e-org-hchc2|tenancy.kcp.dev failed: Get "https://127.0.0.1:6444/services/workspaces/root:e2e-org-hchc2/apis/tenancy.kcp.dev/v1beta1/workspaces": x509: certificate is valid for localhost, not apiserver-loopback-client
I0824 13:15:01.579106   13424 apibinding_deletion_controller.go:328] "patching APIBinding" reconciler="kcp-apibindingdeletion" key="root:e2e-org-hchc2:e2e-workspace-25cmk|tenancy.kcp.dev" apibinding.workspace="root:e2e-org-hchc2:e2e-workspace-25cmk" apibinding.namespace="" apibinding.name="tenancy.kcp.dev" apibinding.apiVersion="" patch="{\"metadata\":{\"resourceVersion\":\"1588\",\"uid\":\"21aec9fe-5d52-47c9-9e5e-134e37b8f13e\"},\"status\":{\"conditions\":[{\"lastTransitionTime\":\"2022-08-24T11:14:54Z\",\"status\":\"True\",\"type\":\"Ready\"},{\"lastTransitionTime\":\"2022-08-24T11:14:54Z\",\"status\":\"True\",\"type\":\"APIExportValid\"},{\"lastTransitionTime\":\"2022-08-24T11:15:01Z\",\"message\":\"Get \\\"https://127.0.0.1:6444/services/workspaces/root:e2e-org-hchc2:e2e-workspace-25cmk/apis/tenancy.kcp.dev/v1beta1/workspaces\\\": x509: certificate is valid for localhost, not apiserver-loopback-client\",\"reason\":\"ResourceDeletionFailed\",\"severity\":\"Error\",\"status\":\"False\",\"type\":\"BindingResourceDeleteSuccess\"},{\"lastTransitionTime\":\"2022-08-24T11:14:54Z\",\"status\":\"True\",\"type\":\"BindingUpToDate\"},{\"lastTransitionTime\":\"2022-08-24T11:14:54Z\",\"status\":\"True\",\"type\":\"InitialBindingCompleted\"},{\"lastTransitionTime\":\"2022-08-24T11:14:55Z\",\"status\":\"True\",\"type\":\"PermissionClaimsApplied\"},{\"lastTransitionTime\":\"2022-08-24T11:14:55Z\",\"status\":\"True\",\"type\":\"PermissionClaimsValid\"}]}}"
E0824 13:15:01.581461   13424 apibinding_deletion_controller.go:178] deletion of apibinding root:e2e-org-hchc2|tenancy.kcp.dev failed: Get "https://127.0.0.1:6444/services/workspaces/root:e2e-org-hchc2/apis/tenancy.kcp.dev/v1beta1/workspaces": x509: certificate is valid for localhost, not apiserver-loopback-client
E0824 13:15:01.666039   13424 apibinding_deletion_controller.go:178] deletion of apibinding root:e2e-org-hchc2:e2e-workspace-25cmk|tenancy.kcp.dev failed: Get "https://127.0.0.1:6444/services/workspaces/root:e2e-org-hchc2:e2e-workspace-25cmk/apis/tenancy.kcp.dev/v1beta1/workspaces": x509: certificate is valid for localhost, not apiserver-loopback-client

it also affects users accessing workspaces, i.e.

 k get --raw '/clusters/root/apis/tenancy.kcp.dev/v1beta1/workspaces'
Unable to connect to the server: x509: certificate is valid for 192.168.32.104, not 127.0.0.1
p0lyn0mial commented 2 years ago

Maybe we need a proxy rather than a simple redirection? The proxy could verify the vw server cert.
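One way to read this suggestion: instead of answering with a 30x redirect to the vw server, kcp would terminate the client connection itself and forward the request, dialing the vw server with a transport that trusts the vw CA. The sketch below is a minimal stdlib version of that idea; vwURL and caPEM are hypothetical inputs, not existing kcp configuration.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// newVWProxy builds a reverse proxy that forwards requests to the virtual
// workspace server rather than redirecting clients there. The proxy dials
// the vw server itself and verifies its serving cert against the supplied
// CA bundle, so the original client only ever validates the cert it
// already trusts.
func newVWProxy(vwURL string, caPEM []byte) (*httputil.ReverseProxy, error) {
	target, err := url.Parse(vwURL)
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	// Trust only the vw server's CA for the backend connection.
	pool.AppendCertsFromPEM(caPEM)
	proxy := httputil.NewSingleHostReverseProxy(target)
	proxy.Transport = &http.Transport{
		TLSClientConfig: &tls.Config{RootCAs: pool},
	}
	return proxy, nil
}

func main() {
	proxy, err := newVWProxy("https://vw.example.local:6444", nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(proxy.Transport != nil)
}
```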

ncdc commented 2 years ago

it also affects users accessing workspaces, i.e.

k get --raw '/clusters/root/apis/tenancy.kcp.dev/v1beta1/workspaces'
Unable to connect to the server: x509: certificate is valid for 192.168.32.104, not 127.0.0.1

This is dependent upon the deployment topology. We have a topology where this URL redirects to the front proxy (which then maps it into the virtual workspaces container). In this setup, a client is able to validate the front proxy's certificate correctly and everything works.

sttts commented 2 years ago

This is dependent upon the deployment topology. We have a topology where this URL redirects to the front proxy (which then maps it into the virtual workspaces container)

Do we really do that? That's wrong. We must go directly to the vw address.

sttts commented 2 years ago

When the partial metadata informer tries to request /clusters/*/apis/tenancy.kcp.dev/v1beta1/workspaces from the kcp process, the request gets redirected to the front proxy, and we see this error

One step back: do we actually want the ddsif to list projections?

p0lyn0mial commented 2 years ago

To sum up a discussion with Stefan on Slack: the ddsif and the apibinding_deletion_controller shouldn't use projected resources. Instead of Workspaces, they should use ClusterWorkspaces.

For local development, we should use some domain name so that we can validate the server. It should be possible since we have self-signed certs and CAs.
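A sketch of what that local-development setup implies: mint a dev CA, sign a serving cert whose SAN is the chosen dev domain, and check that a client holding the CA can verify it. The domain name below is hypothetical, and the whole flow is an illustration with Go's stdlib rather than kcp's actual cert plumbing.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// devServingCert mints a throwaway CA, signs a serving cert whose SAN is the
// given dev domain, and verifies the chain the way a TLS client would.
// It returns nil when verification succeeds.
func devServingCert(domain string) error {
	caKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return err
	}
	caTmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "kcp-dev-ca"},
		NotBefore:             time.Now(),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		KeyUsage:              x509.KeyUsageCertSign,
		BasicConstraintsValid: true,
	}
	caDER, err := x509.CreateCertificate(rand.Reader, caTmpl, caTmpl, &caKey.PublicKey, caKey)
	if err != nil {
		return err
	}
	caCert, err := x509.ParseCertificate(caDER)
	if err != nil {
		return err
	}

	srvKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return err
	}
	srvTmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: domain},
		DNSNames:     []string{domain}, // the SAN clients will validate against
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	srvDER, err := x509.CreateCertificate(rand.Reader, srvTmpl, caTmpl, &srvKey.PublicKey, caKey)
	if err != nil {
		return err
	}
	srvCert, err := x509.ParseCertificate(srvDER)
	if err != nil {
		return err
	}

	roots := x509.NewCertPool()
	roots.AddCert(caCert)
	_, err = srvCert.Verify(x509.VerifyOptions{Roots: roots, DNSName: domain})
	return err
}

func main() {
	// "vw.kcp.dev.local" is a made-up dev domain for illustration.
	fmt.Println(devServingCert("vw.kcp.dev.local")) // <nil>
}
```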

ncdc commented 2 years ago

https://github.com/kcp-dev/kcp/pull/1805 stops the ddsif from using v1beta1 Workspaces. But this is hard-coded for the time being.

sttts commented 2 years ago

But this is hard-coded for the time being.

This is fine for now until we come up with a generic projection concept.

sttts commented 2 years ago

What we could do now is introduce a pkg/projection package holding the list of projected GRs, plus a map recording what each one projects to.
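A minimal sketch of what such a pkg/projection package could look like. In kcp this would presumably use k8s.io/apimachinery's schema.GroupResource; a local stand-in type is used here to keep the example self-contained, and the single map entry is illustrative.

```go
package main

import "fmt"

// GroupResource stands in for k8s.io/apimachinery's schema.GroupResource
// so the sketch compiles on its own.
type GroupResource struct{ Group, Resource string }

// projections maps each projected (virtual) resource to the real resource
// backing it, so components like the ddsif can skip the projection and list
// the backing resource directly.
var projections = map[GroupResource]GroupResource{
	{Group: "tenancy.kcp.dev", Resource: "workspaces"}: {Group: "tenancy.kcp.dev", Resource: "clusterworkspaces"},
}

// Includes reports whether gr is a projected resource that informers and
// deletion controllers should avoid.
func Includes(gr GroupResource) bool {
	_, ok := projections[gr]
	return ok
}

func main() {
	gr := GroupResource{Group: "tenancy.kcp.dev", Resource: "workspaces"}
	fmt.Println(Includes(gr), projections[gr].Resource) // true clusterworkspaces
}
```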

ncdc commented 2 years ago

On it (need it to fix another issue)

ncdc commented 2 years ago

#1860

ncdc commented 2 years ago

Cleared milestone and put in backlog

kcp-ci-bot commented 6 months ago

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kcp-ci-bot commented 5 months ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

kcp-ci-bot commented 4 months ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

kcp-ci-bot commented 4 months ago

@kcp-ci-bot: Closing this issue.

In response to [this](https://github.com/kcp-dev/kcp/issues/1806#issuecomment-2161559776):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.