securesign / secure-sign-operator


If installation is stalled or not successful, CreateTree will continuously try to run #230

Closed cooktheryan closed 4 months ago

cooktheryan commented 6 months ago

If an install is deleted before it completes, the create tree / admin server step runs indefinitely and stops other installations from occurring.

I0228 19:31:31.535384       1 admin.go:50] CreateTree...
E0228 19:31:31.535525       1 admin.go:55] Admin server unavailable: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp: lookup trillian-logserver.default.svc.cluster.local on 172.31.0.10:53: no such host"
I0228 19:31:34.800037       1 admin.go:50] CreateTree...
E0228 19:31:34.800107       1 admin.go:55] Admin server unavailable: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp: lookup trillian-logserver.default.svc.cluster.local on 172.31.0.10:53: no such host"
2024-02-28T19:31:35Z    DEBUG   controller.tuf  Reconciling TUF {"controller": "tuf", "controllerGroup": "rhtas.redhat.com", "controllerKind": "Tuf", "Tuf": {"name":"securesign-sample","namespace":"securesign"}, "namespace": "securesign", "name": "securesign-sample", "reconcileID": "fbadbf13-97cd-4e34-bcb3-cad65eb070ca", "request": {"name":"securesign-sample","namespace":"securesign"}}
2024-02-28T19:31:40Z    DEBUG   controller.tuf  Reconciling TUF {"controller": "tuf", "controllerGroup": "rhtas.redhat.com", "controllerKind": "Tuf", "Tuf": {"name":"securesign-sample","namespace":"securesign"}, "namespace": "securesign", "name": "securesign-sample", "reconcileID": "ae6e6f62-3114-4ab9-a140-b9ea967ec2a4", "request": {"name":"securesign-sample","namespace":"securesign"}}
2024-02-28T19:31:45Z    DEBUG   controller.tuf  Reconciling TUF {"controller": "tuf", "controllerGroup": "rhtas.redhat.com", "controllerKind": "Tuf", "Tuf": {"name":"securesign-sample","namespace":"securesign"}, "namespace": "securesign", "name": "securesign-sample", "reconcileID": "7dba53fa-f04e-499b-a85d-ad775562032a", "request": {"name":"securesign-sample","namespace":"securesign"}}
I0228 19:31:46.065350       1 admin.go:50] CreateTree...
E0228 19:31:46.065491       1 admin.go:55] Admin server unavailable: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp: lookup trillian-logserver.default.svc.cluster.local on 172.31.0.10:53: no such host"
2024-02-28T19:31:50Z    DEBUG   controller.tuf  Reconciling TUF {"controller": "tuf", "controllerGroup": "rhtas.redhat.com", "controllerKind": "Tuf", "Tuf": {"name":"securesign-sample","namespace":"securesign"}, "namespace": "securesign", "name": "securesign-sample", "reconcileID": "9f8e3624-1bad-4a81-a37c-093d3549b2f8", "request": {"name":"securesign-sample","namespace":"securesign"}}
I0228 19:31:51.536476       1 admin.go:50] CreateTree...
E0228 19:31:51.536534       1 admin.go:55] Admin server unavailable: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp: lookup trillian-logserver.default.svc.cluster.local on 172.31.0.10:53: no such host"

How to reproduce:
1) Deploy a securesign instance in a namespace.
2) Delete securesign before it has deployed successfully.
3) Deploy securesign in the correct namespace.
4) Observe the logs.

cooktheryan commented 6 months ago

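Only one securesign exists, but the controller still attempts to resolve the Trillian log server of the deleted install (trillian-logserver.default.svc.cluster.local in the logs above) rather than that of the existing one.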

 kubectl get securesign -A
NAMESPACE    NAME                STATUS     REKOR URL   FULCIO URL                                                                                     TUF URL
securesign   securesign-sample   Creating               https://fulcio-server-securesign.apps.1fefad8f9333a02093ff.hypershift.aws-2.ci.openshift.org

cooktheryan commented 6 months ago

The workaround is to delete the operator container, but that is not a good long-term solution.