sourcegraph / src-cli

Sourcegraph CLI
https://sourcegraph.com
Apache License 2.0
267 stars 57 forks source link

validate subcommand: various changes #1088

Closed craigfurman closed 1 month ago

craigfurman commented 1 month ago

Remove "connections" validation. This checked that various services were TCP-reachable (and resolvable by kube-dns), from various pods. For example, it checked that pgsql was reachable from frontend. This particular validation wouldn't always be useful, e.g. when external postgres clusters were used. It was also failing due to the removal of nc from base images.

This upstream health-checking could be replaced by doing this check in the readiness probe handler. For example: frontend pods could check the availability of a pgsql connection, and fail their readiness probe if that is unavailable. This is a contentious topic however and should be approached with caution: mean-time-to-recovery can actually substantially increase under some configurations due to cascading failing readiness probes. This commit conservatively aims to restore utility of src-validate by removing the always-failing connections check, and there are no immediate plans to add upstream readiness probing.

This commit also changes src-validate to exit with non-zero status when there are no pods and/or services. This white-box check does have some utility, as a sanity check that we've deployed anything at all. The "no PVCs" case is left as a warning, in case the user is using external DBs everywhere, and might legitimately have no PVCs in their namespace.

This closes https://linear.app/sourcegraph/issue/REL-42/src-validate-should-not-exit-with-zero-when-no-pods-are-available-at and https://linear.app/sourcegraph/issue/REL-40/fix-src-validate-now-that-nc-is-not-available, assuming we agree with https://linear.app/sourcegraph/issue/REL-40/fix-src-validate-now-that-nc-is-not-available#comment-7b40beee.

It will be followed-up with an implementation of https://linear.app/sourcegraph/issue/REL-43/src-validate-should-perform-black-box-validations-of-core-flows.


Draft until the general idea is 👍 / 👎 by another member of the release team, before asking for codeowner review.

Test plan

  1. Ran go run ./cmd/src validate kube against an empty k8s namespace. It failed as expected. 2.Ran go run ./cmd/src validate kube against a k8s namespace with sg's helm chart deployed. It succeeded as expected.