sassoftware / viya4-deployment

This project contains Ansible code that creates a baseline in an existing Kubernetes environment for use with the SAS Viya Platform, generates the manifest for an order, and can then deploy that order into the specified Kubernetes environment.
Apache License 2.0

Capture and announce error when failing to build ingress rules #114

Closed rocoll closed 10 months ago

rocoll commented 3 years ago

I screwed up with naming an env var and inadvertently directed viya4-deployment to build Ingress rules referring to a malformed DNS name.

Error from server (Invalid): error when creating "site.yaml": Ingress.extensions "sas-theme-designer-app" is invalid: [spec.rules[0].host: Invalid value: ".gelsandbox.aws.unx.sas.com": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), spec.tls[0].hosts[0]: Invalid value: ".gelsandbox.aws.unx.sas.com": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')]

If I run kubectl apply directly, I can watch it generate 100+ error messages about this problem. However, Ansible (as directed by viya4-deployment) gave no hint at all about the problem. Its play recap reports zero errors. From past experience, I understand that a zero-error Ansible run doesn't guarantee a perfectly sound deployment. But scores of error messages about a critical infrastructure component should probably be captured and announced.

It took a while to run it down. After completing the failed deployment, the key symptom is that none of the Viya pods start (stuck initializing, crash loop backoff). Digging into them, certframe complains it cannot find "sas-ingress-certificate" (HTTP 404 error). And sure enough, that secret doesn't exist. But the real clue was noticing that none of the ingress rules exist.
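The reporting gap described above could be narrowed by scanning the output of the apply step and failing loudly when server errors appear. The following is only a rough sketch of that idea in Python: the helper names `find_apply_errors` and `run_and_announce` are hypothetical, and how viya4-deployment actually invokes kubectl is not shown in this issue, so the wrapper is an assumption rather than the project's method. The only detail taken from the report is the "Error from server" prefix in the quoted output.

```python
import subprocess
from typing import List, Sequence

# Prefix taken from the kubectl output quoted in this issue.
ERROR_PREFIX = "Error from server"


def find_apply_errors(output: str) -> List[str]:
    """Return every output line that starts with the server error prefix."""
    return [line for line in output.splitlines() if line.startswith(ERROR_PREFIX)]


def run_and_announce(cmd: Sequence[str]) -> None:
    """Run a command; raise if it exits non-zero or prints server errors.

    Hypothetical wrapper: it is not part of viya4-deployment.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    errors = find_apply_errors(result.stdout + "\n" + result.stderr)
    if result.returncode != 0 or errors:
        summary = "\n".join(errors) or result.stderr.strip()
        raise RuntimeError(
            f"{len(errors)} server error(s) from {' '.join(cmd)}:\n{summary}"
        )
```

A call such as `run_and_announce(["kubectl", "apply", "-f", "site.yaml"])` would then raise instead of passing silently when the server rejects resources.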

thpang commented 3 years ago

Looking to see if we can put a simple check on the format when consuming the FQDN value.
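A format check of this kind could look roughly like the sketch below. It reuses the validation regex printed in the Kubernetes error message quoted in this issue, plus the 253-character limit that Kubernetes applies to DNS-1123 subdomains. The function names `is_dns1123_subdomain` and `check_ingress_host` are hypothetical, and the sketch is written in Python for illustration only; it does not show where viya4-deployment consumes the FQDN value.

```python
import re

# Regex copied from the Kubernetes error message quoted in this issue.
_DNS1123_SUBDOMAIN = re.compile(
    r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
)
_MAX_LEN = 253  # maximum length Kubernetes accepts for a DNS-1123 subdomain


def is_dns1123_subdomain(value: str) -> bool:
    """True if the whole string is a valid DNS-1123 subdomain."""
    return len(value) <= _MAX_LEN and _DNS1123_SUBDOMAIN.fullmatch(value) is not None


def check_ingress_host(value: str) -> None:
    """Raise early, before any manifests are built, if the host is malformed."""
    if not is_dns1123_subdomain(value):
        raise ValueError(
            f"Invalid ingress host {value!r}: must be a DNS-1123 subdomain "
            "(lower case alphanumerics, '-' or '.', starting and ending "
            "with an alphanumeric character)"
        )
```

With this check, the malformed value from the report, `.gelsandbox.aws.unx.sas.com`, is rejected because it starts with a dot, while `example.com` passes.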

thpang commented 2 years ago

@rocoll, is this error being generated by kustomize? Also, where are you setting .gelsandbox.aws.unx.sas.com? If it's an Ansible variable, we could add a check for it. We're looking at assigning or closing out these older issues. Thanks again.

sayeun commented 10 months ago

Marking as stale/inactive. If there are further questions please open a new GitHub issue.