kubernetes-sigs / kubespray

Deploy a Production Ready Kubernetes Cluster
Apache License 2.0

On AWS EC2 setting `access_ip` to the `public_dns_name` breaks etcd certificate generation #3659

Closed kesor closed 5 years ago

kesor commented 5 years ago

Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

fatal: [evg-master-01 -> 34.253.10.164]: FAILED! => {
    "changed": true,
    "cmd": [
        "bash",
        "-x",
        "/usr/local/bin/etcd-scripts/make-ssl-etcd.sh",
        "-f",
        "/etc/ssl/etcd/openssl.conf",
        "-d",
        "/etc/ssl/etcd/ssl"
    ],
    "delta": "0:00:00.132845",
    "end": "2018-11-06 23:12:49.252829",
    "invocation": {
        "module_args": {
            "_raw_params": "bash -x /usr/local/bin/etcd-scripts/make-ssl-etcd.sh -f /etc/ssl/etcd/openssl.conf -d /etc/ssl/etcd/ssl",
            "_uses_shell": false,
            "argv": null,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "warn": true
        }
    },
    "msg": "non-zero return code",
    "rc": 1,
    "start": "2018-11-06 23:12:49.119984",
    "stderr": "+ set -o errexit\n+ set -o pipefail\n+ (( 4 ))\n+ case \"$1\" in\n+ CONFIG=/etc/ssl/etcd/openssl.conf\n+ shift 2\n+ (( 2 ))\n+ case \"$1\" in\n+ SSLDIR=/etc/ssl/etcd/ssl\n+ shift 2\n+ (( 0 ))\n+ '[' -z /etc/ssl/etcd/openssl.conf ']'\n+ '[' -z /etc/ssl/etcd/ssl ']'\n++ mktemp -d /tmp/etcd_cacert.XXXXXX\n+ tmpdir=/tmp/etcd_cacert.Ya8mzW\n+ trap 'rm -rf \"${tmpdir}\"' EXIT\n+ cd /tmp/etcd_cacert.Ya8mzW\n+ mkdir -p /etc/ssl/etcd/ssl\n+ '[' -e /etc/ssl/etcd/ssl/ca-key.pem ']'\n+ openssl genrsa -out ca-key.pem 2048\n+ openssl req -x509 -new -nodes -key ca-key.pem -days 36500 -out ca.pem -subj /CN=etcd-ca\n+ '[' -n '  evg-master-01    evg-master-02    evg-master-03  ' ']'\n+ for host in '$MASTERS'\n+ cn=evg-master-01\n+ openssl genrsa -out member-evg-master-01-key.pem 2048\n+ openssl req -new -key member-evg-master-01-key.pem -out member-evg-master-01.csr -subj /CN=etcd-member-evg-master-01 -config /etc/ssl/etcd/openssl.conf\n+ rm -rf /tmp/etcd_cacert.Ya8mzW",                                                                                                    
    "stderr_lines": [
        "+ set -o errexit",
        "+ set -o pipefail",
        "+ (( 4 ))",
        "+ case \"$1\" in",
        "+ CONFIG=/etc/ssl/etcd/openssl.conf",
        "+ shift 2",
        "+ (( 2 ))",
        "+ case \"$1\" in",
        "+ SSLDIR=/etc/ssl/etcd/ssl",
        "+ shift 2",
        "+ (( 0 ))",
        "+ '[' -z /etc/ssl/etcd/openssl.conf ']'",
        "+ '[' -z /etc/ssl/etcd/ssl ']'",
        "++ mktemp -d /tmp/etcd_cacert.XXXXXX",
        "+ tmpdir=/tmp/etcd_cacert.Ya8mzW",
        "+ trap 'rm -rf \"${tmpdir}\"' EXIT",
        "+ cd /tmp/etcd_cacert.Ya8mzW",
        "+ mkdir -p /etc/ssl/etcd/ssl",
        "+ '[' -e /etc/ssl/etcd/ssl/ca-key.pem ']'",
        "+ openssl genrsa -out ca-key.pem 2048",
        "+ openssl req -x509 -new -nodes -key ca-key.pem -days 36500 -out ca.pem -subj /CN=etcd-ca",
        "+ '[' -n '  evg-master-01    evg-master-02    evg-master-03  ' ']'",
        "+ for host in '$MASTERS'",
        "+ cn=evg-master-01",
        "+ openssl genrsa -out member-evg-master-01-key.pem 2048",
        "+ openssl req -new -key member-evg-master-01-key.pem -out member-evg-master-01.csr -subj /CN=etcd-member-evg-master-01 -config /etc/ssl/etcd/openssl.conf",
        "+ rm -rf /tmp/etcd_cacert.Ya8mzW"
    ],
    "stdout": "",
    "stdout_lines": []
}

The failure actually comes from the following command, whose error output does not appear in the Ansible output:

# openssl req -new -key member-evg-master-01-key.pem -out member-evg-master-01.csr -subj /CN=etcd-member-evg-master-01 -config /etc/ssl/etcd/openssl.conf
Error Loading request extension section v3_req
139938585241504:error:220A4076:X509 V3 routines:a2i_GENERAL_NAME:bad ip address:v3_alt.c:476:value=ec2-34-253-10-164.eu-west-1.compute.amazonaws.com
139938585241504:error:22098080:X509 V3 routines:X509V3_EXT_nconf:error in extension:v3_conf.c:95:name=subjectAltName, value=@alt_names
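
The failing command can be located from the Ansible result alone: the script runs under `bash -x` with `set -o errexit`, so the last traced command before the EXIT trap's `rm -rf` cleanup is the one that returned non-zero. A minimal Python sketch of that reasoning, assuming the `stderr_lines` list from the output above (`failing_command` is a hypothetical helper, not part of kubespray or Ansible):

```python
def failing_command(stderr_lines, cleanup_prefix="+ rm -rf"):
    """Return the last traced command before the EXIT-trap cleanup, or None.

    Assumes a `bash -x` trace from a script that uses `set -o errexit`
    and whose EXIT trap runs `rm -rf` (as in the trace above).
    """
    # Keep only xtrace lines; bash prefixes them with one or more '+'.
    traced = [line for line in stderr_lines if line.startswith("+")]
    # Drop the trailing cleanup command(s) executed by the EXIT trap.
    while traced and traced[-1].startswith(cleanup_prefix):
        traced.pop()
    if not traced:
        return None
    # Strip the xtrace prefix ('+', '++', ...) and the space after it.
    return traced[-1].lstrip("+ ")


if __name__ == "__main__":
    trace = [
        "+ set -o errexit",
        "+ openssl genrsa -out member-evg-master-01-key.pem 2048",
        "+ openssl req -new -key member-evg-master-01-key.pem "
        "-out member-evg-master-01.csr",
        "+ rm -rf /tmp/etcd_cacert.Ya8mzW",
    ]
    print(failing_command(trace))
```

Re-running the returned command by hand on the target host, as done above, then shows the OpenSSL error itself.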

The root cause is the kubespray `access_ip` inventory host variable: it was set to the EC2 instance's `public_dns_name`, and kubespray templates it into `/etc/ssl/etcd/openssl.conf` as an `IP.n` entry, where OpenSSL accepts only literal IP addresses:

# cat /etc/ssl/etcd/openssl.conf 
[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name

[req_distinguished_name]

[ v3_req ]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
subjectAltName = @alt_names

[ ssl_client ]
extendedKeyUsage = clientAuth, serverAuth
basicConstraints = CA:FALSE
subjectKeyIdentifier=hash
authorityKeyIdentifier=keyid,issuer
subjectAltName = @alt_names

[ v3_ca ]
basicConstraints = CA:TRUE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
subjectAltName = @alt_names
authorityKeyIdentifier=keyid:always,issuer

[alt_names]
DNS.1 = localhost
DNS.2 = evg-master-01
DNS.3 = evg-master-02
DNS.4 = evg-master-03
DNS.5 = lb-apiserver.kubernetes.local
DNS.6 = etcd.kube-system.svc.cluster.local
DNS.7 = etcd.kube-system.svc
DNS.8 = etcd.kube-system
DNS.9 = etcd
IP.1 = ec2-34-253-10-164.eu-west-1.compute.amazonaws.com
IP.2 = 172.31.166.146
IP.3 = ec2-18-202-228-140.eu-west-1.compute.amazonaws.com
IP.4 = 172.31.3.173
IP.5 = ec2-34-242-120-199.eu-west-1.compute.amazonaws.com
IP.6 = 172.31.192.164
IP.7 = 127.0.0.1
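
For illustration, the `[alt_names]` section would be valid if each address were classified before being written: literal IP addresses under `IP.n`, everything else (such as the EC2 public DNS names) under `DNS.n`. The following is a minimal Python sketch of that classification, not kubespray's actual template logic; `build_alt_names` is a hypothetical helper and the input list is an assumption based on the file above.

```python
import ipaddress


def build_alt_names(names):
    """Render an OpenSSL [alt_names] section from a list of host names/IPs.

    Values that parse as IPv4/IPv6 literals become IP.n entries;
    all other values are treated as DNS names.
    """
    dns_names, ip_addrs = [], []
    for name in names:
        try:
            ipaddress.ip_address(name)
        except ValueError:
            dns_names.append(name)   # not an IP literal, so a DNS name
        else:
            ip_addrs.append(name)

    lines = ["[alt_names]"]
    lines += [f"DNS.{i} = {n}" for i, n in enumerate(dns_names, start=1)]
    lines += [f"IP.{i} = {n}" for i, n in enumerate(ip_addrs, start=1)]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_alt_names([
        "localhost",
        "evg-master-01",
        "ec2-34-253-10-164.eu-west-1.compute.amazonaws.com",
        "172.31.166.146",
        "127.0.0.1",
    ]))
```

With this split, a DNS-valued `access_ip` would land in a `DNS.n` entry instead of triggering the `bad ip address` error.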

Using the actual public IP address is not an option either: it causes #3658 and a host of other problems later, because instances cannot use the public IPs of their peers (or themselves) directly.

Environment:

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"


- **Version of Ansible** (`ansible --version`):

ansible 2.6.5
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.6/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 3.6.5 (default, Aug 22 2018, 14:20:40) [GCC 6.4.0]



**Kubespray version (commit) (`git rev-parse --short HEAD`):**
`14c2df0`

fejta-bot commented 5 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot commented 5 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

fejta-bot commented 5 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

k8s-ci-robot commented 5 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/kubespray/issues/3659#issuecomment-500378064):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> Send feedback to sig-testing, kubernetes/test-infra and/or [fejta](https://github.com/fejta).
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.