openshift / os

oc cluster up fails #72

Closed cgwalters closed 6 years ago

cgwalters commented 6 years ago

Not entirely sure what's going wrong...looking through the logs quickly I see some etcd cert issues at least.

# rpm-ostree status
State: idle; auto updates disabled
Deployments:
● ostree://rhcos:openshift/3.10/x86_64/os
                   Version: 3.10-7.5.23 (2018-05-29 16:09:02)
                    Commit: d953d2db12ae9f778814d879ef5ad7a7636c5a9de86b0fc8c72ac46f9aa255db
# rpm -q origin
origin-3.10.0-0.alpha.0.1323.836f8e3.x86_64

ashcrow commented 6 years ago

I'll try to take a look at this as well.

ashcrow commented 6 years ago

May 31 21:08:11 rhcos dockerd-current[2204]: time="2018-05-31T21:08:11.470377867Z" level=warning msg="Unknown healthcheck type 'NONE' (expected 'CMD') in container cd653d1ee58aa41bab78d352d3f311268a480f5fb8eefb2163f6b3f2576e2b46"
May 31 21:08:11 rhcos dockerd-current[2204]: Error: required flag(s) "config" not set
May 31 21:08:11 rhcos dockerd-current[2204]: Usage:
May 31 21:08:11 rhcos dockerd-current[2204]:   hypershift experimental openshift-webconsole-operator [flags]

At first pass, this looks more like an issue with oc cluster up itself, though I'm not 100% sure yet.

ashcrow commented 6 years ago

May 31 21:03:14 rhcos dockerd-current[2204]: 2018-05-31 21:03:14.694526 I | etcdserver/api: enabled capabilities for version 3.2
May 31 21:03:14 rhcos dockerd-current[2204]: 2018-05-31 21:03:14.694544 I | etcdserver: published {Name:openshift.local ClientURLs:[https://127.0.0.1:4001]} to cluster dcf5ba954f7ebe11
May 31 21:03:14 rhcos dockerd-current[2204]: I0531 21:03:14.694560       1 run.go:81] Started etcd at 127.0.0.1:4001
May 31 21:03:14 rhcos dockerd-current[2204]: 2018-05-31 21:03:14.694567 I | embed: ready to serve client requests
May 31 21:03:14 rhcos dockerd-current[2204]: INFO: 2018/05/31 21:03:14 dialing to target with scheme: ""
May 31 21:03:14 rhcos dockerd-current[2204]: INFO: 2018/05/31 21:03:14 could not get resolver for scheme: ""
May 31 21:03:14 rhcos dockerd-current[2204]: 2018-05-31 21:03:14.694727 I | embed: serving client requests on [::]:4001
May 31 21:03:15 rhcos dockerd-current[2204]: WARNING: 2018/05/31 21:03:15 Failed to dial 0.0.0.0:4001: connection error: desc = "transport: authentication handshake failed: remote error: tls:

I also see the etcd and TLS issue.
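
Both symptoms seen so far (the missing "config" flag and the TLS handshake failure) can be pulled out of a noisy journal with a small filter. This is only a minimal sketch: the function name `filter_cluster_up_errors` is hypothetical, and the two match strings are copied from the log excerpts in this thread rather than from any documented error catalogue.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): read log lines on stdin and
# keep only those matching the two symptoms reported above:
#   1. the missing flag:      required flag(s) "config" not set
#   2. the etcd TLS failure:  authentication handshake failed
filter_cluster_up_errors() {
    # -F treats the patterns as fixed strings, so "(s)" needs no escaping.
    grep -F -e 'required flag(s)' -e 'authentication handshake failed'
}
```

It could be fed with something like `journalctl -b -t dockerd-current | filter_cluster_up_errors`, assuming the messages are logged under the `dockerd-current` identifier shown in the excerpts.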

ashcrow commented 6 years ago

@mfojtik do you have any pointers to help us debug what we are seeing here?

jlebon commented 6 years ago

Yeah, definitely looks like some flag isn't being passed correctly:

Jun 01 20:26:06 jlebon-tmp dockerd-current[1955]: Error: required flag(s) "config" not set
Jun 01 20:26:06 jlebon-tmp dockerd-current[1955]: Usage:
Jun 01 20:26:06 jlebon-tmp dockerd-current[1955]:   hypershift experimental openshift-webconsole-operator [flags]
Jun 01 20:26:06 jlebon-tmp dockerd-current[1955]:
Jun 01 20:26:06 jlebon-tmp dockerd-current[1955]: Flags:
Jun 01 20:26:06 jlebon-tmp dockerd-current[1955]:       --config string       Location of the master configuration file to run from.
Jun 01 20:26:06 jlebon-tmp dockerd-current[1955]:   -h, --help                help for openshift-webconsole-operator
Jun 01 20:26:06 jlebon-tmp dockerd-current[1955]:       --kubeconfig string   Location of the master configuration file to run from.
Jun 01 20:26:06 jlebon-tmp dockerd-current[1955]:
Jun 01 20:26:06 jlebon-tmp dockerd-current[1955]: Global Flags:
Jun 01 20:26:06 jlebon-tmp dockerd-current[1955]:       --allow-verification-with-non-compliant-keys   Allow a SignatureVerifier to use keys which are technically non-compliant with RFC6962.
Jun 01 20:26:06 jlebon-tmp dockerd-current[1955]:       --alsologtostderr                              log to standard error as well as files
...

I was able to bring it up if I exclude the web console:

# oc cluster up --enable -web-console
...
# oc status
In project My Project (myproject) on server https://127.0.0.1:8443

You have no services, deployment configs, or build configs.
Run 'oc new-app' to create an application.

It looks like it may have been fixed by https://github.com/openshift/origin/pull/19837/files#diff-f1240eecd9d848d1fb023dd8aa555622.

We'll find out in the next iteration.

jlebon commented 6 years ago

This is fixed in 3.10-7.5.24. I also opened https://github.com/projectatomic/atomic-host-tests/issues/407.
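
To check whether a given host already carries the fixed build, the deployed version can be compared against 3.10-7.5.24. The following is a minimal sketch, not an official check: both function names are hypothetical, it assumes the `Version:` line format shown in the `rpm-ostree status` output at the top of this issue, it only looks at the first deployment listed, and it relies on GNU `sort -V` for version ordering.

```shell
#!/bin/sh
# Hypothetical helpers (not from the thread).

# Print the version from the first "Version:" line of `rpm-ostree status`
# output read on stdin. Assumption: the first deployment listed is the one
# of interest.
first_deployment_version() {
    awk '$1 == "Version:" { print $2; exit }'
}

# Succeed if version $1 is greater than or equal to version $2,
# using GNU sort's version ordering (sort -V).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `version_at_least "$(rpm-ostree status | first_deployment_version)" 3.10-7.5.24` would succeed only on a host at or past the fixed version.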