metral / corekube

CoreOS + Kubernetes + OpenStack - The simplest way to deploy a POC Kubernetes cluster using a Heat template
Apache License 2.0

Stack Build Error #22

Closed svenmueller closed 8 years ago

svenmueller commented 8 years ago

Hi,

When I use the latest version of https://github.com/metral/corekube/blob/master/corekube-cloudservers.yaml I get an error when creating the stack (on Rackspace). Any idea?

Resource CREATE failed: resources.kubernetes_minions: Property error: resources[1].properties.networks[0].network: Error validating value '00000000-0000-0000-0000-000000000000': SSL certificate validation has failed: [Errno 1] _ssl.c:504: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

Thanks, Sven

metral commented 8 years ago

Hello,

I just tried running it and have found no issues.

The command I issued is the same as in the README: heat stack-create foobar --template-file corekube-cloudservers.yaml -P keyname=<RAX_SSH_KEY>

Are you still running into issues? If so, could you provide some more info/logs of the issue?
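In case it helps, the stack-side failure details can usually be pulled with the python-heatclient CLI (a sketch; it assumes the stack was created with the name foobar as in the README command):

```shell
# List stacks and their status
heat stack-list

# Show the event stream for the stack (failed resources show up here)
heat event-list foobar

# Show per-resource status to find which resource failed
heat resource-list foobar
```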

svenmueller commented 8 years ago

Hi,

The mentioned issue disappeared again (it looks like Rackspace made some changes/fixes under the hood...). But I still have problems: now the CoreOS cluster is not working:

kubernetes-master ~ # fleetctl list-units
Error retrieving list of units from repository: googleapi: Error 503: fleet server unable to communicate with etcd
kubernetes-master ~ # journalctl -u etcd.service
-- Logs begin at Sat 2015-09-19 22:49:30 UTC, end at Sat 2015-09-19 22:54:39 UTC. --
Sep 19 22:49:42 kubernetes-master systemd[1]: Started etcd.
Sep 19 22:49:42 kubernetes-master systemd[1]: Starting etcd...
Sep 19 22:49:42 kubernetes-master etcd[1073]: [etcd] Sep 19 22:49:42.559 INFO      | Discovery via http://10.182.65.214:2379 using prefix discovery/<TOKEN>.
Sep 19 22:49:42 kubernetes-master systemd[1]: etcd.service: Main process exited, code=exited, status=1/FAILURE
Sep 19 22:49:42 kubernetes-master systemd[1]: etcd.service: Unit entered failed state.
Sep 19 22:49:42 kubernetes-master systemd[1]: etcd.service: Failed with result 'exit-code'.
Sep 19 22:49:53 kubernetes-master systemd[1]: etcd.service: Service hold-off time over, scheduling restart.
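For reference, one way to check whether the discovery endpoint itself is reachable from the master (a diagnostic sketch; the IP is the one from the log above, and <TOKEN> is the redacted discovery token):

```shell
# Can the master reach the discovery etcd at all?
curl -sS http://10.182.65.214:2379/v2/keys/discovery/<TOKEN>

# Follow etcd's restart attempts live
journalctl -u etcd.service -f
```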

Does it work for you @metral ?

svenmueller commented 8 years ago

Hi @metral,

Any updates on this? Does etcd work for you (e.g. on the kubernetes master)?

Thx Sven

metral commented 8 years ago

Apologies for my lack of a response. Have you tried starting a clean deployment? Are you trying to fix an existing deployment? Please provide me with more information so I can recreate your issue.


-Mike Metral

svenmueller commented 8 years ago

Hi @metral,

Yep, I always destroy the old stack and create a new stack using the Heat template (I repeated it a couple of times to see if it is reproducible). After the stack is ready, I use SSH to access the Kubernetes master node. There I can see that there are issues with etcd.

kubernetes-master ~ # journalctl -u etcd.service
-- Logs begin at Sat 2015-09-19 22:49:30 UTC, end at Sat 2015-09-19 22:54:39 UTC. --
Sep 19 22:49:42 kubernetes-master systemd[1]: Started etcd.
Sep 19 22:49:42 kubernetes-master systemd[1]: Starting etcd...
Sep 19 22:49:42 kubernetes-master etcd[1073]: [etcd] Sep 19 22:49:42.559 INFO      | Discovery via http://10.182.65.214:2379 using prefix discovery/<TOKEN>.
Sep 19 22:49:42 kubernetes-master systemd[1]: etcd.service: Main process exited, code=exited, status=1/FAILURE
Sep 19 22:49:42 kubernetes-master systemd[1]: etcd.service: Unit entered failed state.
Sep 19 22:49:42 kubernetes-master systemd[1]: etcd.service: Failed with result 'exit-code'.
Sep 19 22:49:53 kubernetes-master systemd[1]: etcd.service: Service hold-off time over, scheduling restart.

Thx for the support :)

metral commented 8 years ago

This is very odd. I've done two clean deployments, one just now and one when you originally opened the issue, but I still am not running into the issues you're describing. Here is the etcd log from a fresh deployment in the ORD region, from beginning to end:

-- Logs begin at Fri 2015-10-02 17:43:38 UTC, end at Fri 2015-10-02 17:51:10 UTC. --
Oct 02 17:43:49 kubernetes-master systemd[1]: Started etcd.
Oct 02 17:43:49 kubernetes-master systemd[1]: Starting etcd...
Oct 02 17:43:49 kubernetes-master etcd[1052]: [etcd] Oct  2 17:43:49.999 INFO      | Discovery via http://10.210.104.64:2379 using prefix discovery/GhOOQ7AxAQr0wgygYd6eHrgkk7pNuQsX.
Oct 02 17:43:50 kubernetes-master etcd[1052]: [etcd] Oct  2 17:43:50.013 INFO      | Discovery found peers [http://10.210.104.74:7001]
Oct 02 17:43:50 kubernetes-master etcd[1052]: [etcd] Oct  2 17:43:50.013 INFO      | Discovery fetched back peer list: [10.210.104.74:7001]
Oct 02 17:43:50 kubernetes-master etcd[1052]: [etcd] Oct  2 17:43:50.018 INFO      | Send Join Request to http://10.210.104.74:7001/join
Oct 02 17:43:50 kubernetes-master etcd[1052]: [etcd] Oct  2 17:43:50.029 INFO      | kubernetes_master joined the cluster via peer 10.210.104.74:7001
Oct 02 17:43:50 kubernetes-master etcd[1052]: [etcd] Oct  2 17:43:50.032 INFO      | etcd server [name kubernetes_master, listen on :4001, advertised url http://10.210.104.78:4001]
Oct 02 17:43:50 kubernetes-master etcd[1052]: [etcd] Oct  2 17:43:50.032 INFO      | peer server [name kubernetes_master, listen on :7001, advertised url http://10.210.104.78:7001]
Oct 02 17:43:50 kubernetes-master etcd[1052]: [etcd] Oct  2 17:43:50.032 INFO      | kubernetes_master starting in peer mode
Oct 02 17:43:50 kubernetes-master etcd[1052]: [etcd] Oct  2 17:43:50.032 INFO      | kubernetes_master: state changed from 'initialized' to 'follower'.
Oct 02 17:43:50 kubernetes-master etcd[1052]: [etcd] Oct  2 17:43:50.080 INFO      | kubernetes_master: peer added: 'overlord'
Oct 02 17:43:53 kubernetes-master etcd[1052]: [etcd] Oct  2 17:43:53.669 INFO      | kubernetes_master: peer added: 'kubernetes_minion_0'
Oct 02 17:44:03 kubernetes-master etcd[1052]: [etcd] Oct  2 17:44:03.234 INFO      | kubernetes_master: peer added: 'kubernetes_minion_2'
Oct 02 17:44:16 kubernetes-master etcd[1052]: [etcd] Oct  2 17:44:16.633 INFO      | kubernetes_master: peer added: 'kubernetes_minion_1'

Can you provide the steps you're taking? From your logs it seems that your discovery node is not setting up the private etcd server that both the overlord and Kubernetes depend on, but I am not sure why it's having issues.

Could you try deploying again from scratch, or provide me with more information from your discovery node's logs for the container running it: docker logs discovery
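A sketch of what to collect on the discovery node (assuming the container is named discovery as in the command above):

```shell
# Confirm the discovery container is running
docker ps --filter name=discovery

# Capture the tail of its logs
docker logs discovery 2>&1 | tail -n 50

# Confirm the discovery etcd answers locally on 2379
curl -sS http://localhost:2379/version
```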

metral commented 8 years ago

Closing due to inactivity. Please reopen if the issues still continue.