paz-sh / paz

An open-source, in-house service platform with a PaaS-like workflow, built on Docker, CoreOS, Etcd and Fleet. This repository houses the documentation and installation scripts.
http://paz.sh

":( something went wrong" #37

Closed thecatwasnot closed 9 years ago

thecatwasnot commented 9 years ago

Hi guys, I attempted to bring up a Vagrant cluster last night, and the paz-web.paz dashboard failed with the above message. I know very little about JavaScript/Ember, so I'm not sure how much more debugging I can do, or even where to start at this point.

fleetctl --version
fleetctl version 0.9.1
etcdctl --version
etcdctl version 2.0.4
vagrant --version
Vagrant 1.7.2

I started it up with:

./scripts/install-vagrant.sh 
Installing Paz on Vagrant

Checking for existing Vagrant cluster

Creating a new Vagrant cluster
Cloning into 'coreos-vagrant'...
remote: Counting objects: 351, done.
remote: Total 351 (delta 0), reused 0 (delta 0), pack-reused 351
Receiving objects: 100% (351/351), 79.37 KiB | 0 bytes/s, done.
Resolving deltas: 100% (152/152), done.
Checking connectivity... done.
==> core-01: Box 'coreos-beta' not installed, can't check for updates.
==> core-02: Box 'coreos-beta' not installed, can't check for updates.
==> core-03: Box 'coreos-beta' not installed, can't check for updates.
Bringing machine 'core-01' up with 'virtualbox' provider...
Bringing machine 'core-02' up with 'virtualbox' provider...
Bringing machine 'core-03' up with 'virtualbox' provider...
==> core-01: Box 'coreos-beta' could not be found. Attempting to find and install...
    core-01: Box Provider: virtualbox
    core-01: Box Version: >= 308.0.1
==> core-01: Loading metadata for box 'http://beta.release.core-os.net/amd64-usr/current/coreos_production_vagrant.json'
    core-01: URL: http://beta.release.core-os.net/amd64-usr/current/coreos_production_vagrant.json
==> core-01: Adding box 'coreos-beta' (v607.0.0) for provider: virtualbox
    core-01: Downloading: http://beta.release.core-os.net/amd64-usr/607.0.0/coreos_production_vagrant.box
    core-01: Calculating and comparing box checksum...
==> core-01: Successfully added box 'coreos-beta' (v607.0.0) for 'virtualbox'!
==> core-01: Importing base box 'coreos-beta'...
==> core-01: Matching MAC address for NAT networking...
==> core-01: Checking if box 'coreos-beta' is up to date...
==> core-01: Setting the name of the VM: coreos-vagrant_core-01_1425861758586_22045
==> core-01: Clearing any previously set network interfaces...
==> core-01: Preparing network interfaces based on configuration...
    core-01: Adapter 1: nat
    core-01: Adapter 2: hostonly
==> core-01: Forwarding ports...
    core-01: 22 => 2222 (adapter 1)
==> core-01: Running 'pre-boot' VM customizations...
==> core-01: Booting VM...
==> core-01: Waiting for machine to boot. This may take a few minutes...
    core-01: SSH address: 127.0.0.1:2222
    core-01: SSH username: core
    core-01: SSH auth method: private key
    core-01: Warning: Connection timeout. Retrying...
==> core-01: Machine booted and ready!
==> core-01: Setting hostname...
==> core-01: Configuring and enabling network interfaces...
==> core-01: Running provisioner: file...
==> core-01: Running provisioner: shell...
    core-01: Running: inline script
==> core-02: Box 'coreos-beta' could not be found. Attempting to find and install...
    core-02: Box Provider: virtualbox
    core-02: Box Version: >= 308.0.1
==> core-02: Loading metadata for box 'http://beta.release.core-os.net/amd64-usr/current/coreos_production_vagrant.json'
    core-02: URL: http://beta.release.core-os.net/amd64-usr/current/coreos_production_vagrant.json
==> core-02: Adding box 'coreos-beta' (v607.0.0) for provider: virtualbox
==> core-02: Importing base box 'coreos-beta'...
==> core-02: Matching MAC address for NAT networking...
==> core-02: Checking if box 'coreos-beta' is up to date...
==> core-02: Setting the name of the VM: coreos-vagrant_core-02_1425861790309_80904
==> core-02: Fixed port collision for 22 => 2222. Now on port 2200.
==> core-02: Clearing any previously set network interfaces...
==> core-02: Preparing network interfaces based on configuration...
    core-02: Adapter 1: nat
    core-02: Adapter 2: hostonly
==> core-02: Forwarding ports...
    core-02: 22 => 2200 (adapter 1)
==> core-02: Running 'pre-boot' VM customizations...
==> core-02: Booting VM...
==> core-02: Waiting for machine to boot. This may take a few minutes...
    core-02: SSH address: 127.0.0.1:2200
    core-02: SSH username: core
    core-02: SSH auth method: private key
    core-02: Warning: Connection timeout. Retrying...
==> core-02: Machine booted and ready!
==> core-02: Setting hostname...
==> core-02: Configuring and enabling network interfaces...
==> core-02: Running provisioner: file...
==> core-02: Running provisioner: shell...
    core-02: Running: inline script
==> core-03: Box 'coreos-beta' could not be found. Attempting to find and install...
    core-03: Box Provider: virtualbox
    core-03: Box Version: >= 308.0.1
==> core-03: Loading metadata for box 'http://beta.release.core-os.net/amd64-usr/current/coreos_production_vagrant.json'
    core-03: URL: http://beta.release.core-os.net/amd64-usr/current/coreos_production_vagrant.json
==> core-03: Adding box 'coreos-beta' (v607.0.0) for provider: virtualbox
==> core-03: Importing base box 'coreos-beta'...
==> core-03: Matching MAC address for NAT networking...
==> core-03: Checking if box 'coreos-beta' is up to date...
==> core-03: Setting the name of the VM: coreos-vagrant_core-03_1425861823848_20722
==> core-03: Fixed port collision for 22 => 2222. Now on port 2201.
==> core-03: Clearing any previously set network interfaces...
==> core-03: Preparing network interfaces based on configuration...
    core-03: Adapter 1: nat
    core-03: Adapter 2: hostonly
==> core-03: Forwarding ports...
    core-03: 22 => 2201 (adapter 1)
==> core-03: Running 'pre-boot' VM customizations...
==> core-03: Booting VM...
==> core-03: Waiting for machine to boot. This may take a few minutes...
    core-03: SSH address: 127.0.0.1:2201
    core-03: SSH username: core
    core-03: SSH auth method: private key
    core-03: Warning: Connection timeout. Retrying...
==> core-03: Machine booted and ready!
==> core-03: Setting hostname...
==> core-03: Configuring and enabling network interfaces...
==> core-03: Running provisioner: file...
==> core-03: Running provisioner: shell...
    core-03: Running: inline script
Waiting for Vagrant cluster to be ready...
CoreOS Vagrant cluster is up

Configuring SSH
Identity added: /home/thecatwasnot/.vagrant.d/insecure_private_key (/home/thecatwasnot/.vagrant.d/insecure_private_key)

Starting paz runlevel 1 units
Unit paz-scheduler.service launched on 7641f8b0.../172.17.8.101
Unit paz-orchestrator.service launched on 53f5997f.../172.17.8.102
Unit paz-service-directory-announce.service launched on b9bc6257.../172.17.8.103
Unit paz-service-directory.service launched on b9bc6257.../172.17.8.103
Unit paz-scheduler-announce.service launched on 7641f8b0.../172.17.8.101
Unit paz-orchestrator-announce.service launched on 53f5997f.../172.17.8.102
Successfully started all runlevel 1 paz units on the cluster with Fleet
Waiting for runlevel 1 services to be activated...
Activating: 0 | Active: 6 | Failed: 0.  
All runlevel 1 units successfully activated!

Waiting for orchestrator, scheduler and service directory to be announced

Starting paz runlevel 2 units
Unit paz-web.service launched on 53f5997f.../172.17.8.102
Unit paz-web-announce.service launched on 53f5997f.../172.17.8.102
Successfully started all runlevel 2 paz units on the cluster with Fleet
Waiting for runlevel 2 services to be activated...
Activating: 0 | Active: 8 | Failed: 0...
All runlevel 2 units successfully activated!

You will need to add the following entries to your /etc/hosts:
172.17.8.101 paz-web.paz
172.17.8.101 paz-scheduler.paz
172.17.8.101 paz-orchestrator.paz
172.17.8.101 paz-orchestrator-socket.paz

Paz installation successful

I did edit /etc/hosts, and fleet reports everything OK:

vagrant ssh core-01
CoreOS beta (607.0.0)
Update Strategy: No Reboots
core@core-01 ~ $ fleetctl list-units
UNIT                    MACHINE             ACTIVE  SUB
paz-orchestrator-announce.service   53f5997f.../172.17.8.102    active  running
paz-orchestrator.service        53f5997f.../172.17.8.102    active  running
paz-scheduler-announce.service      7641f8b0.../172.17.8.101    active  running
paz-scheduler.service           7641f8b0.../172.17.8.101    active  running
paz-service-directory-announce.service  b9bc6257.../172.17.8.103    active  running
paz-service-directory.service       b9bc6257.../172.17.8.103    active  running
paz-web-announce.service        53f5997f.../172.17.8.102    active  running
paz-web.service             53f5997f.../172.17.8.102    active  running

This morning I tried running the integration test:

./integration.sh 
Starting Paz integration test script
./integration.sh: line 18: checkRequiredEnvVars: command not found

Checking for existing Vagrant cluster

Creating a new Vagrant cluster
Cloning into 'coreos-vagrant'...
remote: Counting objects: 351, done.
remote: Total 351 (delta 0), reused 0 (delta 0), pack-reused 351
Receiving objects: 100% (351/351), 79.37 KiB | 0 bytes/s, done.
Resolving deltas: 100% (152/152), done.
Checking connectivity... done.
==> core-01: Checking for updates to 'coreos-beta'
    core-01: Latest installed version: 607.0.0
    core-01: Version constraints: >= 308.0.1
    core-01: Provider: virtualbox
==> core-01: Box 'coreos-beta' (v607.0.0) is running the latest version.
==> core-02: Checking for updates to 'coreos-beta'
    core-02: Latest installed version: 607.0.0
    core-02: Version constraints: >= 308.0.1
    core-02: Provider: virtualbox
==> core-02: Box 'coreos-beta' (v607.0.0) is running the latest version.
==> core-03: Checking for updates to 'coreos-beta'
    core-03: Latest installed version: 607.0.0
    core-03: Version constraints: >= 308.0.1
    core-03: Provider: virtualbox
==> core-03: Box 'coreos-beta' (v607.0.0) is running the latest version.
Bringing machine 'core-01' up with 'virtualbox' provider...
Bringing machine 'core-02' up with 'virtualbox' provider...
Bringing machine 'core-03' up with 'virtualbox' provider...
==> core-01: Importing base box 'coreos-beta'...
==> core-01: Matching MAC address for NAT networking...
==> core-01: Checking if box 'coreos-beta' is up to date...
==> core-01: Setting the name of the VM: coreos-vagrant_core-01_1425905935661_73514
==> core-01: Clearing any previously set network interfaces...
==> core-01: Preparing network interfaces based on configuration...
    core-01: Adapter 1: nat
    core-01: Adapter 2: hostonly
==> core-01: Forwarding ports...
    core-01: 22 => 2222 (adapter 1)
==> core-01: Running 'pre-boot' VM customizations...
==> core-01: Booting VM...
==> core-01: Waiting for machine to boot. This may take a few minutes...
    core-01: SSH address: 127.0.0.1:2222
    core-01: SSH username: core
    core-01: SSH auth method: private key
    core-01: Warning: Connection timeout. Retrying...
==> core-01: Machine booted and ready!
==> core-01: Setting hostname...
==> core-01: Configuring and enabling network interfaces...
==> core-01: Running provisioner: file...
==> core-01: Running provisioner: shell...
    core-01: Running: inline script
==> core-02: Importing base box 'coreos-beta'...
==> core-02: Matching MAC address for NAT networking...
==> core-02: Checking if box 'coreos-beta' is up to date...
==> core-02: Setting the name of the VM: coreos-vagrant_core-02_1425905966683_94058
==> core-02: Fixed port collision for 22 => 2222. Now on port 2200.
==> core-02: Clearing any previously set network interfaces...
==> core-02: Preparing network interfaces based on configuration...
    core-02: Adapter 1: nat
    core-02: Adapter 2: hostonly
==> core-02: Forwarding ports...
    core-02: 22 => 2200 (adapter 1)
==> core-02: Running 'pre-boot' VM customizations...
==> core-02: Booting VM...
==> core-02: Waiting for machine to boot. This may take a few minutes...
    core-02: SSH address: 127.0.0.1:2200
    core-02: SSH username: core
    core-02: SSH auth method: private key
    core-02: Warning: Connection timeout. Retrying...
==> core-02: Machine booted and ready!
==> core-02: Setting hostname...
==> core-02: Configuring and enabling network interfaces...
==> core-02: Running provisioner: file...
==> core-02: Running provisioner: shell...
    core-02: Running: inline script
==> core-03: Importing base box 'coreos-beta'...
==> core-03: Matching MAC address for NAT networking...
==> core-03: Checking if box 'coreos-beta' is up to date...
==> core-03: Setting the name of the VM: coreos-vagrant_core-03_1425905998600_89301
==> core-03: Fixed port collision for 22 => 2222. Now on port 2201.
==> core-03: Clearing any previously set network interfaces...
==> core-03: Preparing network interfaces based on configuration...
    core-03: Adapter 1: nat
    core-03: Adapter 2: hostonly
==> core-03: Forwarding ports...
    core-03: 22 => 2201 (adapter 1)
==> core-03: Running 'pre-boot' VM customizations...
==> core-03: Booting VM...
==> core-03: Waiting for machine to boot. This may take a few minutes...
    core-03: SSH address: 127.0.0.1:2201
    core-03: SSH username: core
    core-03: SSH auth method: private key
    core-03: Warning: Connection timeout. Retrying...
==> core-03: Machine booted and ready!
==> core-03: Setting hostname...
==> core-03: Configuring and enabling network interfaces...
==> core-03: Running provisioner: file...
==> core-03: Running provisioner: shell...
    core-03: Running: inline script
Waiting for Vagrant cluster to be ready...
CoreOS Vagrant cluster is up

Configuring SSH
Identity added: /home/thecatwasnot/.vagrant.d/insecure_private_key (/home/thecatwasnot/.vagrant.d/insecure_private_key)

Starting paz runlevel 1 units
Unit paz-scheduler.service launched on 14dbc022.../172.17.8.101
Unit paz-scheduler-announce.service launched on 14dbc022.../172.17.8.101
Unit paz-orchestrator.service launched on 4f6c57a6.../172.17.8.103
Unit paz-orchestrator-announce.service launched on 4f6c57a6.../172.17.8.103
Unit paz-service-directory.service launched on 2c75bccd.../172.17.8.102
Unit paz-service-directory-announce.service launched on 2c75bccd.../172.17.8.102
Successfully started all runlevel 1 paz units on the cluster with Fleet
Waiting for runlevel 1 services to be activated...
Activating: 0 | Active: 6 | Failed: 0.. 
All runlevel 1 units successfully activated!

Waiting for orchestrator, scheduler and service directory to be announced

Starting paz runlevel 2 units
Unit paz-web.service launched
Unit paz-web-announce.service launched on 14dbc022.../172.17.8.101
Successfully started all runlevel 2 paz units on the cluster with Fleet
Waiting for runlevel 2 services to be activated...
Activating: 1 | Active: 8 | Failed: 0...
All runlevel 2 units successfully activated!

You will need to add the following entries to your /etc/hosts:
172.17.8.101 paz-web.paz
172.17.8.101 paz-scheduler.paz
172.17.8.101 paz-orchestrator.paz
172.17.8.101 paz-orchestrator-socket.paz

Adding service to directory
{"doc":{"name":"demo-api","description":"Very simple HTTP Hello World server","dockerRepository":"lukebond/demo-api","config":{"publicFacing":false,"numInstances":3,"ports":[],"env":{}}}}
Deploying new service with the /hooks/deploy endpoint
{"statusCode":200}
Waiting for service to announce itself

This hung for hours (it was still waiting when I returned 8 hours later). I've now also tried changing my version of etcdctl to match the one on CoreOS, with no joy.

jemgold commented 9 years ago

I've been having some problems getting paz-web running locally recently - I think it might be linked. Will look into it and get back to you - thanks.

thecatwasnot commented 9 years ago

Thanks.

lukebond commented 9 years ago

Hey @thecatwasnot, thanks so much for such a detailed issue posting, it's really really helpful.

I concur with @jongold that the "something went wrong :(" is most likely an instance of paz-sh/paz-web#3 (fix incoming). But your logs helpfully highlight a few other issues:

./integration.sh
Starting Paz integration test script
./integration.sh: line 18: checkRequiredEnvVars: command not found

^ My bad, introduced in 2cdec8fa6c4015834e5eecdf4c10e5f986e3a939. Created #38.

Regarding the long wait for the service to announce itself, that is also an issue. Do you know whether the service actually came up but the script failed to detect it, or whether the service never came up, and if so, why?
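Independently of the root cause, a wait like this could be bounded so the script fails instead of hanging for hours. A minimal sketch, assuming a plain POSIX shell; `wait_for` is a hypothetical helper and is not part of the Paz scripts:

```shell
#!/bin/sh
# wait_for: hypothetical helper, not taken from the Paz repository.
# Usage: wait_for <timeout_seconds> <command> [args...]
# Re-runs the command about once per second until it succeeds,
# or gives up with a non-zero exit status once the timeout is reached.
wait_for() {
  timeout="$1"
  shift
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "Timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}
```

As an illustration only (the check the integration script really performs may differ), something like `wait_for 300 sh -c 'fleetctl list-units | grep -q "demo-api-1-1.service.*running"'` would stop waiting after roughly five minutes.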

thecatwasnot commented 9 years ago

It appears demo-api is not up:

core@core-01 ~ $ fleetctl list-units
UNIT                    MACHINE             ACTIVE      SUB
demo-api-1-1.service            84797605.../172.17.8.102    inactive    dead
demo-api-1-2.service            3897eec3.../172.17.8.103    inactive    dead
demo-api-1-3.service            457ac316.../172.17.8.101    inactive    dead
demo-api-announce-1-1.service       84797605.../172.17.8.102    inactive    dead
demo-api-announce-1-2.service       457ac316.../172.17.8.101    inactive    dead
demo-api-announce-1-3.service       3897eec3.../172.17.8.103    inactive    dead
paz-orchestrator-announce.service   3897eec3.../172.17.8.103    active      running
paz-orchestrator.service        3897eec3.../172.17.8.103    active      running
paz-scheduler-announce.service      457ac316.../172.17.8.101    active      running
paz-scheduler.service           457ac316.../172.17.8.101    active      running
paz-service-directory-announce.service  84797605.../172.17.8.102    active      running
paz-service-directory.service       84797605.../172.17.8.102    active      running
paz-web-announce.service        3897eec3.../172.17.8.103    active      running
paz-web.service             3897eec3.../172.17.8.103    active      running
core@core-01 ~ $ fleetctl status demo-api-1-2.service
● demo-api-1-2.service - Very simple HTTP Hello World server (2)
   Loaded: error (Reason: Invalid argument)
   Active: inactive (dead)

Mar 11 01:48:01 core-01 systemd[1]: [/run/fleet/units/demo-api-1-2.service:8] Executable path is not absolute, ignoring: "/usr/bin/docker run --name=demo-api-1-2 -P lukebond/demo-api"
Mar 11 01:48:01 core-01 systemd[1]: demo-api-1-2.service has no ExecStart= setting, which is only allowed for RemainAfterExit=yes services. Refusing.
Mar 11 01:48:02 core-01 systemd[1]: [/run/fleet/units/demo-api-1-2.service:8] Executable path is not absolute, ignoring: "/usr/bin/docker run --name=demo-api-1-2 -P lukebond/demo-api"
Mar 11 01:48:02 core-01 systemd[1]: demo-api-1-2.service has no ExecStart= setting, which is only allowed for RemainAfterExit=yes services. Refusing.
Mar 11 01:48:03 core-01 systemd[1]: [/run/fleet/units/demo-api-1-2.service:8] Executable path is not absolute, ignoring: "/usr/bin/docker run --name=demo-api-1-2 -P lukebond/demo-api"
Mar 11 01:48:03 core-01 systemd[1]: demo-api-1-2.service has no ExecStart= setting, which is only allowed for RemainAfterExit=yes services. Refusing.
Mar 11 01:48:04 core-01 systemd[1]: [/run/fleet/units/demo-api-1-2.service:8] Executable path is not absolute, ignoring: "/usr/bin/docker run --name=demo-api-1-2 -P lukebond/demo-api"
Mar 11 01:48:04 core-01 systemd[1]: demo-api-1-2.service has no ExecStart= setting, which is only allowed for RemainAfterExit=yes services. Refusing.
Mar 11 01:48:04 core-01 systemd[1]: [/run/fleet/units/demo-api-1-2.service:8] Executable path is not absolute, ignoring: "/usr/bin/docker run --name=demo-api-1-2 -P lukebond/demo-api"
Mar 11 01:48:04 core-01 systemd[1]: demo-api-1-2.service has no ExecStart= setting, which is only allowed for RemainAfterExit=yes services. Refusing.
core@core-01 ~ $ fleetctl status demo-api-announce-1-3.service
● demo-api-announce-1-3.service - Very simple HTTP Hello World server announce (3)
   Loaded: error (Reason: Invalid argument)
   Active: inactive (dead)

Mar 11 01:51:06 core-01 systemd[1]: [/run/fleet/units/demo-api-announce-1-3.service:7] Executable path is not absolute, ignoring: "/bin/sh -c \"until docker inspect -f '{{range $p, $conf := .NetworkSettings.Ports}} {{$p}} -> {{(index $conf 0).HostPort}} {{end}}' demo-api-1-3 >/dev/null 2>&1
Mar 11 01:51:06 core-01 systemd[1]: demo-api-announce-1-3.service lacks both ExecStart= and ExecStop= setting. Refusing.
Mar 11 01:51:07 core-01 systemd[1]: [/run/fleet/units/demo-api-announce-1-3.service:7] Executable path is not absolute, ignoring: "/bin/sh -c \"until docker inspect -f '{{range $p, $conf := .NetworkSettings.Ports}} {{$p}} -> {{(index $conf 0).HostPort}} {{end}}' demo-api-1-3 >/dev/null 2>&1
Mar 11 01:51:07 core-01 systemd[1]: demo-api-announce-1-3.service lacks both ExecStart= and ExecStop= setting. Refusing.
Mar 11 01:51:08 core-01 systemd[1]: [/run/fleet/units/demo-api-announce-1-3.service:7] Executable path is not absolute, ignoring: "/bin/sh -c \"until docker inspect -f '{{range $p, $conf := .NetworkSettings.Ports}} {{$p}} -> {{(index $conf 0).HostPort}} {{end}}' demo-api-1-3 >/dev/null 2>&1
Mar 11 01:51:08 core-01 systemd[1]: demo-api-announce-1-3.service lacks both ExecStart= and ExecStop= setting. Refusing.
Mar 11 01:51:09 core-01 systemd[1]: [/run/fleet/units/demo-api-announce-1-3.service:7] Executable path is not absolute, ignoring: "/bin/sh -c \"until docker inspect -f '{{range $p, $conf := .NetworkSettings.Ports}} {{$p}} -> {{(index $conf 0).HostPort}} {{end}}' demo-api-1-3 >/dev/null 2>&1
Mar 11 01:51:09 core-01 systemd[1]: demo-api-announce-1-3.service lacks both ExecStart= and ExecStop= setting. Refusing.
Mar 11 01:51:10 core-01 systemd[1]: [/run/fleet/units/demo-api-announce-1-3.service:7] Executable path is not absolute, ignoring: "/bin/sh -c \"until docker inspect -f '{{range $p, $conf := .NetworkSettings.Ports}} {{$p}} -> {{(index $conf 0).HostPort}} {{end}}' demo-api-1-3 >/dev/null 2>&1
Mar 11 01:51:10 core-01 systemd[1]: demo-api-announce-1-3.service lacks both ExecStart= and ExecStop= setting. Refusing.

lukebond commented 9 years ago

@thecatwasnot could you please post the output of fleetctl cat demo-api-1-1.service?

thecatwasnot commented 9 years ago

fleetctl cat demo-api-1-1.service
[Unit]
Requires=docker.service
After=docker.service
Description=Very simple HTTP Hello World server (1)

[Service]
ExecStartPre=/bin/bash -c "docker pull lukebond/demo-api && docker inspect demo-api-1-1 >/dev/null 2>&1 && docker rm -f demo-api-1-1 || true"
ExecStart="/usr/bin/docker run --name=demo-api-1-1 -P lukebond/demo-api"
ExecStop=/usr/bin/docker stop -t 3 demo-api-1-1
TimeoutStartSec=30m

[X-Fleet]
X-Conflicts=demo-api-1*.service

lukebond commented 9 years ago

Thanks again @thecatwasnot. Created scheduler issue paz-sh/paz-scheduler#3
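For anyone hitting the same systemd errors: the unit above wraps the entire ExecStart value in double quotes, which is the most likely reason systemd rejects it with "Executable path is not absolute" and then reports that no ExecStart= is set. A minimal sketch for spotting such lines, assuming the unit text arrives on stdin; `find_quoted_exec` is a hypothetical helper, not part of Paz or fleet:

```shell
#!/bin/sh
# find_quoted_exec: hypothetical helper, not part of Paz or fleet.
# Reads a unit file on stdin and prints, with line numbers, every Exec*
# directive whose value begins with a double quote. That pattern matches
# the rejected ExecStart line in the unit shown above.
find_quoted_exec() {
  grep -nE '^Exec[A-Za-z]+="'
}
```

It can be fed from the command already used in this thread, e.g. `fleetctl cat demo-api-1-1.service | find_quoted_exec`. Lines such as the ExecStartPre above, where only the argument to `bash -c` is quoted, are not reported.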

thecatwasnot commented 9 years ago

Awesome, will close this then.