typesafehub / conductr-cli

CLI for Lightbend ConductR

Wait for bundle scale: do not immediately exit when encountering error #493

Closed fsat closed 7 years ago

fsat commented 7 years ago

Instead of exiting as soon as the bundle reports an error, ignore the error for the first ten seconds. This gives the bundle time to start and to attempt to recover from the error.
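The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual conductr-cli implementation: `wait_for_scale`, `get_bundle`, the two exception classes, the dictionary keys `scale` and `has_error`, and the timeout and poll interval defaults are all hypothetical names and values chosen for the example. Only the ten second grace period comes from this change.

```python
import time


class BundleScaleError(Exception):
    """Raised when the bundle still reports an error after the grace period."""


class WaitTimeoutError(Exception):
    """Raised when the expected scale is not reached before the timeout."""


def wait_for_scale(get_bundle, expected_scale, grace_period=10.0, timeout=60.0,
                   poll_interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Poll a bundle until it reaches expected_scale.

    get_bundle is a hypothetical callable returning a dict with the keys
    'scale' (int) and 'has_error' (bool). An error reported by the bundle
    is ignored until grace_period seconds have elapsed since the wait began.
    """
    start = clock()
    while True:
        bundle = get_bundle()
        elapsed = clock() - start

        # Success takes priority over any error flag.
        if bundle["scale"] >= expected_scale:
            return True

        # Only treat the error as fatal once the grace period is over.
        if bundle["has_error"] and elapsed >= grace_period:
            raise BundleScaleError("bundle has error after grace period")

        if elapsed >= timeout:
            raise WaitTimeoutError("expected scale not reached in time")

        sleep(poll_interval)
```

The `clock` and `sleep` parameters are injected so that the loop can be exercised in a test without actually waiting ten seconds.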

fsat commented 7 years ago

Marked as wip - manual test pending.

fsat commented 7 years ago

Manual test completed successfully.

The wait for bundle scale behaviour now ignores the bundle error for the first 10 seconds.

Setup

Create the following test bundle.

192-168-1-5:test-failing-bundle felixsatyaputra$ find test-failing-bundle -type f
test-failing-bundle/bundle.conf
test-failing-bundle/one/start-one

192-168-1-5:test-failing-bundle felixsatyaputra$ cat test-failing-bundle/bundle.conf
version                  = "1"
name                     = "failing-bundle"
system                   = "failing-bundle"
systemVersion            = "0.1.0"
compatibilityVersion     = "1"
nrOfCpus                 = 0.1
memory                   = 8000000
diskSpace                = 10000000
roles                    = ["test"]
components               = {
  "one" = {
    description      = "A script that echos 5 times, and exits with code 1"
    file-system-type = "universal"
    start-command    = ["one/start-one"]
    endpoints        = {
    }
  }
}

192-168-1-5:test-failing-bundle felixsatyaputra$ cat test-failing-bundle/one/start-one
#!/usr/bin/env bash

echo "Sleep - 1"
sleep 1
echo "Bailing out"
exit 1
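For anyone who wants to reproduce the setup, the two files listed above could be generated with a short script such as the one below. This is only a convenience sketch: the function name `create_failing_bundle` is hypothetical, and the file contents are copied from the listings above.

```python
import stat
from pathlib import Path

# Contents copied from the bundle.conf listing above.
BUNDLE_CONF = """\
version                  = "1"
name                     = "failing-bundle"
system                   = "failing-bundle"
systemVersion            = "0.1.0"
compatibilityVersion     = "1"
nrOfCpus                 = 0.1
memory                   = 8000000
diskSpace                = 10000000
roles                    = ["test"]
components               = {
  "one" = {
    description      = "A script that echos 5 times, and exits with code 1"
    file-system-type = "universal"
    start-command    = ["one/start-one"]
    endpoints        = {
    }
  }
}
"""

# Contents copied from the start-one listing above.
START_ONE = """\
#!/usr/bin/env bash

echo "Sleep - 1"
sleep 1
echo "Bailing out"
exit 1
"""


def create_failing_bundle(parent_dir):
    """Write test-failing-bundle/ under parent_dir and return its path."""
    root = Path(parent_dir) / "test-failing-bundle"
    (root / "one").mkdir(parents=True, exist_ok=True)
    (root / "bundle.conf").write_text(BUNDLE_CONF)

    script = root / "one" / "start-one"
    script.write_text(START_ONE)
    # The start command must be executable for the bundle to launch it.
    script.chmod(script.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return root
```

Calling `create_failing_bundle(".")` produces the same directory layout that the `find` command above prints.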

Package the bundle using shazar.

192-168-1-5:test-failing-bundle felixsatyaputra$ sh test.sh
+ shazar test-failing-bundle
./test-failing-bundle-13eac5ec7acae4691104dfec8847a202930929e30f6c54646bfd8f54085b5a9a.zip

Start the sandbox.

+ sandbox run 2.1.0-alpha.1
|------------------------------------------------|
| Stopping ConductR                              |
|------------------------------------------------|
ConductR core pid 22855 stopped
ConductR agent pid 22956 stopped
ConductR has been successfully stopped
|------------------------------------------------|
| Starting ConductR                              |
|------------------------------------------------|
Extracting ConductR core to /Users/felixsatyaputra/.conductr/images/core
Extracting ConductR agent to /Users/felixsatyaputra/.conductr/images/agent
Starting ConductR core instance on 192.168.10.1..
Waiting for ConductR to start..
Starting ConductR agent instance on 192.168.10.1..
|------------------------------------------------|
| OCI-in-Docker support unavailable.             |
|------------------------------------------------|
|------------------------------------------------|
| To provide support ensure Docker is running    |
| and restart the sandbox                        |
|------------------------------------------------|
|------------------------------------------------|
| Starting logging feature based on eslite       |
|------------------------------------------------|
Deploying bundle eslite..
Retrieving bundle..
Loading bundle from cache typesafe/bundle/eslite
Bintray credentials loaded from /Users/felixsatyaputra/.lightbend/commercial.credentials
Retrieving from cache /Users/felixsatyaputra/.conductr/cache/bundle/eslite-2.1.0-57e432d0c647be2bbc83fa8e59ee469bb59d1f72df31f3d82cab0ad396130fe7.zip
Loading bundle to ConductR..
[##################################################] 100%
Bundle 57e432d0c647be2bbc83fa8e59ee469b is installed
Bundle loaded.
Bundle run request sent.
Bundle 57e432d0c647be2bbc83fa8e59ee469b waiting to reach expected scale 1
Bundle 57e432d0c647be2bbc83fa8e59ee469b has scale 0, expected 1...
Bundle 57e432d0c647be2bbc83fa8e59ee469b expected scale 1 is met
|------------------------------------------------|
| Summary                                        |
|------------------------------------------------|
|- - - - - - - - - - - - - - - - - - - - - - - - |
| ConductR                                       |
|- - - - - - - - - - - - - - - - - - - - - - - - |
ConductR has been started:
  core instance on 192.168.10.1
  agent instance on 192.168.10.1
ConductR service locator has been started on:
  192.168.10.1:9008
|- - - - - - - - - - - - - - - - - - - - - - - - |
| Proxy                                          |
|- - - - - - - - - - - - - - - - - - - - - - - - |
HAProxy has not been started
To enable proxying ensure Docker is running and restart the sandbox
|- - - - - - - - - - - - - - - - - - - - - - - - |
| Bundles                                        |
|- - - - - - - - - - - - - - - - - - - - - - - - |
Check latest bundle status with:
  conduct info
Current bundle status:
Licensed To: cc64df31-ec6b-4e08-bb6b-3216721a56b@lightbend
Max ConductR agents: 10
ConductR Version(s): 0.1.0, 2.1.*
Grants: akka-sbr, cinnamon, conductr

ID       NAME      TAG  #REP  #STR  #RUN  ROLES
57e432d  eslite  2.1.0     1     0     1  elasticsearch

Load and run the test bundle - this will eventually fail.

+ conduct load ./test-failing-bundle-13eac5ec7acae4691104dfec8847a202930929e30f6c54646bfd8f54085b5a9a.zip
Retrieving bundle..
Loading bundle to ConductR..
[##################################################] 100%
Bundle 13eac5ec7acae4691104dfec8847a202 is installed
Bundle loaded.
Start bundle with:        conduct run 13eac5e
Unload bundle with:       conduct unload 13eac5e
Print ConductR info with: conduct info
Print bundle info with:   conduct info 13eac5e

+ conduct run fai
Bundle run request sent.
Bundle 13eac5ec7acae4691104dfec8847a202 waiting to reach expected scale 1
Bundle 13eac5ec7acae4691104dfec8847a202 has scale 0, expected 1...................
Error: Failure to scale bundle 13eac5ec7acae4691104dfec8847a202

Check latest bundle events with:
  conduct events 13eac5ec7acae4691104dfec8847a202
Current bundle events:
TIME                          EVENT                                         DESC
Wed 2017-06-14T14:20:13+1000  conductr.scaleScheduler.scaleBundleRequested  Scale bundle requested: scale=1
Wed 2017-06-14T14:20:13+1000  conductr.launcher.bundleStarted               Bundle started: host=192.168.10.1
Wed 2017-06-14T14:20:14+1000  conductr.launcher.bundleExited                Bundle exited: host=192.168.10.1, exitValue=143 - rescheduling its execution
Wed 2017-06-14T14:20:14+1000  conductr.scaleScheduler.scaleRescheduled      Scale of 1 rescheduled
Wed 2017-06-14T14:20:15+1000  conductr.scaleScheduler.scaleBundleRequested  Scale bundle requested: scale=1
Wed 2017-06-14T14:20:15+1000  conductr.launcher.bundleStarted               Bundle started: host=192.168.10.1
Wed 2017-06-14T14:20:16+1000  conductr.launcher.bundleExited                Bundle exited: host=192.168.10.1, exitValue=143 - rescheduling its execution
Wed 2017-06-14T14:20:16+1000  conductr.scaleScheduler.scaleRescheduled      Scale of 1 rescheduled
Wed 2017-06-14T14:20:16+1000  conductr.scaleScheduler.scaleBundleRequested  Scale bundle requested: scale=1
Wed 2017-06-14T14:20:16+1000  conductr.launcher.bundleStarted               Bundle started: host=192.168.10.1

Check latest bundle logs with:
  conduct logs 13eac5ec7acae4691104dfec8847a202
Current bundle logs:
TIME                          HOST                     LOG
Wed 2017-06-14T14:20:13+1000  192-168-1-5.tpgi.com.au  Stopping bundle
Wed 2017-06-14T14:20:13+1000  192-168-1-5.tpgi.com.au  Sleep - 1
Wed 2017-06-14T14:20:14+1000  192-168-1-5.tpgi.com.au  Bailing out
Wed 2017-06-14T14:20:14+1000  192-168-1-5.tpgi.com.au  Component one exited with 1
Wed 2017-06-14T14:20:14+1000  192-168-1-5.tpgi.com.au  Stopping bundle
Wed 2017-06-14T14:20:15+1000  192-168-1-5.tpgi.com.au  Sleep - 1
Wed 2017-06-14T14:20:16+1000  192-168-1-5.tpgi.com.au  Bailing out
Wed 2017-06-14T14:20:16+1000  192-168-1-5.tpgi.com.au  Component one exited with 1
Wed 2017-06-14T14:20:16+1000  192-168-1-5.tpgi.com.au  Stopping bundle
Wed 2017-06-14T14:20:16+1000  192-168-1-5.tpgi.com.au  Sleep - 1

Error: Bundle 13eac5ec7acae4691104dfec8847a202 has error

Inspect the latest bundle events and logs using:
  conduct events 13eac5ec7acae4691104dfec8847a202
  conduct logs 13eac5ec7acae4691104dfec8847a202

The bundle is now in an error state (i.e. the Error attribute equals Yes).

192-168-1-5:test-failing-bundle felixsatyaputra$ conduct info fa
BUNDLE ATTRIBUTES
-----------------
Bundle Id              ! 13eac5e
Bundle Name            failing-bundle
Compatibility Version  1
System                 failing-bundle
System Version         0.1.0
Tags
Nr of CPUs             0.1
Memory                 8000000
Disk Space             10000000
Roles                  test
Bundle Digest          13eac5ec7acae4691104dfec8847a202930929e30f6c54646bfd8f54085b5a9a
Error                  Yes

BUNDLE SCALE
------------
Nr of Reschedules  13
Scale              1

BUNDLE INSTALLATIONS
--------------------
Host    192.168.10.1
Bundle  /Users/felixsatyaputra/.conductr/images/tmp/conductr/192.168.10.1/bundles/13eac5ec7acae4691104dfec8847a202930929e30f6c54646bfd8f54085b5a9a.zip
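When repeating this manual check, the Error attribute could also be read from the text output programmatically. The sketch below assumes the plain text layout of `conduct info` shown above, with keys and values separated by two or more spaces. The function names are hypothetical and are not part of conductr-cli.

```python
import re


def parse_bundle_attributes(info_text):
    """Return the BUNDLE ATTRIBUTES section of the text output as a dict.

    Assumes each row is 'Key<two or more spaces>Value'. Rows without a
    value (such as an empty Tags row) map to an empty string.
    """
    attributes = {}
    in_section = False
    for line in info_text.splitlines():
        stripped = line.strip()
        if not in_section:
            # Skip everything until the section heading is found.
            in_section = stripped == "BUNDLE ATTRIBUTES"
            continue
        if not stripped:
            break  # a blank line ends the section
        if set(stripped) == {"-"}:
            continue  # underline row beneath the heading
        parts = re.split(r"\s{2,}", stripped, maxsplit=1)
        attributes[parts[0]] = parts[1] if len(parts) > 1 else ""
    return attributes


def bundle_has_error(info_text):
    """True when the Error attribute in the output equals 'Yes'."""
    return parse_bundle_attributes(info_text).get("Error") == "Yes"
```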

Test Result

Rerunning the same bundle now waits for 10 seconds before checking the error state.

192-168-1-5:test-failing-bundle felixsatyaputra$ conduct run fa
Bundle run request sent.
Bundle 13eac5ec7acae4691104dfec8847a202 waiting to reach expected scale 1
Bundle 13eac5ec7acae4691104dfec8847a202 has scale 0, expected 1....................
Error: Failure to scale bundle 13eac5ec7acae4691104dfec8847a202

Check latest bundle events with:
  conduct events 13eac5ec7acae4691104dfec8847a202
Current bundle events:
TIME                          EVENT                                         DESC
Wed 2017-06-14T14:20:40+1000  conductr.scaleScheduler.scaleBundleRequested  Scale bundle requested: scale=1
Wed 2017-06-14T14:20:40+1000  conductr.launcher.bundleStarted               Bundle started: host=192.168.10.1
Wed 2017-06-14T14:20:41+1000  conductr.launcher.bundleExited                Bundle exited: host=192.168.10.1, exitValue=143 - rescheduling its execution
Wed 2017-06-14T14:20:41+1000  conductr.scaleScheduler.scaleRescheduled      Scale of 1 rescheduled
Wed 2017-06-14T14:20:41+1000  conductr.scaleScheduler.scaleBundleRequested  Scale bundle requested: scale=1
Wed 2017-06-14T14:20:42+1000  conductr.launcher.bundleStarted               Bundle started: host=192.168.10.1
Wed 2017-06-14T14:20:43+1000  conductr.launcher.bundleExited                Bundle exited: host=192.168.10.1, exitValue=143 - rescheduling its execution
Wed 2017-06-14T14:20:43+1000  conductr.scaleScheduler.scaleRescheduled      Scale of 1 rescheduled
Wed 2017-06-14T14:20:43+1000  conductr.scaleScheduler.scaleBundleRequested  Scale bundle requested: scale=1
Wed 2017-06-14T14:20:43+1000  conductr.launcher.bundleStarted               Bundle started: host=192.168.10.1

Check latest bundle logs with:
  conduct logs 13eac5ec7acae4691104dfec8847a202
Current bundle logs:
TIME                          HOST                     LOG
Wed 2017-06-14T14:20:40+1000  192-168-1-5.tpgi.com.au  Stopping bundle
Wed 2017-06-14T14:20:40+1000  192-168-1-5.tpgi.com.au  Sleep - 1
Wed 2017-06-14T14:20:41+1000  192-168-1-5.tpgi.com.au  Bailing out
Wed 2017-06-14T14:20:41+1000  192-168-1-5.tpgi.com.au  Component one exited with 1
Wed 2017-06-14T14:20:41+1000  192-168-1-5.tpgi.com.au  Stopping bundle
Wed 2017-06-14T14:20:42+1000  192-168-1-5.tpgi.com.au  Sleep - 1
Wed 2017-06-14T14:20:43+1000  192-168-1-5.tpgi.com.au  Bailing out
Wed 2017-06-14T14:20:43+1000  192-168-1-5.tpgi.com.au  Component one exited with 1
Wed 2017-06-14T14:20:43+1000  192-168-1-5.tpgi.com.au  Stopping bundle
Wed 2017-06-14T14:20:43+1000  192-168-1-5.tpgi.com.au  Sleep - 1

Error: Bundle 13eac5ec7acae4691104dfec8847a202 has error

Inspect the latest bundle events and logs using:
  conduct events 13eac5ec7acae4691104dfec8847a202
  conduct logs 13eac5ec7acae4691104dfec8847a202

fsat commented 7 years ago

CLI: ensure CLI doesn't immediately terminate when waiting for bundle scale given bundle having errors

fsat commented 7 years ago

CLI exits too quickly if bundle is already in an exit state when running and stopping.