litmuschaos / litmus

Litmus helps SREs and developers practice chaos engineering in a cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q
https://litmuschaos.io
Apache License 2.0

create experiment failed #1387

Closed badashanren closed 4 years ago

badashanren commented 4 years ago

2020-03-31 07:53:41.921193 Step: [The END]: Apply the chaos result CR for pod-network-loss experiment
task path: /utils/runtime/update_chaos_result_resource.yml:33
fatal: [127.0.0.1]: FAILED! => {"changed": true, "cmd": "kubectl apply -f /utils/runtime/chaos-result.yml -n ", "delta": "0:00:00.342643", "end": "2020-03-31 07:53:41.900082", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2020-03-31 07:53:41.557439", "stderr": "Error: flag needs an argument: 'n' in -n

ksatchit commented 4 years ago

@badashanren thanks for reporting this issue. Could you please provide the following details so that we can help you get this running or fix the problem:

  • Litmus version
  • Outputs of:

    • kubectl get chaosengine <chaosengine-name> -n <your namespace> -o yaml (your chaosengine spec)
    • kubectl get chaosexperiments -n <your namespace> (experiments available in your ns)
    • kubectl describe pod <experiment pod> -n <your namespace> (env and other params passed to your exp)
    • kubectl logs <experiment pod> -n <your namespace> (the complete log, rather than just the error snippet)

badashanren commented 4 years ago

ansible-playbook 2.7.3
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python2.7/dist-packages/ansible
  executable location = /usr/local/bin/ansible-playbook
  python version = 2.7.17 (default, Nov  7 2019, 10:07:09) [GCC 7.4.0]
Using /etc/ansible/ansible.cfg as config file
/etc/ansible/hosts did not meet host_list requirements, check plugin documentation if this is unexpected
/etc/ansible/hosts did not meet script requirements, check plugin documentation if this is unexpected
statically imported: /experiments/generic/pod_network_loss/pod_network_loss_ansible_prerequisites.yml

2020-03-31 07:53:35.172737 PLAYBOOK: pod_network_loss_ansible_logic.yml 
1 plays in ./experiments/generic/pod_network_loss/pod_network_loss_ansible_logic.yml

2020-03-31 07:53:35.234972 ************ BRACE YOURSELF, EXPERIMENT BEGINS! ************ 

2020-03-31 07:53:38.384323 Step: Gathering Facts 
task path: /experiments/generic/pod_network_loss/pod_network_loss_ansible_logic.yml:2
META: ran handlers

2020-03-31 07:53:38.785794 Step: Identify the chaos util to be invoked 
task path: /experiments/generic/pod_network_loss/pod_network_loss_ansible_prerequisites.yml:1
details => {"changed": true, "checksum": "d4fade44014f3046d27936f97fae9366fb9fe827", "dest": "./chaosutil.yml", "gid": 0, "group": "root", "md5sum": "dce3efa87bee52e3cf15fd1e1a7d44cc", "mode": "0644", "owner": "root", "size": 56, "src": "/root/.ansible/tmp/ansible-tmp-1585641218.43-1285708157619/source", "state": "file", "uid": 0}

2020-03-31 07:53:38.831654 Step: include_vars 
task path: /experiments/generic/pod_network_loss/pod_network_loss_ansible_logic.yml:28
details => {"ansible_facts": {"c_util": "/chaoslib/pumba/network_chaos/network_chaos.yml"}, "ansible_included_var_files": ["/experiments/generic/pod_network_loss/chaosutil.yml"], "changed": false}
included: /utils/runtime/update_chaos_result_resource.yml for 127.0.0.1

2020-03-31 07:53:39.162245 Step: [PreReq]: Generate the chaos result CR to reflect SOT (Start of Test) 
task path: /utils/runtime/update_chaos_result_resource.yml:3
details => {"changed": true, "checksum": "4edcce04ed6f9f7bc15ec3987184a37661a65597", "dest": "/utils/runtime/chaos-result.yml", "gid": 0, "group": "root", "md5sum": "2446f755323d9b516cc3aea457a84ecd", "mode": "0644", "owner": "root", "size": 267, "src": "/root/.ansible/tmp/ansible-tmp-1585641218.98-204941604955563/source", "state": "file", "uid": 0}

2020-03-31 07:53:39.761213 Step: [PreReq]: Apply the chaos result CR for pod-network-loss experiment 
task path: /utils/runtime/update_chaos_result_resource.yml:13
fatal: [127.0.0.1]: FAILED! => {"changed": true, "cmd": "kubectl apply -f /utils/runtime/chaos-result.yml -n ", "delta": "0:00:00.353071", "end": "2020-03-31 07:53:39.738645", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2020-03-31 07:53:39.385574", "stderr": "Error: flag needs an argument: 'n' in -n

Examples:
  # Apply the configuration in pod.json to a pod.
  kubectl apply -f ./pod.json

  # Apply the JSON passed into stdin to a pod.
  cat pod.json | kubectl apply -f -

  # Note: --prune is still in Alpha
  # Apply the configuration in manifest.yaml that matches label app=nginx and delete all the other resources that are not in the file and match label app=nginx.
  kubectl apply --prune -f manifest.yaml -l app=nginx

  # Apply the configuration in manifest.yaml and delete all the other configmaps that are not in the file.
  kubectl apply --prune -f manifest.yaml --all --prune-whitelist=core/v1/ConfigMap

Available Commands:
  edit-last-applied Edit latest last-applied-configuration annotations of a resource/object
  set-last-applied  Set the last-applied-configuration annotation on a live object to match the contents of a file.
  view-last-applied View latest last-applied-configuration annotations of a resource/object

Options:
      --all=false: Select all resources in the namespace of the specified resource types.
      --allow-missing-template-keys=true: If true, ignore any errors in templates when a field or map key is missing in the template. Only applies to golang and jsonpath output formats.
      --cascade=true: If true, cascade the deletion of the resources managed by this resource (e.g. Pods created by a ReplicationController).  Default true.
      --dry-run=false: If true, only print the object that would be sent, without sending it.
  -f, --filename=[]: that contains the configuration to apply
      --force=false: Only used when grace-period=0. If true, immediately remove resources from API and bypass graceful deletion. Note that immediate deletion of some resources may result in inconsistency or data loss and requires confirmation.
      --grace-period=-1: Period of time in seconds given to the resource to terminate gracefully. Ignored if negative. Set to 1 for immediate shutdown. Can only be set to 0 when --force is true (force deletion).
      --include-uninitialized=false: If true, the kubectl command applies to uninitialized objects. If explicitly set to false, this flag overrides other flags that make the kubectl commands apply to uninitialized objects, e.g., \"--all\". Objects with empty metadata.initializers are regarded as initialized.
      --openapi-patch=true: If true, use openapi to calculate diff when the openapi presents and the resource can be found in the openapi spec. Otherwise, fall back to use baked-in types.
  -o, --output='': Output format. One of: json|yaml|name|template|go-template|go-template-file|templatefile|jsonpath|jsonpath-file.
      --overwrite=true: Automatically resolve conflicts between the modified and live configuration by using values from the modified configuration
      --prune=false: Automatically delete resource objects, including the uninitialized ones, that do not appear in the configs and are created by either apply or create --save-config. Should be used with either -l or --all.
      --prune-whitelist=[]: Overwrite the default whitelist with <group/version/kind> for --prune
      --record=false: Record current kubectl command in the resource annotation. If set to false, do not record the command. If set to true, record the command. If not set, default to updating the existing annotation value only if one already exists.
  -R, --recursive=false: Process the directory used in -f, --filename recursively. Useful when you want to manage related manifests organized within the same directory.
  -l, --selector='': Selector (label query) to filter on, supports '=', '==', and '!='.(e.g. -l key1=value1,key2=value2)
      --server-dry-run=false: If true, request will be sent to server with dry-run flag, which means the modifications won't be persisted. This is an alpha feature and flag.
      --template='': Template string or path to template file to use when -o=go-template, -o=go-template-file. The template format is golang templates [http://golang.org/pkg/text/template/#pkg-overview].
      --timeout=0s: The length of time to wait before giving up on a delete, zero means determine a timeout from the size of the object
      --validate=true: If true, use a schema to validate the input before sending it
      --wait=false: If true, wait for resources to be gone before returning. This waits for finalizers.

Usage:
  kubectl apply -f FILENAME [options]

Use \"kubectl <command> --help\" for more information about a given command.
Use \"kubectl options\" for a list of global command-line options (applies to all commands).

flag needs an argument: 'n' in -n", "stderr_lines": ["Error: flag needs an argument: 'n' in -n", "", "", "Examples:", "  # Apply the configuration in pod.json to a pod.", "  kubectl apply -f ./pod.json", "  ", "  # Apply the JSON passed into stdin to a pod.", "  cat pod.json | kubectl apply -f -", "  ", "  # Note: --prune is still in Alpha", "  # Apply the configuration in manifest.yaml that matches label app=nginx and delete all the other resources that are not in the file and match label app=nginx.", "  kubectl apply --prune -f manifest.yaml -l app=nginx", "  ", "  # Apply the configuration in manifest.yaml and delete all the other configmaps that are not in the file.", "  kubectl apply --prune -f manifest.yaml --all --prune-whitelist=core/v1/ConfigMap", "", "Available Commands:", "  edit-last-applied Edit latest last-applied-configuration annotations of a resource/object", "  set-last-applied  Set the last-applied-configuration annotation on a live object to match the contents of a file.", "  view-last-applied View latest last-applied-configuration annotations of a resource/object", "", "Options:", "      --all=false: Select all resources in the namespace of the specified resource types.", "      --allow-missing-template-keys=true: If true, ignore any errors in templates when a field or map key is missing in the template. Only applies to golang and jsonpath output formats.", "      --cascade=true: If true, cascade the deletion of the resources managed by this resource (e.g. Pods created by a ReplicationController).  Default true.", "      --dry-run=false: If true, only print the object that would be sent, without sending it.", "  -f, --filename=[]: that contains the configuration to apply", "      --force=false: Only used when grace-period=0. If true, immediately remove resources from API and bypass graceful deletion. 
Note that immediate deletion of some resources may result in inconsistency or data loss and requires confirmation.", "      --grace-period=-1: Period of time in seconds given to the resource to terminate gracefully. Ignored if negative. Set to 1 for immediate shutdown. Can only be set to 0 when --force is true (force deletion).", "      --include-uninitialized=false: If true, the kubectl command applies to uninitialized objects. If explicitly set to false, this flag overrides other flags that make the kubectl commands apply to uninitialized objects, e.g., \"--all\". Objects with empty metadata.initializers are regarded as initialized.", "      --openapi-patch=true: If true, use openapi to calculate diff when the openapi presents and the resource can be found in the openapi spec. Otherwise, fall back to use baked-in types.", "  -o, --output='': Output format. One of: json|yaml|name|template|go-template|go-template-file|templatefile|jsonpath|jsonpath-file.", "      --overwrite=true: Automatically resolve conflicts between the modified and live configuration by using values from the modified configuration", "      --prune=false: Automatically delete resource objects, including the uninitialized ones, that do not appear in the configs and are created by either apply or create --save-config. Should be used with either -l or --all.", "      --prune-whitelist=[]: Overwrite the default whitelist with <group/version/kind> for --prune", "      --record=false: Record current kubectl command in the resource annotation. If set to false, do not record the command. If set to true, record the command. If not set, default to updating the existing annotation value only if one already exists.", "  -R, --recursive=false: Process the directory used in -f, --filename recursively. Useful when you want to manage related manifests organized within the same directory.", "  -l, --selector='': Selector (label query) to filter on, supports '=', '==', and '!='.(e.g. 
-l key1=value1,key2=value2)", "      --server-dry-run=false: If true, request will be sent to server with dry-run flag, which means the modifications won't be persisted. This is an alpha feature and flag.", "      --template='': Template string or path to template file to use when -o=go-template, -o=go-template-file. The template format is golang templates [http://golang.org/pkg/text/template/#pkg-overview].", "      --timeout=0s: The length of time to wait before giving up on a delete, zero means determine a timeout from the size of the object", "      --validate=true: If true, use a schema to validate the input before sending it", "      --wait=false: If true, wait for resources to be gone before returning. This waits for finalizers.", "", "Usage:", "  kubectl apply -f FILENAME [options]", "", "Use \"kubectl <command> --help\" for more information about a given command.", "Use \"kubectl options\" for a list of global command-line options (applies to all commands).", "", "flag needs an argument: 'n' in -n"], "stdout": "", "stdout_lines": []}

2020-03-31 07:53:39.806324 Step: set_fact 
task path: /experiments/generic/pod_network_loss/pod_network_loss_ansible_logic.yml:88
details => {"ansible_facts": {"flag": "Fail"}, "changed": false}
included: /utils/runtime/getting_failure_step.yml for 127.0.0.1

2020-03-31 07:53:39.956392 Step: [Failure-Detection]: Recording the offset, on the basis of verbosity 
task path: /utils/runtime/getting_failure_step.yml:8
details => {"ansible_facts": {"offset": 2}, "changed": false}

2020-03-31 07:53:40.560620 Step: [Failure-Detection]: Getting name of failure step from experiment pod 
task path: /utils/runtime/getting_failure_step.yml:14
details => {"changed": true, "cmd": "kubectl logs  -n  | grep \"FAILED!\" -B 2 | head -1 | awk -F \"Step:\" '{print $2}'", "delta": "0:00:00.338976", "end": "2020-03-31 07:53:40.542517", "rc": 0, "start": "2020-03-31 07:53:40.203541", "stderr": "Error: flag needs an argument: 'n' in -n

Aliases:
logs, log

Examples:
  # Return snapshot logs from pod nginx with only one container
  kubectl logs nginx

  # Return snapshot logs from pod nginx with multi containers
  kubectl logs nginx --all-containers=true

  # Return snapshot logs from all containers in pods defined by label app=nginx
  kubectl logs -lapp=nginx --all-containers=true

  # Return snapshot of previous terminated ruby container logs from pod web-1
  kubectl logs -p -c ruby web-1

  # Begin streaming the logs of the ruby container in pod web-1
  kubectl logs -f -c ruby web-1

  # Display only the most recent 20 lines of output in pod nginx
  kubectl logs --tail=20 nginx

  # Show all logs from pod nginx written in the last hour
  kubectl logs --since=1h nginx

  # Return snapshot logs from first container of a job named hello
  kubectl logs job/hello

  # Return snapshot logs from container nginx-1 of a deployment named nginx
  kubectl logs deployment/nginx -c nginx-1

Options:
      --all-containers=false: Get all containers's logs in the pod(s).
  -c, --container='': Print the logs of this container
  -f, --follow=false: Specify if the logs should be streamed.
      --limit-bytes=0: Maximum bytes of logs to return. Defaults to no limit.
      --pod-running-timeout=20s: The length of time (like 5s, 2m, or 3h, higher than zero) to wait until at least one pod is running
  -p, --previous=false: If true, print the logs for the previous instance of the container in a pod if it exists.
  -l, --selector='': Selector (label query) to filter on.
      --since=0s: Only return logs newer than a relative duration like 5s, 2m, or 3h. Defaults to all logs. Only one of since-time / since may be used.
      --since-time='': Only return logs after a specific date (RFC3339). Defaults to all logs. Only one of since-time / since may be used.
      --tail=-1: Lines of recent log file to display. Defaults to -1 with no selector, showing all log lines otherwise 10, if a selector is provided.
      --timestamps=false: Include timestamps on each line in the log output

Usage:
  kubectl logs [-f] [-p] (POD | TYPE/NAME) [-c CONTAINER] [options]

Use \"kubectl options\" for a list of global command-line options (applies to all commands).

flag needs an argument: 'n' in -n", "stderr_lines": ["Error: flag needs an argument: 'n' in -n", "", "", "Aliases:", "logs, log", "", "Examples:", "  # Return snapshot logs from pod nginx with only one container", "  kubectl logs nginx", "  ", "  # Return snapshot logs from pod nginx with multi containers", "  kubectl logs nginx --all-containers=true", "  ", "  # Return snapshot logs from all containers in pods defined by label app=nginx", "  kubectl logs -lapp=nginx --all-containers=true", "  ", "  # Return snapshot of previous terminated ruby container logs from pod web-1", "  kubectl logs -p -c ruby web-1", "  ", "  # Begin streaming the logs of the ruby container in pod web-1", "  kubectl logs -f -c ruby web-1", "  ", "  # Display only the most recent 20 lines of output in pod nginx", "  kubectl logs --tail=20 nginx", "  ", "  # Show all logs from pod nginx written in the last hour", "  kubectl logs --since=1h nginx", "  ", "  # Return snapshot logs from first container of a job named hello", "  kubectl logs job/hello", "  ", "  # Return snapshot logs from container nginx-1 of a deployment named nginx", "  kubectl logs deployment/nginx -c nginx-1", "", "Options:", "      --all-containers=false: Get all containers's logs in the pod(s).", "  -c, --container='': Print the logs of this container", "  -f, --follow=false: Specify if the logs should be streamed.", "      --limit-bytes=0: Maximum bytes of logs to return. Defaults to no limit.", "      --pod-running-timeout=20s: The length of time (like 5s, 2m, or 3h, higher than zero) to wait until at least one pod is running", "  -p, --previous=false: If true, print the logs for the previous instance of the container in a pod if it exists.", "  -l, --selector='': Selector (label query) to filter on.", "      --since=0s: Only return logs newer than a relative duration like 5s, 2m, or 3h. Defaults to all logs. 
Only one of since-time / since may be used.", "      --since-time='': Only return logs after a specific date (RFC3339). Defaults to all logs. Only one of since-time / since may be used.", "      --tail=-1: Lines of recent log file to display. Defaults to -1 with no selector, showing all log lines otherwise 10, if a selector is provided.", "      --timestamps=false: Include timestamps on each line in the log output", "", "Usage:", "  kubectl logs [-f] [-p] (POD | TYPE/NAME) [-c CONTAINER] [options]", "", "Use \"kubectl options\" for a list of global command-line options (applies to all commands).", "", "flag needs an argument: 'n' in -n"], "stdout": "", "stdout_lines": []}

2020-03-31 07:53:40.615354 Step: [Failure-Detection]: Recording the name of failed step 
task path: /utils/runtime/getting_failure_step.yml:21
details => {"ansible_facts": {"failStep": ""}, "changed": false}

2020-03-31 07:53:40.666403 Step: [Failure-Detection]: Printing the name of failed step 
task path: /utils/runtime/getting_failure_step.yml:25
details => {
    "msg": "failStep: "
}
included: /utils/runtime/update_chaos_result_resource.yml for 127.0.0.1

2020-03-31 07:53:41.416221 Step: [Result]: Update the chaos result CR to reflect EOT (End of Test) 
task path: /utils/runtime/update_chaos_result_resource.yml:23
details => {"changed": true, "checksum": "8e6f0336496f729a3945fd981ff0dbb108885787", "dest": "/utils/runtime/chaos-result.yml", "gid": 0, "group": "root", "md5sum": "5b4ef093edf79e52ce377e85d709445f", "mode": "0644", "owner": "root", "size": 281, "src": "/root/.ansible/tmp/ansible-tmp-1585641220.94-197404778969050/source", "state": "file", "uid": 0}

2020-03-31 07:53:41.921193 Step: [The END]: Apply the chaos result CR for pod-network-loss experiment 
task path: /utils/runtime/update_chaos_result_resource.yml:33
fatal: [127.0.0.1]: FAILED! => {"changed": true, "cmd": "kubectl apply -f /utils/runtime/chaos-result.yml -n ", "delta": "0:00:00.342643", "end": "2020-03-31 07:53:41.900082", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2020-03-31 07:53:41.557439", "stderr": "Error: flag needs an argument: 'n' in -n

Examples:
  # Apply the configuration in pod.json to a pod.
  kubectl apply -f ./pod.json

  # Apply the JSON passed into stdin to a pod.
  cat pod.json | kubectl apply -f -

  # Note: --prune is still in Alpha
  # Apply the configuration in manifest.yaml that matches label app=nginx and delete all the other resources that are not in the file and match label app=nginx.
  kubectl apply --prune -f manifest.yaml -l app=nginx

  # Apply the configuration in manifest.yaml and delete all the other configmaps that are not in the file.
  kubectl apply --prune -f manifest.yaml --all --prune-whitelist=core/v1/ConfigMap

Available Commands:
  edit-last-applied Edit latest last-applied-configuration annotations of a resource/object
  set-last-applied  Set the last-applied-configuration annotation on a live object to match the contents of a file.
  view-last-applied View latest last-applied-configuration annotations of a resource/object

Options:
      --all=false: Select all resources in the namespace of the specified resource types.
      --allow-missing-template-keys=true: If true, ignore any errors in templates when a field or map key is missing in the template. Only applies to golang and jsonpath output formats.
      --cascade=true: If true, cascade the deletion of the resources managed by this resource (e.g. Pods created by a ReplicationController).  Default true.
      --dry-run=false: If true, only print the object that would be sent, without sending it.
  -f, --filename=[]: that contains the configuration to apply
      --force=false: Only used when grace-period=0. If true, immediately remove resources from API and bypass graceful deletion. Note that immediate deletion of some resources may result in inconsistency or data loss and requires confirmation.
      --grace-period=-1: Period of time in seconds given to the resource to terminate gracefully. Ignored if negative. Set to 1 for immediate shutdown. Can only be set to 0 when --force is true (force deletion).
      --include-uninitialized=false: If true, the kubectl command applies to uninitialized objects. If explicitly set to false, this flag overrides other flags that make the kubectl commands apply to uninitialized objects, e.g., \"--all\". Objects with empty metadata.initializers are regarded as initialized.
      --openapi-patch=true: If true, use openapi to calculate diff when the openapi presents and the resource can be found in the openapi spec. Otherwise, fall back to use baked-in types.
  -o, --output='': Output format. One of: json|yaml|name|template|go-template|go-template-file|templatefile|jsonpath|jsonpath-file.
      --overwrite=true: Automatically resolve conflicts between the modified and live configuration by using values from the modified configuration
      --prune=false: Automatically delete resource objects, including the uninitialized ones, that do not appear in the configs and are created by either apply or create --save-config. Should be used with either -l or --all.
      --prune-whitelist=[]: Overwrite the default whitelist with <group/version/kind> for --prune
      --record=false: Record current kubectl command in the resource annotation. If set to false, do not record the command. If set to true, record the command. If not set, default to updating the existing annotation value only if one already exists.
  -R, --recursive=false: Process the directory used in -f, --filename recursively. Useful when you want to manage related manifests organized within the same directory.
  -l, --selector='': Selector (label query) to filter on, supports '=', '==', and '!='.(e.g. -l key1=value1,key2=value2)
      --server-dry-run=false: If true, request will be sent to server with dry-run flag, which means the modifications won't be persisted. This is an alpha feature and flag.
      --template='': Template string or path to template file to use when -o=go-template, -o=go-template-file. The template format is golang templates [http://golang.org/pkg/text/template/#pkg-overview].
      --timeout=0s: The length of time to wait before giving up on a delete, zero means determine a timeout from the size of the object
      --validate=true: If true, use a schema to validate the input before sending it
      --wait=false: If true, wait for resources to be gone before returning. This waits for finalizers.

Usage:
  kubectl apply -f FILENAME [options]

Use \"kubectl <command> --help\" for more information about a given command.
Use \"kubectl options\" for a list of global command-line options (applies to all commands).

flag needs an argument: 'n' in -n", "stderr_lines": ["Error: flag needs an argument: 'n' in -n", "", "", "Examples:", "  # Apply the configuration in pod.json to a pod.", "  kubectl apply -f ./pod.json", "  ", "  # Apply the JSON passed into stdin to a pod.", "  cat pod.json | kubectl apply -f -", "  ", "  # Note: --prune is still in Alpha", "  # Apply the configuration in manifest.yaml that matches label app=nginx and delete all the other resources that are not in the file and match label app=nginx.", "  kubectl apply --prune -f manifest.yaml -l app=nginx", "  ", "  # Apply the configuration in manifest.yaml and delete all the other configmaps that are not in the file.", "  kubectl apply --prune -f manifest.yaml --all --prune-whitelist=core/v1/ConfigMap", "", "Available Commands:", "  edit-last-applied Edit latest last-applied-configuration annotations of a resource/object", "  set-last-applied  Set the last-applied-configuration annotation on a live object to match the contents of a file.", "  view-last-applied View latest last-applied-configuration annotations of a resource/object", "", "Options:", "      --all=false: Select all resources in the namespace of the specified resource types.", "      --allow-missing-template-keys=true: If true, ignore any errors in templates when a field or map key is missing in the template. Only applies to golang and jsonpath output formats.", "      --cascade=true: If true, cascade the deletion of the resources managed by this resource (e.g. Pods created by a ReplicationController).  Default true.", "      --dry-run=false: If true, only print the object that would be sent, without sending it.", "  -f, --filename=[]: that contains the configuration to apply", "      --force=false: Only used when grace-period=0. If true, immediately remove resources from API and bypass graceful deletion. 
Note that immediate deletion of some resources may result in inconsistency or data loss and requires confirmation.", "      --grace-period=-1: Period of time in seconds given to the resource to terminate gracefully. Ignored if negative. Set to 1 for immediate shutdown. Can only be set to 0 when --force is true (force deletion).", "      --include-uninitialized=false: If true, the kubectl command applies to uninitialized objects. If explicitly set to false, this flag overrides other flags that make the kubectl commands apply to uninitialized objects, e.g., \"--all\". Objects with empty metadata.initializers are regarded as initialized.", "      --openapi-patch=true: If true, use openapi to calculate diff when the openapi presents and the resource can be found in the openapi spec. Otherwise, fall back to use baked-in types.", "  -o, --output='': Output format. One of: json|yaml|name|template|go-template|go-template-file|templatefile|jsonpath|jsonpath-file.", "      --overwrite=true: Automatically resolve conflicts between the modified and live configuration by using values from the modified configuration", "      --prune=false: Automatically delete resource objects, including the uninitialized ones, that do not appear in the configs and are created by either apply or create --save-config. Should be used with either -l or --all.", "      --prune-whitelist=[]: Overwrite the default whitelist with <group/version/kind> for --prune", "      --record=false: Record current kubectl command in the resource annotation. If set to false, do not record the command. If set to true, record the command. If not set, default to updating the existing annotation value only if one already exists.", "  -R, --recursive=false: Process the directory used in -f, --filename recursively. Useful when you want to manage related manifests organized within the same directory.", "  -l, --selector='': Selector (label query) to filter on, supports '=', '==', and '!='.(e.g. 
-l key1=value1,key2=value2)", "      --server-dry-run=false: If true, request will be sent to server with dry-run flag, which means the modifications won't be persisted. This is an alpha feature and flag.", "      --template='': Template string or path to template file to use when -o=go-template, -o=go-template-file. The template format is golang templates [http://golang.org/pkg/text/template/#pkg-overview].", "      --timeout=0s: The length of time to wait before giving up on a delete, zero means determine a timeout from the size of the object", "      --validate=true: If true, use a schema to validate the input before sending it", "      --wait=false: If true, wait for resources to be gone before returning. This waits for finalizers.", "", "Usage:", "  kubectl apply -f FILENAME [options]", "", "Use \"kubectl <command> --help\" for more information about a given command.", "Use \"kubectl options\" for a list of global command-line options (applies to all commands).", "", "flag needs an argument: 'n' in -n"], "stdout": "", "stdout_lines": []}
    to retry, use: --limit @/experiments/generic/pod_network_loss/pod_network_loss_ansible_logic.retry

2020-03-31 07:53:41.923791 ************ RELAX, EXPERIMENT ENDS! ************ 
127.0.0.1                  : ok=13   changed=4    unreachable=0    failed=2   
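
Both failed tasks in the log above ran `kubectl apply -f /utils/runtime/chaos-result.yml -n ` with nothing after `-n`, which is consistent with a namespace variable expanding to an empty string when the command was built. The following is a minimal shell sketch of that failure mode and of a guard against it. The function `build_apply_cmd` and the variable `CHAOS_NS` are hypothetical names used for illustration only; they are not taken from the Litmus playbooks.

```shell
#!/bin/sh
# Hypothetical helper: builds the kubectl command string the way a template would.
# It only prints the command; nothing is sent to a cluster.
build_apply_cmd() {
    ns="$1"
    if [ -z "$ns" ]; then
        # Fail early with a clear message instead of emitting "... -n " with no value.
        echo "error: namespace is empty; refusing to build kubectl command" >&2
        return 1
    fi
    echo "kubectl apply -f /utils/runtime/chaos-result.yml -n $ns"
}

# An unset variable reproduces the shape of the failing command seen in the log.
unset CHAOS_NS
echo "unguarded: kubectl apply -f /utils/runtime/chaos-result.yml -n ${CHAOS_NS:-}"

# The guarded helper rejects the empty value and accepts a real one.
build_apply_cmd "${CHAOS_NS:-}" || echo "guard triggered"
build_apply_cmd "default"
```

The guard does not explain why the value was empty in this run; it only shows how a check before the call would surface the problem as a clear message rather than as kubectl's help text.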

ksatchit commented 4 years ago

Thanks @badashanren! Here are my observations from a first look:

  • The chaosexperiment CRs are not available in the default ns, which is where the nginx app with the run=mynginx label is created. We need the pod-delete experiment CR there; it can be installed with: kubectl apply -f https://hub.litmuschaos.io/api/chaos?file=charts/generic/pod-delete/experiment.yaml
  • Any kubectl command inside the exp pod seems to fail with an invalid namespace value (the error is for a missing value after -n), which is why the kubectl apply help is getting printed.

It would be worthwhile to see what values have been passed to the exp pod by the runner, which will help us identify the issue. It will really help if you add these 2 outputs:

  • kubectl describe nginx-chaos-runner
  • kubectl describe <experiment-pod>

If these are disappearing too quickly before you can collect this, please set .spec.jobCleanupPolicy: retain in the chaosengine spec.

badashanren commented 4 years ago

kubectl get chaosexperiments

    NAME                     AGE
    container-kill           2m29s
    disk-fill                2m30s
    disk-loss                2m30s
    node-cpu-hog             2m29s
    node-drain               2m31s
    node-memory-hog          2m29s
    pod-cpu-hog              2m30s
    pod-delete               2m30s
    pod-network-corruption   2m30s
    pod-network-latency      2m30s
    pod-network-loss         2m30s

kubectl describe nginx-chaos-runner: Sorry, what is nginx-chaos-runner? Is it some kind of resource? I can't find it.

kubectl describe <experiment-pod>, output:

    Name:           pod-network-loss-ktwrf-5nfw9
    Namespace:      default
    Node:           10.0.0.202/10.0.0.202
    Start Time:     Tue, 31 Mar 2020 17:15:35 +0800
    Labels:         controller-uid=9bb28634-048b-4979-a604-37325b8e4b2b
                    experiment=pod-network-loss
                    job-name=pod-network-loss-ktwrf
    Annotations:    tke.cloud.tencent.com/networks-status: [{ "name": "tke-bridge", "ips": [ "172.16.0.44" ], "default": true, "dns": {} }]
    Status:         Succeeded
    IP:             172.16.0.44
    IPs:
      IP:           172.16.0.44
    Controlled By:  Job/pod-network-loss-ktwrf
    Containers:
      ansibletest:
        Container ID:  docker://21969cac8ff049a4ae688720f6b1237c1e65f57623966b8d9d1eef18158b5f9d
        Image:         litmuschaos/ansible-runner:ci
        Image ID:      docker-pullable://litmuschaos/ansible-runner@sha256:20ed0263bf8cf68f2c90c2e128b92f38309edbd52c75921d88f9bbb4bc9907d4
        Port:
        Host Port:
        Command:
          /bin/bash
        Args:
          -c
          ansible-playbook ./experiments/generic/pod_network_loss/pod_network_loss_ansible_logic.yml -i /etc/ansible/hosts -vv; exit 0
        State:          Terminated
          Reason:       Completed
          Exit Code:    0
          Started:      Tue, 31 Mar 2020 17:15:37 +0800
          Finished:     Tue, 31 Mar 2020 17:15:49 +0800
        Ready:          False
        Restart Count:  0
        Environment:
          ANSIBLE_STDOUT_CALLBACK:         default
          APP_NAMESPACE:                   default
          APP_LABEL:                       run=mynginx
          APP_KIND:
          TARGET_CONTAINER:
          NETWORK_INTERFACE:               eth0
          NETWORK_PACKET_LOSS_PERCENTAGE:  70
          TOTAL_CHAOS_DURATION:            1800
          RAMP_TIME:
          LIB:                             pumba
          LIB_IMAGE:                       gaiaadm/pumba:0.6.5
          CHAOS_SERVICE_ACCOUNT:           (v1:spec.serviceAccountName)
        Mounts:
          /var/run/secrets/kubernetes.io/serviceaccount from litmus-token-rzsq6 (ro)
    Conditions:
      Type              Status
      Initialized       True
      Ready             False
      ContainersReady   False
      PodScheduled      True
    Volumes:
      litmus-token-rzsq6:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  litmus-token-rzsq6
        Optional:    false
    QoS Class:       BestEffort
    Node-Selectors:
    Tolerations:
    Events:
      Type    Reason     Age   From                Message
      Normal  Scheduled  117s  default-scheduler   Successfully assigned default/pod-network-loss-ktwrf-5nfw9 to 10.0.0.202
      Normal  Pulling    116s  kubelet, 10.0.0.202 Pulling image "litmuschaos/ansible-runner:ci"
      Normal  Pulled     115s  kubelet, 10.0.0.202 Successfully pulled image "litmuschaos/ansible-runner:ci"
      Normal  Created    115s  kubelet, 10.0.0.202 Created container ansibletest
      Normal  Started    115s  kubelet, 10.0.0.202 Started container ansibletest

ksatchit commented 4 years ago

Hey @badashanren, are you still facing the same error, i.e. does the earlier log still hold? I see that you have successful runs of some experiments like pod-delete / network-delays etc. in the edited describe output of the chaosengine.

ksatchit commented 4 years ago

Meanwhile, please feel free to hop on to the community Google Hangouts here to have a chat, or join our Slack channel (#litmus) on the Kubernetes workspace to start/join discussions!

badashanren commented 4 years ago

are you still facing the same error, i.e. does the earlier log still hold? I see that you have successful runs of some experiments like pod-delete / network-delays etc. in the edited describe output of the chaosengine

No, it still fails! Experiments like pod-delete / network-loss do get launched, and we found the error in the kubectl logs output. It can't get the namespace. Is this due to permissions?

[Three screenshots were attached to the original comment.]

ksatchit commented 4 years ago

Yes, you are correct. The namespace is invalid. That said, a couple of items are missing from the describe output provided earlier, as well as from the chaosengine. In the snippet you shared, the ENVs looked like this:

APP_NAMESPACE: default
APP_LABEL: run=mynginx
APP_KIND:
TARGET_CONTAINER:
NETWORK_INTERFACE: eth0
NETWORK_PACKET_LOSS_PERCENTAGE: 70
TOTAL_CHAOS_DURATION: 1800
RAMP_TIME:
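
The unset values in that list can be picked out mechanically. The following is a minimal Python sketch, assuming the Environment section of the describe output has been copied into a string. The name empty_env_vars is hypothetical, and an empty value is not necessarily a problem for every variable.

```python
# Environment block as shown in the describe output.
DESCRIBE_ENV = """\
APP_NAMESPACE: default
APP_LABEL: run=mynginx
APP_KIND:
TARGET_CONTAINER:
NETWORK_INTERFACE: eth0
NETWORK_PACKET_LOSS_PERCENTAGE: 70
TOTAL_CHAOS_DURATION: 1800
RAMP_TIME:
"""


def empty_env_vars(describe_env):
    """Return the names of variables whose value is empty."""
    empty = []
    for line in describe_env.splitlines():
        # Split on the first colon only, since values may contain colons.
        name, sep, value = line.partition(":")
        if sep and not value.strip():
            empty.append(name.strip())
    return empty


print(empty_env_vars(DESCRIBE_ENV))
```

For the block above this lists APP_KIND, TARGET_CONTAINER and RAMP_TIME.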

Typically we expect the .spec.appinfo.appkind field in the chaosengine to be filled with the appropriate type (deployment/statefulset/daemonset). Also, the target container needs to be provided in the ENV override section. For example:

# chaosengine.yaml
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-network-chaos
  namespace: default
spec:
  # It can be delete/retain
  jobCleanUpPolicy: 'delete'
  # It can be true/false
  annotationCheck: 'true'
  # It can be active/stop
  engineState: 'active'
  #ex. values: ns1:name=percona,ns2:run=nginx 
  auxiliaryAppInfo: ''
  monitoring: false
  appinfo: 
    appns: 'default'
    # FYI, To see app label, apply kubectl get pods --show-labels
    applabel: 'app=nginx'
    appkind: 'deployment'
  chaosServiceAccount: pod-network-loss-sa 
  experiments:
    - name: pod-network-loss
      spec:
        components:
          env:
            #Container name where chaos has to be injected              
            - name: TARGET_CONTAINER
              value: 'nginx' 

            - name: LIB_IMAGE
              value: 'gaiaadm/pumba:0.6.5'

            #Network interface inside target container
            - name: NETWORK_INTERFACE
              value: 'eth0'    

            - name: NETWORK_PACKET_LOSS_PERCENTAGE
              value: '100'

            - name: TOTAL_CHAOS_DURATION
              value: '60' # in seconds
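
The two requirements stated before the example (a valid appkind and a TARGET_CONTAINER override) can be checked before applying the spec. The following is a rough Python sketch, assuming the chaosengine has already been loaded into a plain dict. The function check_engine and its messages are hypothetical and are not part of Litmus.

```python
VALID_APP_KINDS = {"deployment", "statefulset", "daemonset"}


def check_engine(engine):
    """Return a list of problems found in a chaosengine dict (empty if none)."""
    problems = []
    spec = engine.get("spec", {})
    appkind = spec.get("appinfo", {}).get("appkind", "")
    if appkind not in VALID_APP_KINDS:
        problems.append("spec.appinfo.appkind is missing or invalid")
    for experiment in spec.get("experiments", []):
        env = experiment.get("spec", {}).get("components", {}).get("env", [])
        # Only variables with a non-empty value count as set.
        names = {item.get("name") for item in env if item.get("value")}
        if "TARGET_CONTAINER" not in names:
            problems.append(f"{experiment.get('name')}: TARGET_CONTAINER is not set")
    return problems


# Trimmed-down dict mirroring the chaosengine example above.
engine = {
    "spec": {
        "appinfo": {"appns": "default", "applabel": "app=nginx", "appkind": "deployment"},
        "experiments": [
            {
                "name": "pod-network-loss",
                "spec": {"components": {"env": [
                    {"name": "TARGET_CONTAINER", "value": "nginx"},
                    {"name": "NETWORK_INTERFACE", "value": "eth0"},
                ]}},
            }
        ],
    }
}

print(check_engine(engine))
```

An engine that mirrors the example returns an empty list. One without appkind or TARGET_CONTAINER returns one message per missing item.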

In the previous comment, the "nginx-chaos-runner" I mentioned was the pod that gets launched immediately on creating the chaosengine. It bears the name "<chaosengine-name>-runner". The logs of this pod will help us find out why the experiment is using an invalid namespace.

By default we delete this pod and the experiment pod immediately on completion. However, setting .spec.jobCleanUpPolicy to retain in the chaosengine keeps these pods after completion to help us troubleshoot.
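
The naming rule for the runner pod can be turned into the command needed to fetch its logs. The following is a small Python sketch. The helper names are hypothetical, and the engine name and namespace are taken from the example spec above.

```python
def runner_pod_name(engine_name):
    """Runner pods are named '<chaosengine-name>-runner'."""
    return f"{engine_name}-runner"


def logs_command(pod, namespace):
    """Build a 'kubectl logs' argument list, refusing an empty namespace."""
    if not namespace:
        # An empty value here would reproduce the "-n" error from this issue.
        raise ValueError("namespace must not be empty")
    return ["kubectl", "logs", pod, "-n", namespace]


pod = runner_pod_name("nginx-network-chaos")
print(" ".join(logs_command(pod, "default")))
```

Building the command as a list and rejecting an empty namespace up front avoids passing a dangling -n to kubectl.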

ksatchit commented 4 years ago

Ah, now I get it ! I suppose you are using an example from here: https://github.com/litmuschaos/chaos-operator/blob/master/deploy/crds/chaosengine.yaml for your evaluation!

My apologies, this is an old version which has undergone several changes since. Please refer to https://docs.litmuschaos.io to find the latest specs and steps to run the experiments. Let me create an issue to fix it.

BTW, we are still available on the hangout link I mentioned if you would like a quick chat about this/debug live!

badashanren commented 4 years ago

Ah, now I get it ! I suppose you are using an example from here: https://github.com/litmuschaos/chaos-operator/blob/master/deploy/crds/chaosengine.yaml for your evaluation!

My apologies, this is an old version which has undergone several changes since. Please refer to https://docs.litmuschaos.io to find the latest specs and steps to run the experiments. Let me create an issue to fix it.

BTW, we are still available on the hangout link I mentioned if you would like a quick chat about this/debug live!

It works, thanks very much!

ksatchit commented 4 years ago

Thank you @badashanren for trying this out. While we will get better w/ our docs & instructions (as pointed out by this issue) - it would be great if you could share some feedback/thoughts around litmus in general - fixes/improvements/enhancements you would like to see in the project to help you better. Here is the roadmap for your reference.