litmuschaos / litmus

Litmus helps SREs and developers practice chaos engineering in a Cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q
https://litmuschaos.io
Apache License 2.0

Failed to get the instance tag at "EC2 Stop By Tag" #4807

Closed (jongwooo closed 1 month ago)

jongwooo commented 1 month ago

What happened:

Fault Summary:
TARGET_SELECTION_ERROR
{"errorCode":"TARGET_SELECTION_ERROR","phase":"PreChaos","reason":"failed to get the instance tag, invalid instance tag","target":"{EC2 Instance Tag: , Region: ap-northeast-2}"}

While running an AWS experiment using Litmus Helm v3.9.0, I encountered an error when executing the ec2-stop-by-tag fault. Although I provided a value for the EC2_INSTANCE_TAG field, the experiment failed because an empty string was passed in its place.

cc. @namkyu1999

What you expected to happen: I expected the ec2-stop-by-tag experiment to stop the EC2 instance based on the provided tag value.

Where can this issue be corrected? (optional)

How to reproduce it (as minimally and precisely as possible):

  1. Install Litmus using Helm with the following command:
    helm install chaos litmuschaos/litmus --namespace=litmus --create-namespace --set portal.frontend.service.type=NodePort --set mongodb.image.registry=ghcr.io/zcube --set mongodb.image.repository=bitnami-compat/mongodb --set mongodb.image.tag=6.0.5
  2. Run the ec2-stop-by-tag fault with a valid EC2_INSTANCE_TAG value (e.g., stack:test).
  3. Observe that the experiment receives an empty string for the EC2_INSTANCE_TAG field and fails.

Anything else we need to know?:

Test Environment:

jongwooo commented 1 month ago

The fault configuration has EC2_INSTANCE_TAG set correctly. However, the value is not passed to the experiment when it runs.

jongwooo commented 1 month ago

The other tag-based experiment, ebs-loss-by-tag, works fine.

jongwooo commented 1 month ago

The error occurs because the GetInstanceList method is passed an empty string as the instanceTag parameter.

https://github.com/litmuschaos/litmus-go/blob/master/pkg/cloud/aws/ec2/ec2-operations.go#L141-L144
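
To illustrate the failure mode, here is a minimal sketch of the kind of guard that would produce this error when the tag arrives empty. It is not the litmus-go implementation: `parseInstanceTag` and `errInvalidTag` are hypothetical names, and the `key:value` format is assumed from the `stack:test` example above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errInvalidTag reuses the reason string from the fault summary.
// The variable itself is hypothetical, not a litmus-go symbol.
var errInvalidTag = errors.New("failed to get the instance tag, invalid instance tag")

// parseInstanceTag is a hypothetical helper. It splits a "key:value" tag
// such as "stack:test" and rejects empty or malformed input before any
// AWS call would be made.
func parseInstanceTag(instanceTag string) (string, string, error) {
	trimmed := strings.TrimSpace(instanceTag)
	if trimmed == "" {
		// This is the case hit in this issue: the tag arrives empty.
		return "", "", errInvalidTag
	}
	key, value, found := strings.Cut(trimmed, ":")
	if !found || key == "" {
		return "", "", fmt.Errorf("%w: %q is not in key:value form", errInvalidTag, instanceTag)
	}
	return key, value, nil
}

func main() {
	for _, tag := range []string{"stack:test", ""} {
		key, value, err := parseInstanceTag(tag)
		if err != nil {
			fmt.Printf("tag %q rejected: %v\n", tag, err)
			continue
		}
		fmt.Printf("tag %q accepted: key=%s value=%s\n", tag, key, value)
	}
}
```

The point of the sketch is that the guard itself behaves correctly; the problem is upstream, where the configured value never reaches the function.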

jongwooo commented 1 month ago

This error occurred because the environment variable was renamed, so the runner can no longer retrieve the correct value. Rather than changing the fault configuration, I think it would be better to update the codebase to read EC2_INSTANCE_TAG, consistent with a normal fault configuration.

I'm gonna work on this issue.
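
A rough sketch of one way the lookup could tolerate a renamed variable, assuming the code should prefer EC2_INSTANCE_TAG and optionally fall back to an older name. This is not the actual fix: `firstNonEmptyEnv` is a hypothetical helper, and `LEGACY_INSTANCE_TAG` is a placeholder, since the thread does not state the name the code currently reads.

```go
package main

import (
	"fmt"
	"os"
)

// firstNonEmptyEnv returns the value of the first name for which getenv
// yields a non-empty string, together with the name that supplied it.
// It returns two empty strings when none of the names is set.
// The getenv parameter makes the helper testable without touching the
// real process environment.
func firstNonEmptyEnv(getenv func(string) string, names ...string) (string, string) {
	for _, name := range names {
		if v := getenv(name); v != "" {
			return v, name
		}
	}
	return "", ""
}

func main() {
	// Simulate the fault configuration setting the documented variable.
	os.Setenv("EC2_INSTANCE_TAG", "stack:test")

	// LEGACY_INSTANCE_TAG is a placeholder for the older variable name.
	tag, source := firstNonEmptyEnv(os.Getenv, "EC2_INSTANCE_TAG", "LEGACY_INSTANCE_TAG")
	if tag == "" {
		fmt.Println("no instance tag configured")
		return
	}
	fmt.Printf("instance tag %q read from %s\n", tag, source)
}
```

Reading EC2_INSTANCE_TAG first matches what the fault configuration sets; whether a fallback is worth keeping at all is a design choice for the maintainers.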