litmuschaos / litmus-go

Pod HTTP Status Code not working with Istio #622

Closed fosterchuck closed 5 months ago

fosterchuck commented 1 year ago

BUG REPORT (Possible feature if Litmus is not intended to work with Istio Service Mesh)

What happened: When Istio Service Mesh is used between microservices, the HTTP Status Code experiment appears to run and reports completion, but does not change the actual behavior of the service under test (the status code is not updated to the test value).

What you expected to happen: The Litmus runner/helper should update the iptables rules in a way that lets the experiment work with the Istio sidecar. Ideally, the experiment would also report an error when it is not actually changing traffic.

How to reproduce it (as minimally and precisely as possible):

  1. Add Istio Service Mesh to at least two microservices, but not to the Litmus delegates, runners, helpers, etc.
  2. Ensure the first microservice (service-a) is sending constant HTTP traffic to the second microservice (service-b).
  3. To make the test easier, run a single replica each of service-a and service-b.
  4. Start a test targeting service-b that changes the response code (we used 401).
  5. Observe, either in the logs of service-a or in a tracing application such as an APM, that the response code is unchanged from what service-b provides.
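
For anyone trying to reproduce this, steps 2 and 5 can be approximated with a small stand-alone client in place of service-a. The following is only a rough sketch, not part of Litmus: the names `probe`, `tally`, and `faultObserved` are hypothetical, the target URL is whatever address service-b is reachable on in your cluster, and the request count and interval are arbitrary. It sends steady GET requests, counts the status codes it sees, and flags the case where the injected code (401 here, as in step 4) never shows up.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

// tally counts how often each HTTP status code was observed.
func tally(codes []int) map[int]int {
	counts := make(map[int]int)
	for _, c := range codes {
		counts[c]++
	}
	return counts
}

// faultObserved reports whether the injected status code appeared
// in at least one response.
func faultObserved(codes []int, injected int) bool {
	return tally(codes)[injected] > 0
}

// probe sends n GET requests to url, sleeping for interval after each one,
// and returns the observed status codes. Transport errors are recorded as 0.
func probe(client *http.Client, url string, n int, interval time.Duration) []int {
	codes := make([]int, 0, n)
	for i := 0; i < n; i++ {
		resp, err := client.Get(url)
		if err != nil {
			codes = append(codes, 0)
		} else {
			codes = append(codes, resp.StatusCode)
			resp.Body.Close()
		}
		time.Sleep(interval)
	}
	return codes
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: probe <service-b-url>")
		return
	}
	const injected = 401 // assumption: the status code configured in the experiment

	client := &http.Client{Timeout: 2 * time.Second}
	codes := probe(client, os.Args[1], 30, time.Second)

	fmt.Println("status code counts:", tally(codes))
	if !faultObserved(codes, injected) {
		fmt.Println("injected status code was never observed; the fault is not reaching callers")
		os.Exit(1)
	}
}
```

Run it from a pod that has the Istio sidecar injected while the experiment is active. If it exits non-zero even though the experiment reports completion, that matches the behavior described in this report.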

Anything else we need to know?:

Feederhigh5 commented 1 year ago

Hi @fosterchuck, I am running into similar problems with my Linkerd setup...

Were you able to find a workaround?

Did you only try Pod HTTP Status Code or also other network-related chaos experiments?

fosterchuck commented 1 year ago

We never resolved this. We switched to Linkerd Service Mesh and haven't had time to retest. If memory serves, this issue was also present many months ago when we tested with Linkerd. I cannot test the newer releases with BETA in their name, due to company restrictions. This is purely from memory, but I recall running network tests with no issue as long as they didn't rely on Toxiproxy. None of the HTTP tests worked. Sorry for the lack of specificity. This is not part of our current sprint, so I'm not at liberty to test. We will be revisiting this soon.

neelanjan00 commented 5 months ago

We fixed this issue as part of this PR: https://github.com/litmuschaos/litmus-go/pull/578