seeker89 / chaos-engineering-book

Chaos Engineering: Crash test your applications - the source code
https://www.manning.com/books/chaos-engineering?a_aid=chaos&a_bid=d3243216

Ch 2 - Breaking the services doesn't break the tests #26

Open MalcolmAnderson opened 3 years ago

MalcolmAnderson commented 3 years ago

Page 39 and the top of 40 (before implementing the fix at the bottom of page 40)

After killer-whil.sh ran, I decided to rerun run_ab.sh.

The results were interesting.
While both faas01_* services were completely dead, run_ab.sh finished in a little more than 3 seconds ... with 0 failed requests. That seems to reproduce the bug that Alice and Bob spent the entire night looking into.

```
Concurrency Level:      10
Time taken for tests:   3.090 seconds
Complete requests:      50000
Failed requests:        0
Non-2xx responses:      50000
Total transferred:      16350000 bytes
HTML transferred:       8300000 bytes
Requests per second:    16183.14 [#/sec] (mean)
Time per request:       0.618 [ms] (mean)
```
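(For anyone else following along: a quick way to see what those "complete" requests are actually returning is to hit the endpoint directly and print only the status code. A minimal sketch; the URL is a guess, so point it at whatever run_ab.sh targets on your machine.)

```bash
# Print just the HTTP status code of a single request.
# URL is a placeholder -- substitute the endpoint your run_ab.sh uses.
curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:8000/

# A 502/504 here means nginx answered but had no healthy backend,
# which ab still counts as a completed (non-failed) request.
```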

Is this an easter egg for the people following along and getting a little off script?

I see that chapter 3 goes right into "My app is slow". I kind of expected some foreshadowing, something like "there's still a problem here, but we'll come back to it in a later chapter."

Did I find a bug, or did I find a feature?

Side note: I am LOVING this book. I'm a developer turned administrator (the paper-pushing kind, not the keyboard-tapping kind), but I still keep my hand in, and I have a 4-node Raspberry Pi cluster that I'm starting to use to play with the network administration side of things. What this means is that every time you have me do something like "echo $?", my reaction is, "oooooh, how can I use this elsewhere?"
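(To make that concrete for myself: a minimal sketch of reusing the `echo $?` trick, with the unit name guessed from the chapter's faas01 services.)

```bash
# Run a command, then capture $? immediately -- the next command resets it.
systemctl is-active faas01_1.service   # unit name is a guess based on the chapter
status=$?
echo "exit code: $status"

if [ "$status" -ne 0 ]; then
    echo "service is not active"
fi
```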

seeker89 commented 3 years ago

Hi @MalcolmAnderson and thanks for another issue!

First, I'm so glad that you're enjoying the book. It's super gratifying to see that the sweat that went into making it is being appreciated!

Unfortunately, instead of a clever Easter egg, it's one of the more annoying aspects of ab: despite Failed requests being at 0, the Non-2xx responses are at 50k (100% of your traffic). ab only counts things like connection errors and length mismatches as failed requests, not HTTP error statuses. I'd guess those are 504 responses from nginx having no upstream instances available, and that would also explain the suspiciously quick throughput.
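If you want the test itself to catch that, one option (a sketch only; the URL and request counts are placeholders for whatever run_ab.sh uses) is to grep ab's output for that line and fail on it:

```bash
#!/bin/bash
# Sketch: treat ab's "Non-2xx responses" as a failure in a run_ab.sh-style script.
OUT=$(ab -n 50000 -c 10 http://127.0.0.1:8000/ 2>&1)

# ab only prints this line when at least one non-2xx response occurred.
NON_2XX=$(echo "$OUT" | awk '/Non-2xx responses:/ {print $3}')

if [ -n "$NON_2XX" ] && [ "$NON_2XX" -gt 0 ]; then
    echo "FAIL: $NON_2XX non-2xx responses (even though ab reported 'Failed requests: 0')"
    exit 1
fi
echo "OK: all responses were 2xx"
```

That way a run where 100% of the responses are errors can't show up as a clean pass.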