yahoo / storm-perf-test

A simple storm performance/stress test

Usage #1

Open vibha123 opened 8 years ago

vibha123 commented 8 years ago

Hi,

I cloned this and tried to run it with Storm 1.0.0. After changing the version in pom.xml, there were compile errors. I am new to Storm and am working on this as part of my research project; this benchmarking suite could really speed up my work. Can you please tell me how to use it?

This is the change I made:

```xml
<dependency>
  <groupId>org.apache.storm</groupId>
  <artifactId>storm-core</artifactId>
  <version>1.0.0-SNAPSHOT</version>
  <scope>provided</scope>
</dependency>
```

The error says it cannot load any Storm classes, such as OutputCollector. Am I missing a path that needs to be added?

revans2 commented 8 years ago

Storm 1.0.0 moved a lot of the APIs from the backtype.storm packages to org.apache.storm, so the test will need to be updated to import from the new packages. @vibha123 if that is something you want to do, go right ahead; if not, I'll try to find some time soon to tackle it.
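For the imports, the rename is largely mechanical. As a rough sketch, assuming the only change needed is the package prefix (other 1.0.0 API changes may still need fixing by hand), a hypothetical helper `PackageMigrator` (not part of this repo or of Storm) could rewrite a source file's text like this:

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper, not part of storm-perf-test or Storm: rewrites
 * pre-1.0 "backtype.storm" package references to "org.apache.storm".
 * Only the package prefix changes; class names are left untouched.
 */
public class PackageMigrator {
    // Match "backtype.storm" only at the start of a qualified name: not preceded
    // by an identifier character or '.', and followed by '.' or ';'.
    private static final Pattern OLD_PREFIX =
        Pattern.compile("(?<![\\w.])backtype\\.storm(?=[.;])");

    public static String migrate(String source) {
        return OLD_PREFIX.matcher(source).replaceAll("org.apache.storm");
    }

    public static void main(String[] args) {
        String before = String.join("\n",
            "import backtype.storm.task.OutputCollector;",
            "import backtype.storm.topology.base.BaseRichBolt;",
            "import java.util.Map;");
        // Prints the same three imports with the Storm ones moved to org.apache.storm.
        System.out.println(migrate(before));
    }
}
```

A regex on raw text is deliberately crude; an IDE's "optimize imports" or a find-and-replace across the source tree does the same job, and the compiler will flag anything the rename misses.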

vibha123 commented 8 years ago

Thanks for your prompt reply. I will try updating the packages in the code and will keep you posted! Thanks again!

vibha123 commented 8 years ago

It worked, thanks! However, I am facing one issue: when I vary the number of ackers, it starts showing negative throughput, and then after some time it shows a throughput of 0. What is the reason for this?

revans2 commented 8 years ago

The performance test here relies on Storm's internal metrics to report things accurately. They have a number of issues, the biggest being that if a worker crashes, its metrics go back to 0 when it is restarted. The throughput calculation relies on a total count of messages sent, so if a worker crashes, the tool can report a negative throughput because it saw the total count go down.
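To make the failure mode concrete, here is a hypothetical illustration (not the tool's actual code) of a rate computed from two samples of a cumulative counter, plus one possible reset-aware variant. The sample values are made up, and the reset-aware rule assumes that any drop in a single counter means it restarted from 0:

```java
/**
 * Hypothetical illustration, not storm-perf-test's code: why throughput derived
 * from a cumulative "messages sent" counter can go negative after a restart.
 */
public class ThroughputSketch {

    /** Naive rate: (current - previous) / seconds. Negative if the counter reset. */
    public static double naiveRate(long prevTotal, long currTotal, double seconds) {
        return (currTotal - prevTotal) / seconds;
    }

    /**
     * Reset-aware rate: if the total went down, assume the counter restarted at 0,
     * so everything counted since then is new. Messages counted between the last
     * sample and the crash are still lost, so that one interval is under-reported.
     * Apply this per worker/executor counter, not to a sum over all workers.
     */
    public static double resetAwareRate(long prevTotal, long currTotal, double seconds) {
        long delta = currTotal >= prevTotal ? currTotal - prevTotal : currTotal;
        return delta / seconds;
    }

    public static void main(String[] args) {
        // Samples of one cumulative counter taken every 10 seconds; the worker
        // crashes and restarts between the 3rd and 4th samples.
        long[] totals = {0, 50_000, 100_000, 20_000, 70_000};
        double interval = 10.0;
        for (int i = 1; i < totals.length; i++) {
            System.out.printf("interval %d: naive=%.0f msg/s, reset-aware=%.0f msg/s%n",
                i, naiveRate(totals[i - 1], totals[i], interval),
                resetAwareRate(totals[i - 1], totals[i], interval));
        }
    }
}
```

In the third interval the naive rate comes out as -8000 msg/s, which is the kind of number the tool prints after a worker restart.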

vibha123 commented 8 years ago

Hi, I was running some topologies with backpressure enabled. Can you please tell me the reasoning behind the default high and low watermark values of 0.9 and 0.4, respectively? I am observing failures with a high watermark of 0.9, and fewer failures with 0.7. I know this is not the right place to ask this question, but I need it for my project. Thanks in advance.

revans2 commented 8 years ago

Those values were a bit of shooting from the hip, backed by some perf testing, but nothing extensive. What kinds of errors are you seeing?

vibha123 commented 8 years ago

Okay. I was testing the backpressure mechanism. I have run some linear topologies and a star topology. In the star topology, a single bolt is overwhelmed with tuples, and it's causing lots of failures. My guess at the reason is the following:

Because an unbounded stream of tuples is being sent and a single bolt is congested, its queue hits the high watermark. Because the emit rate is high, this causes lots of failures. Every time this happens (which is periodically), I can see failures in the UI.

Backpressure should solve the problem of tuple failures, right? So ideally, instead of a bang-bang controller, the emit rate should be adjusted gradually, right? Please let me know if my understanding is incorrect.
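The on/off behavior being described can be modeled with a simple hysteresis switch. This is a simplified, hypothetical model of watermark-based backpressure, not Storm's implementation: throttling turns on when a receive queue reaches the high watermark and turns off only after it drains to the low watermark. The class name and queue sizes are assumptions for illustration:

```java
/**
 * Hypothetical model of a bang-bang (on/off) backpressure switch with
 * hysteresis between a high and a low watermark. Not Storm's actual code.
 */
public class WatermarkSwitch {
    private final int capacity;
    private final double highWatermark; // e.g. 0.9, the default discussed above
    private final double lowWatermark;  // e.g. 0.4
    private boolean throttled = false;

    public WatermarkSwitch(int capacity, double highWatermark, double lowWatermark) {
        if (!(0 < lowWatermark && lowWatermark < highWatermark && highWatermark <= 1.0)) {
            throw new IllegalArgumentException("need 0 < low < high <= 1");
        }
        this.capacity = capacity;
        this.highWatermark = highWatermark;
        this.lowWatermark = lowWatermark;
    }

    /** Feed the current queue population; returns true while spouts should pause. */
    public boolean update(int queued) {
        double fill = (double) queued / capacity;
        if (!throttled && fill >= highWatermark) {
            throttled = true;   // switch fully off: stop emitting
        } else if (throttled && fill <= lowWatermark) {
            throttled = false;  // switch fully on: resume at full speed
        }
        return throttled;
    }

    public static void main(String[] args) {
        WatermarkSwitch sw = new WatermarkSwitch(1024, 0.9, 0.4);
        int[] samples = {300, 700, 950, 800, 500, 400, 600, 930};
        for (int q : samples) {
            System.out.println("queued=" + q + " throttled=" + sw.update(q));
        }
    }
}
```

Because the emit rate only ever jumps between "full speed" and "stopped", the queue saw-tooths between the two watermarks, which fits the periodic pattern of failures described above. Under this model, a lower high watermark (such as 0.7) would start throttling earlier and leave more headroom, which may explain the fewer failures observed, though that is a guess rather than a measured result.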

revans2 commented 8 years ago

This test is meant to stress/overload the networking. Before automatic backpressure, the test could easily cause workers to crash or get shot: they could not keep up with the load, the heap would fill up, and then GC would go crazy.

With backpressure on, no worker should be shot because of GC. We could still see tuples time out after being emitted, simply because pushing a topology really hard can drive latencies up, and hence messages can time out.
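A back-of-the-envelope way to see why pushing harder leads to timeouts even without crashes is Little's law: average latency is roughly the number of tuples in flight divided by throughput. The sketch below is an illustration, not part of the test tool; the in-flight count (for example, bounded by `topology.max.spout.pending` when acking is on) and the throughput values are hypothetical, while 30 seconds is Storm's default `topology.message.timeout.secs`:

```java
/**
 * Illustration only: Little's law (W = L / lambda) applied to tuples in flight.
 * If the average latency approaches the message timeout, acked tuples start
 * failing even though no worker has crashed.
 */
public class TimeoutEstimate {
    /** Average time a tuple spends in the topology, in seconds. */
    public static double avgLatencySecs(long tuplesInFlight, double tuplesPerSec) {
        if (tuplesPerSec <= 0) {
            throw new IllegalArgumentException("throughput must be positive");
        }
        return tuplesInFlight / tuplesPerSec;
    }

    public static void main(String[] args) {
        int timeoutSecs = 30;         // Storm's default topology.message.timeout.secs
        long inFlight = 4 * 50_000L;  // hypothetical: 4 spouts x 50,000 pending tuples each
        double[] throughputs = {20_000, 10_000, 5_000};
        for (double x : throughputs) {
            double w = avgLatencySecs(inFlight, x);
            System.out.printf("throughput=%.0f/s -> avg latency=%.1fs%s%n",
                x, w, w > timeoutSecs ? " (exceeds timeout)" : "");
        }
    }
}
```

Since this is an average, the slowest tuples will cross the timeout well before the average does, so failures can appear even in the 20-second case.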

If a worker is crashing, that is something we probably need to fix in Storm. If tuples are just timing out, then I would say it is working as designed. It is not ideal, but it is working as designed. If you turn off acking, I would expect the failures to go away and your maximum throughput for a given latency to roughly double.

As @vibha123 pointed out, a bang-bang controller is a very simple controller. That is nice because it is not that hard to write and debug in a distributed environment, but it is also not ideal in all cases. Something that does rate limiting with feedback might be better. If someone wants to work on that and submit a patch to Storm with some test results, I would love to see it.
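One possible shape for "rate limiting with feedback" is sketched below using AIMD (additive increase, multiplicative decrease, the scheme TCP congestion control uses). This is a swapped-in technique for illustration, not something Storm ships; the class name, the queue-fill signal, and all parameter values are hypothetical:

```java
/**
 * Hypothetical AIMD emit-rate controller: instead of switching emission fully
 * on or off, nudge the rate up while the downstream queue is healthy and cut
 * it multiplicatively when the queue fill exceeds a target.
 */
public class AimdRateController {
    private final double minRate;
    private final double maxRate;
    private final double increaseStep;   // tuples/sec added per healthy interval
    private final double decreaseFactor; // rate multiplier on congestion, 0 < f < 1
    private final double targetFill;     // queue fill ratio considered "congested"
    private double rate;

    public AimdRateController(double initialRate, double minRate, double maxRate,
                              double increaseStep, double decreaseFactor, double targetFill) {
        this.rate = initialRate;
        this.minRate = minRate;
        this.maxRate = maxRate;
        this.increaseStep = increaseStep;
        this.decreaseFactor = decreaseFactor;
        this.targetFill = targetFill;
    }

    /** Call once per control interval with the downstream queue fill (0.0 to 1.0). */
    public double update(double queueFill) {
        if (queueFill > targetFill) {
            rate = Math.max(minRate, rate * decreaseFactor); // back off sharply
        } else {
            rate = Math.min(maxRate, rate + increaseStep);   // probe upward slowly
        }
        return rate;
    }

    public static void main(String[] args) {
        AimdRateController c = new AimdRateController(1_000, 100, 50_000, 500, 0.5, 0.7);
        double[] fills = {0.2, 0.3, 0.5, 0.8, 0.6, 0.4, 0.3};
        for (double f : fills) {
            System.out.printf("fill=%.1f -> rate=%.0f tuples/s%n", f, c.update(f));
        }
    }
}
```

Compared with the on/off switch, the emit rate settles into a band around what the slowest bolt can absorb rather than alternating between full speed and zero. A PID controller is another common choice; AIMD is used here only because it is short and well understood.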

vibha123 commented 8 years ago

Yes, in many of the cases where we see negative throughput, it means a worker crashed. For stress testing, we tried a star topology and put pressure on a single bolt (4 spouts sending to one bolt). The worker soon crashed, and we saw negative throughput. We are trying more topologies. Thanks!