Summary
Related to #37. Because the default value of decaton.max.pending.records is conservatively low, Decaton's throughput with the default settings is lower than it could be. To start with the conclusion: based on the benchmark results, 10_000 seems a suitable default value.
Detailed Observations
CAUTION: I ran the benchmarks on my MacBook Pro, so the results may differ from a real environment.
I attached 5 benchmark result files.
Attached files: Benchmark_max_pending_100.txt, Benchmark_max_pending_1000.txt, Benchmark_max_pending_10000.txt, Benchmark_max_pending_10000_2.txt, Benchmark_max_pending_100000_2.txt
Benchmark 1 - 100, 1,000, 10,000
These results show that performance changes significantly as the number of max pending records increases. See the results below:
Common benchmark settings:
decaton.max.pending.records = 100
decaton.max.pending.records = 1000
decaton.max.pending.records = 10000
Observation
As you can see, both throughput and delivery latency increase as the number of max pending records increases. There is a huge difference between 100 and 1,000. The difference between 1,000 and 10,000 is smaller, but throughput still increased by almost 5,000 tasks/sec.
Benchmark 2 - 10,000, 100,000
I also wanted to check whether setting max pending records to 100,000 would improve throughput any further, so I tested it. The results are below.
Common benchmark settings:
--title test --tasks 200000 --warmup 100000 --runs 3
decaton.max.pending.records = 10000
decaton.max.pending.records = 100000
Observation
According to the results, increasing the number to 100,000 did not improve performance. Therefore, 10,000 is a suitable default value.
Other thoughts
Memory consumption
Increasing the default max pending records will increase memory usage. But if we assume one task uses 500B, then the max memory usage for pending records will be:
500B * 10,000 tasks = 5,000,000B =~ 5MB
and this is not a problem at all.
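The same estimate can be recomputed for other candidate values with a few lines of Java. This is only a rough sketch: the 500B task size is the assumption from the text above, and the class and method names are hypothetical helpers, not part of Decaton.

```java
public class PendingRecordsMemory {
    // Upper bound on the bytes held by pending records, assuming each pending
    // record holds exactly one task of the given (assumed) size.
    // Throws ArithmeticException if the product overflows a long.
    public static long estimateBytes(long bytesPerTask, long maxPendingRecords) {
        return Math.multiplyExact(bytesPerTask, maxPendingRecords);
    }

    public static void main(String[] args) {
        long bytesPerTask = 500; // assumption: average task size from the text above
        for (long maxPending : new long[] {100, 1_000, 10_000, 100_000}) {
            long bytes = estimateBytes(bytesPerTask, maxPending);
            System.out.println("max pending = " + maxPending
                    + " -> " + bytes + " bytes (~" + (bytes / 1_000_000.0) + " MB)");
        }
    }
}
```

With larger tasks the estimate grows linearly, so it is worth redoing with a task size measured from your own workload.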
Crash recovery
The other downside is that if the consumer (in this case Decaton) crashes, a task might be run twice, because we might not be able to track the status of tasks that were pending at the time. But since Decaton provides at-least-once semantics by default, duplicate processing must already be handled on the user side. Because of that, this downside is not considered a big problem.
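As an illustration of what handling duplicates on the user side can look like, here is a minimal sketch of an idempotent handler that skips tasks it has already applied. It does not use any Decaton API: the class, the task ID, and the in-memory set are all assumptions made for the example, and a real deployment would need a store that survives the crash itself.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

public class IdempotentTaskHandler {
    // IDs of tasks whose side effect has been applied (hypothetical, in-memory only).
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();
    private final Consumer<String> sideEffect;

    public IdempotentTaskHandler(Consumer<String> sideEffect) {
        this.sideEffect = sideEffect;
    }

    // Returns true if the task was applied, false if it was a redelivered duplicate.
    public boolean handle(String taskId, String payload) {
        if (!processedIds.add(taskId)) {
            return false; // already seen: skip the side effect
        }
        try {
            sideEffect.accept(payload);
        } catch (RuntimeException e) {
            processedIds.remove(taskId); // allow a retry if the side effect failed
            throw e;
        }
        return true;
    }

    public static void main(String[] args) {
        IdempotentTaskHandler handler =
                new IdempotentTaskHandler(p -> System.out.println("applied: " + p));
        handler.handle("task-1", "hello");
        handler.handle("task-1", "hello"); // simulated redelivery after a crash, skipped
    }
}
```

A larger max pending records value only increases how many tasks may be redelivered after a crash. It does not change the need for this kind of handling.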
Conclusion
Based on these observations, I suggest setting the default max pending records to 10000. Of course, the property should still be tuned for each environment, but increasing the default is likely to improve out-of-the-box performance.