line / decaton

High throughput asynchronous task processing on Apache Kafka

Increase max.pending.records to 10000 #142

Closed from-unknown closed 2 years ago

from-unknown commented 2 years ago

Summary

Related to #37. Because the default value of decaton.max.pending.records is conservatively low, Decaton's throughput is not very good out of the box. To state the conclusion first: based on the benchmark results below, 10_000 seems to be a suitable default value.
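For reference, here is a minimal sketch of how the property can be overridden per subscription today, assuming Decaton's Property.ofStatic / StaticPropertySupplier API (the subscription name, processors builder, and consumer config are hypothetical placeholders, and package locations may vary by Decaton version):

import com.linecorp.decaton.processor.runtime.ProcessorProperties;
import com.linecorp.decaton.processor.runtime.ProcessorSubscription;
import com.linecorp.decaton.processor.runtime.Property;
import com.linecorp.decaton.processor.runtime.StaticPropertySupplier;
import com.linecorp.decaton.processor.runtime.SubscriptionBuilder;

ProcessorSubscription subscription =
        SubscriptionBuilder.newBuilder("my-subscription")        // hypothetical subscription id
                           .processorsBuilder(processorsBuilder) // hypothetical, built elsewhere
                           .consumerConfig(consumerConfig)       // hypothetical Kafka consumer Properties
                           // Raise decaton.max.pending.records from the current default to 10,000
                           .properties(StaticPropertySupplier.of(
                                   Property.ofStatic(ProcessorProperties.CONFIG_MAX_PENDING_RECORDS, 10000)))
                           .buildAndStart();

Raising the default would simply make this kind of override unnecessary for most users.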

Detailed Observations

CAUTION: I ran the benchmarks on my MacBook Pro, so the results may differ from a real environment.

Environment: MacBook Pro (16-inch, 2019)
Processor: 2.6 GHz 6-Core Intel Core i7
Memory: 16GB

I attached 5 benchmark result files.

attached files: Benchmark_max_pending_100.txt Benchmark_max_pending_1000.txt Benchmark_max_pending_10000.txt Benchmark_max_pending_10000_2.txt Benchmark_max_pending_100000_2.txt

Benchmark 1 - 100, 1,000, 10,000

Based on these results, performance changes significantly as max pending records increases. See the results below:

Common benchmark settings:

--title test --tasks 20000 --warmup 10000 --runs 3

decaton.max.pending.records = 100

--- Performance ---
Execution Time(ms): 1190.33
Throughput: 19109.72 tasks/sec
Delivery Latency(ms): mean=281 max=1152

decaton.max.pending.records = 1000

--- Performance ---
Execution Time(ms): 575.00
Throughput: 35310.44 tasks/sec
Delivery Latency(ms): mean=269 max=501

decaton.max.pending.records = 10000

--- Performance ---
Execution Time(ms): 498.00
Throughput: 40348.33 tasks/sec
Delivery Latency(ms): mean=234 max=436

Observation

As you can see, increasing max pending records improved both throughput and delivery latency. There is a huge difference between 100 and 1,000. The difference between 1,000 and 10,000 is not as large, but throughput still increased by almost 5,000 tasks/sec.

Benchmark 2 - 10,000, 100,000

I also wanted to check whether setting max pending records to 100,000 would improve throughput even further, so I tested that as well. The results are below.

Common benchmark settings:

--title test --tasks 200000 --warmup 100000 --runs 3

decaton.max.pending.records = 10000

--- Performance ---
Execution Time(ms): 4264.33
Throughput: 47047.42 tasks/sec
Delivery Latency(ms): mean=1930 max=3747

decaton.max.pending.records = 100000

--- Performance ---
Execution Time(ms): 4383.33
Throughput: 45798.08 tasks/sec
Delivery Latency(ms): mean=2091 max=4047

Observation

From these results, increasing the number to 100,000 did not improve performance any further; throughput dropped slightly and latency got worse. Therefore, 10,000 is the suitable default.

Other thoughts

Memory consumption

Increasing the default max pending records will increase memory usage. But if we assume one task uses 500 B, then the maximum memory usage for pending records will be:

500 B * 10,000 tasks = 5,000,000 B ≈ 5 MB

and this is not a problem at all.
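One caveat worth noting: if I read the property semantics correctly, decaton.max.pending.records is a per-partition limit, so the total upper bound scales with the number of partitions assigned to the consumer. A quick back-of-envelope sketch under the same 500 B-per-task assumption (the partition count is a hypothetical example):

// Back-of-envelope memory estimate for pending records.
// Assumptions: ~500 bytes per task (from above); limit applies per partition.
long bytesPerTask = 500L;
long maxPendingRecords = 10_000L;
long assignedPartitions = 10L; // hypothetical: partitions assigned to this consumer

long perPartition = bytesPerTask * maxPendingRecords;  // 5,000,000 B ≈ 5 MB
long total = perPartition * assignedPartitions;        // ≈ 50 MB for 10 partitions

System.out.printf("per-partition: %,d B, total: %,d B%n", perPartition, total);

Even with tens of partitions this stays in the tens of megabytes, so the conclusion above still holds.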

Crash recovery

The other downside is that if the consumer (in this case, Decaton) crashes, some tasks may run twice, because we cannot track the status of in-flight records and they will be redelivered. But since Decaton provides at-least-once semantics by default, duplicate processing already has to be handled on the user side. Because of that, this downside is not considered a big problem.
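To illustrate what the user-side handling can look like, here is a minimal deduplication sketch, assuming Decaton's DecatonProcessor interface (MyTask, its getId() accessor, and the in-memory dedup store are hypothetical; a real implementation would use an external store that survives restarts):

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

import com.linecorp.decaton.processor.DecatonProcessor;
import com.linecorp.decaton.processor.ProcessingContext;

class DedupProcessor implements DecatonProcessor<MyTask> {
    // In-memory for brevity; note this set does not survive a crash/restart.
    private final Set<String> processed = ConcurrentHashMap.newKeySet();

    @Override
    public void process(ProcessingContext<MyTask> context, MyTask task)
            throws InterruptedException {
        if (!processed.add(task.getId())) {
            return; // already handled once; skip the redelivered duplicate
        }
        handle(task);
    }

    private void handle(MyTask task) {
        // actual business logic goes here
    }
}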

Conclusion

From these observations, I suggest setting the default max pending records to 10000. Of course, the property should still be tuned for each environment, but increasing this default is likely to improve out-of-the-box performance.

CLAassistant commented 2 years ago

CLA assistant check
All committers have signed the CLA.

from-unknown commented 2 years ago

I meant to create this as a draft PR, but mistakenly created it as a normal PR. So let me close this and create it again 🙇