shotover / shotover-proxy

L7 data-layer proxy
https://docs.shotover.io
Apache License 2.0

Benchmark performance appears to degrade the longer the benchmarks run #98

Closed johndelcastillo closed 3 years ago

johndelcastillo commented 3 years ago

I first noticed this while testing something else on a mirroring-enabled cluster.

In short, as we increase the number of requests (-n), throughput drops significantly, as seen below.

~ $ sudo docker exec -ti redis redis-benchmark -p 6378 -n 100000 -t set -r 100000000 -P 8 -c 50 --threads 10 -q
SET: 62427.95 requests per second

~ $ sudo docker exec -ti redis redis-benchmark -p 6378 -n 110000 -t set -r 100000000 -P 8 -c 50 --threads 10 -q
SET: 46743.10 requests per second

~ $ sudo docker exec -ti redis redis-benchmark -p 6378 -n 150000 -t set -r 100000000 -P 8 -c 50 --threads 10 -q
SET: 27566.50 requests per second
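To put numbers on the drop, here is a minimal sketch that takes the three results above and computes how far each run falls below the 100k run. It also estimates how long each run lasted, assuming the reported rate is simply requests divided by elapsed time. The names `runs` and `summarize` are made up for illustration.

```python
# Figures copied from the three redis-benchmark runs above: (requests, requests/sec).
runs = [
    (100_000, 62427.95),
    (110_000, 46743.10),
    (150_000, 27566.50),
]

# Treat the -n 100000 run as the baseline for comparison.
baseline_rps = runs[0][1]


def summarize(requests, rps, baseline=baseline_rps):
    """Return (percent below baseline, estimated wall-clock seconds).

    The duration is an estimate: it assumes rps == requests / elapsed.
    """
    drop_pct = (1 - rps / baseline) * 100
    duration_s = requests / rps
    return drop_pct, duration_s


summary = [summarize(n, rps) for n, rps in runs]

for (n, rps), (drop_pct, duration_s) in zip(runs, summary):
    print(f"-n {n}: {rps:.2f} req/s, {drop_pct:.1f}% below baseline, ~{duration_s:.2f} s")
```

By this estimate each run lasts only between roughly 1.6 and 5.4 seconds, so short-lived effects and run-to-run noise may account for a large share of the difference.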

Environment: running locally on one node of a 3-node t3.medium cluster.

Topology.yaml

sources:
  redis_prod:
    Redis: {batch_size_hint: 4, listen_addr: '0.0.0.0:6378', connection_limit: 20000,
      hard_connection_limit: true}
chain_config:
  redis_chain:
  - MPSCTee:
      behavior: IGNORE
      buffer_size: 10000
      chain:
      - QueryTypeFilter: {filter: Read}
      - Coalesce:
          max_behavior: {COUNT: 2000}
      - MPSCForwarder:
          buffer_size: 100
          async_mode: true
          timeout_micros: 10000
          chain:
          - QueryCounter: {name: DR chain}
          - PoolConnections:
              name: RedisCluster-DR-subchain
              parallelism: 256
              chain:
              - RedisCluster:
                  first_contact_points: ['34.211.224.239:6379']
  - QueryCounter: {name: Main chain}
  - PoolConnections:
      name: RedisCluster-Main-subchain
      parallelism: 512
      chain:
      - RedisCluster:
          first_contact_points: ['34.204.221.1:6379']
named_topics: {example: 10}
source_to_chain_mapping: {redis_prod: redis_chain}

johndelcastillo commented 3 years ago

Worth noting: I actually ran these tests out of order, 150k, then 100k, then 110k.
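Since the order of the runs may matter, one way to separate the effect of -n from the effect of ordering is to repeat each size several times in a shuffled sequence. The sketch below only builds the plan and the command lines; it does not run anything. `build_command` and `shuffled_plan` are hypothetical helper names, and the flags are the ones from the commands in this issue, minus `-ti`, which is dropped on the assumption that the script runs without a terminal.

```python
import random


def build_command(requests, port=6378):
    """Build the redis-benchmark invocation from this issue for a given -n.

    The docker -ti flags are omitted because a script usually has no TTY.
    """
    return [
        "sudo", "docker", "exec", "redis",
        "redis-benchmark",
        "-p", str(port),
        "-n", str(requests),
        "-t", "set",
        "-r", "100000000",
        "-P", "8",
        "-c", "50",
        "--threads", "10",
        "-q",
    ]


def shuffled_plan(sizes, repeats=3, seed=0):
    """Return every size repeated `repeats` times, in a reproducible random order."""
    rng = random.Random(seed)
    plan = list(sizes) * repeats
    rng.shuffle(plan)
    return plan


plan = shuffled_plan([100_000, 110_000, 150_000], repeats=3, seed=1)
commands = [build_command(n) for n in plan]

for cmd in commands:
    print(" ".join(cmd))
```

Each entry in `commands` could then be passed to `subprocess.run(cmd, capture_output=True, text=True)` and the `SET:` line recorded next to its position in the plan. If throughput follows position rather than -n, ordering is the more likely cause.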

benbromhead commented 3 years ago

Might be worth rerunning on non-t3 clusters.

I tried reproducing locally with no luck. I'll try on AWS infra when I have some time.

benbromhead commented 3 years ago

@johndelcastillo is this still an issue?

johndelcastillo commented 3 years ago

Haven't checked yet, sorry.

benbromhead commented 3 years ago

Closing due to lack of repro. See #103 for details on the benchmarking that now runs automatically.