steve0511 / resty-redis-cluster

Openresty lua client for redis cluster.
Apache License 2.0
374 stars 134 forks source link

Using lua-native implementation for CRC Calculation #60

Closed Presto412 closed 4 years ago

Presto412 commented 4 years ago

Hi, I have some code ready that takes in changes from Kong's fork of this repository. They replaced the C based CRC implementation and replaced with xmodem.lua (https://github.com/Kong/resty-redis-cluster). Can I open a PR with those changes merged to current version here?

toredash commented 4 years ago

@Presto412 I would be interested in seeing this, I cannot approve of PRs but it would be interesting to see the performance after the change

Presto412 commented 4 years ago

@steve0511 waiting for your approval

steve0511 commented 4 years ago

@Presto412 Hi, yes please send MR. And do we have some performance comparison report between them?

Presto412 commented 4 years ago

Don't have a performance report, if you could point me to a resource, would be helpful

steve0511 commented 4 years ago

Just mean if we already have some performance test either by apache bench/Jmeter/Siege will be great. If not, we can go ahead and keep both way of CRC calculation.

toredash commented 4 years ago

@Presto412 @steve0511 I did a quick performance test with our own deployment using the C and Lua based approach to get a quick overview. Mind you this was done on my laptop and inside Docker, I understand this is not the best test-setup but the findings I got initially is enough to convince me for now.

With the Lua approach, I got roughly 1000req/s with a single Redis instance, and 1500req/s with Redis Cluster

With the C approach, I got roughly 1000req/s with a single Redis instance, and 2000req/s with Redis Cluster (~33% increase in req/s)

For our build, with our test-workload for our application, it seems the C approach is more performant.

For this to be a proper test, one should really benchmark the C vs Lua approaches inside nginx without ever connecting to a Redis instance at all, e.g. make a test-case that loops through X number of keys and measure the time it take to complete between the two solutions.

toredash commented 4 years ago

@Presto412 @steve0511 I did a new performance test with only the required bits. Results in the end of this post.

The lua code:

init_worker_by_lua_block {
  local ffi = require 'ffi'
  local xmodem = require "resty.xmodem"

  ffi.cdef [[
      int lua_redis_crc16(char *key, int keylen);
  ]]

  --load from path, otherwise we should load from LD_LIBRARY_PATH by
  --export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:your_lib_path
  local function load_shared_lib(so_name)
      local string_gmatch = string.gmatch
      local string_match = string.match
      local io_open = io.open
      local io_close = io.close

      local cpath = package.cpath

      for k, _ in string_gmatch(cpath, "[^;]+") do
          local fpath = string_match(k, "(.*/)")
          fpath = fpath .. so_name

          local f = io_open(fpath)
          if f ~= nil then
              io_close(f)
              return ffi.load(fpath)
          end
      end
  end

  local clib = load_shared_lib("librestyredisslot.so")
  if not clib then
      ngx.log(ngx.ERR, "can not load librestyredisslot library")
  end

  function redis_slot_c(str)
      return clib.lua_redis_crc16(ffi.cast("char *", str), #str)
  end
  function redis_slot_l(str)
      return xmodem.redis_crc(str)
  end
}

Then in added two locations in the nginx config:

location /c-code {
    content_by_lua_block {
          ngx.say(redis_slot_c(tostring(ngx.time())))
      }
}

location /lua-code {
    content_by_lua_block {
          ngx.say(redis_slot_l(tostring(ngx.time())))
      }
}

I then ran multiple tests to get a feeling of the performance:

# C-CODE RESULTS
$ for con in 1 2 4 8; do echo -n "$con:"; wrk -c $con -t $con -d 15s http://localhost:80/c-code 2>&1 | grep Requests/sec;done
1:Requests/sec:   4734.63
2:Requests/sec:   7929.21
4:Requests/sec:  13838.32
8:Requests/sec:  24149.86
$ for con in 1 2 4 8; do echo -n "$con:"; wrk -c $con -t $con -d 15s http://localhost:80/c-code 2>&1 | grep Requests/sec;done
1:Requests/sec:   4713.11
2:Requests/sec:   7815.48
4:Requests/sec:  14143.33
8:Requests/sec:  24907.23

# LUA-CODE RESULTS
$ for con in 1 2 4 8; do echo -n "$con:"; wrk -c $con -t $con -d 15s http://localhost:80/lua-code  2>&1 | grep Requests/sec;done
1:Requests/sec:   4596.01
2:Requests/sec:   9551.46
4:Requests/sec:  13443.95
8:Requests/sec:  23983.44
$ for con in 1 2 4 8; do echo -n "$con:"; wrk -c $con -t $con -d 15s http://localhost:80/lua-code  2>&1 | grep Requests/sec;done
1:Requests/sec:   4713.51
2:Requests/sec:   7670.21
4:Requests/sec:  15805.27
8:Requests/sec:  25586.97

Tests was run locally on a MBP 2016 / 2,7 GHz Quad-Core Intel Core i7, inside Docker.

It seems to me that the performance is nearly identical for either solutions.

Presto412 commented 4 years ago

Great, I'll add my lua-based changes then

toredash commented 4 years ago

@Presto412 Isn't this already covered in https://github.com/steve0511/resty-redis-cluster/pull/36 ?

Presto412 commented 4 years ago

Oh, okay. Can merge that itself then

toredash commented 4 years ago

@steve0511 this issue can be closed