Hello @agentzh, while profiling with a flame graph I found a serious performance problem in the masked payload calculation.
In the protocol.lua source there is a TODO about optimizing this with string.buffer, but I could not find such an implementation in the LuaJIT API.
As a workaround I optimized it with an FFI buffer and ffi.string, and wrote the benchmark code below:
```lua
local bit = require("bit")
local ffi = require("ffi")

local str_char = string.char
local concat = table.concat
local byte = string.byte
local bxor = bit.bxor
local ffi_new = ffi.new
local ffi_string = ffi.string

-- fall back to a plain table constructor when table.new is unavailable
local ok, new_tab = pcall(require, "table.new")
if not ok then
    new_tab = function (narr, nrec) return {} end
end

-- NOTE: this is a number, so string.byte() coerces it to its decimal
-- string form; a real masking key is a 4-byte string
local masking_key = 0x0f3eca1d
local payload_len = 3200

-- the sample file must contain at least payload_len bytes
local f = assert(io.open("./111.wav", "rb"))
local payload = f:read(payload_len)
f:close()

local count = 100000

-- table-based approach: one single-char string per byte, then table.concat
local function implement1()
    local bytes = new_tab(payload_len, 0)
    for i = 1, payload_len do
        bytes[i] = str_char(bxor(byte(payload, i),
                                 byte(masking_key, (i - 1) % 4 + 1)))
    end
    local p = concat(bytes)
    return p
end

-- FFI approach: write the masked bytes into a C buffer, then build the
-- Lua string once with ffi.string
local function implement2()
    local buffer = ffi_new("char[?]", payload_len)
    for i = 1, payload_len do
        buffer[i-1] = bxor(byte(payload, i),
                           byte(masking_key, (i - 1) % 4 + 1))
    end
    local p = ffi_string(buffer, payload_len)
    return p
end

local function benchmark1()
    -- ngx.now() returns the cached time, so refresh it before each reading
    ngx.update_time()
    local start_time = ngx.now()
    for i = 1, count do
        implement1()
    end
    ngx.update_time()
    ngx.say("=========benchmark1 cost:", (ngx.now()-start_time) * 1000, " ms.")
end

local function benchmark2()
    ngx.update_time()
    local start_time = ngx.now()
    for i = 1, count do
        implement2()
    end
    ngx.update_time()
    ngx.say("=========benchmark2 cost:", (ngx.now()-start_time) * 1000, " ms.")
end

benchmark1()
benchmark2()
```
Running the benchmark code gives the result below:
Over the 100000 iterations in the benchmark code, the ffi string implementation is about 8-10 times faster than the table-based one.
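Before swapping implementations it may be worth checking that the two approaches produce identical output. The following is a minimal sketch, assuming LuaJIT (for the `bit` and `ffi` modules) and no nginx APIs. The names `mask_table` and `mask_ffi` are hypothetical helpers written for this check, the key is a 4-byte string (unlike the numeric constant in the benchmark), and the buffer is declared as `uint8_t[?]` rather than `char[?]` so that byte values above 127 are stored without relying on signed narrowing.

```lua
-- Assumes LuaJIT; mask_table and mask_ffi are illustrative names only.
local bit = require("bit")
local ffi = require("ffi")

local byte, str_char, concat = string.byte, string.char, table.concat
local bxor = bit.bxor

-- Table-based masking, mirroring implement1 in the benchmark.
local function mask_table(payload, key)
    local bytes = {}
    for i = 1, #payload do
        bytes[i] = str_char(bxor(byte(payload, i),
                                 byte(key, (i - 1) % 4 + 1)))
    end
    return concat(bytes)
end

-- FFI-buffer masking, mirroring implement2 in the benchmark.
local function mask_ffi(payload, key)
    local len = #payload
    local buf = ffi.new("uint8_t[?]", len)
    for i = 1, len do
        buf[i - 1] = bxor(byte(payload, i),
                          byte(key, (i - 1) % 4 + 1))
    end
    return ffi.string(buf, len)
end

-- Assumed test data: a 4-byte string key and a payload that covers every
-- byte value, with a length that is not a multiple of 4.
local key = str_char(0x0f, 0x3e, 0xca, 0x1d)
local parts = {}
for i = 0, 255 do
    parts[#parts + 1] = str_char(i)
end
local payload = concat(parts):rep(4) .. "xyz"

local masked = mask_ffi(payload, key)

-- Both implementations must agree byte for byte.
assert(masked == mask_table(payload, key))
-- XOR masking is its own inverse: masking twice restores the payload.
assert(mask_ffi(masked, key) == payload)
print("mask_table and mask_ffi agree on " .. #payload .. " bytes")
```

The second assertion relies on the fact that XOR with the same key undoes itself, which is also how a receiver unmasks a frame, so the same helper can be reused in both directions.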