thibaultcha / lua-cassandra

Pure Lua driver for Apache Cassandra
https://thibaultcha.github.io/lua-cassandra
Other
98 stars 35 forks source link

C.strlen(peer.data_center) cause worker process XXX exited on signal 11 (core dumped) #91

Closed georgecao closed 7 years ago

georgecao commented 7 years ago

I am using lua-cassandra 1.1.1 with openresty

sbin/nginx -v
nginx version: openresty/1.9.15.1

And in cluster.lua file,

local function get_peer(self, host, status)
    ---<before>
    local data_center = ffi_str(peer.data_center, C.strlen(peer.data_center))
    local release_version = ffi_str(peer.release_version, C.strlen(peer.release_version))
    ---<after>
end

The code around line 103 C.strlen(peer.data_center) cause a nginx error, which says: 2017/04/05 10:07:11 [alert] 127063#0: worker process 84466 exited on signal 11 (core dumped)

I am use do return end narrow down to the code. And this happens every time.

Then I did some test, the same way I load lua-cassandra client.

location  = /the/path {
    content_by_lua_file test.lua;
}

And the content of test.lua as follow:

local ffi = require 'ffi'
local C = ffi.C
local len = C.strlen('The string for test.')
ngx.log(ngx.ERR, "Length Type: ", type(len))

This time nginx works fine, and log Length Type: cdata as expected.

Any help?

georgecao commented 7 years ago

I did another simple and isolated test, and I found that if we first execute self.shm:safe_set(key,value) then reload the nginx sbin/nginx -s reload , the error shows up.

This time, the content of test.lua as follow:

local ffi = require 'ffi'
local C = ffi.C
local ffi_cast = ffi.cast
local ffi_str = ffi.string

local M = {}

ffi.cdef [[
    size_t strlen(const char *str);

    struct peer_rec {
        uint64_t      reconn_delay;
        uint64_t      unhealthy_at;
        char         *data_center;
        char         *release_version;
    };
]]

local str_const = ffi.typeof('char *')
local rec_peer_const = ffi.typeof('const struct peer_rec*')
local rec_peer_size = ffi.sizeof('struct peer_rec')
local rec_peer_cdata = ffi.new('struct peer_rec')

local shared = ngx.shared
local dict = shared.ldict

function M:set(key, reconn_delay, unhealthy_at, data_center, release_version)
  -- host info
  rec_peer_cdata.reconn_delay = reconn_delay
  rec_peer_cdata.unhealthy_at = unhealthy_at
  rec_peer_cdata.data_center = ffi_cast(str_const, data_center)
  rec_peer_cdata.release_version = ffi_cast(str_const, release_version)
  local ok , err = dict:safe_set(key, ffi_str(rec_peer_cdata,rec_peer_size))
  if not ok then 
      print(err)
  end
end
function M:rm(key)
    dict:delete(key)
end
function M:get(key)
        local rec_v, err = dict:get(key)
    if err then
        return nil, 'Not' .. err
    elseif not rec_v then
        return nil, 'Key' .. key
    elseif type(rec_v) ~= 'string' or #rec_v ~= rec_peer_size then
        return nil, 'Bad'
    end

    ngx.log(ngx.ERR, 'key', key)
    local peer = ffi_cast(rec_peer_const, rec_v)    
--  do return end
    --local len = C.strlen(peer.data_center)
    local data_center = ffi_str(peer.data_center, C.strlen(peer.data_center))
    local release_version = ffi_str(peer.release_version, C.strlen(peer.release_version))
    local reconn_delay = tonumber(peer.reconn_delay)
    local unhealthy_at = tonumber(peer.unhealthy_at)
    return data_center, release_version, reconn_delay, unhealthy_at
end

return M

Then I can reproduce this by the following code and a sbin/nginx -s reload command :

local m = require 'test'
local key = 'the_key_does_not_matter'
local dc, rv = m:get(key)
m:set(key, 100,100,'ct','1.11')
ngx.say('{"ffi":1, "dc":"'..tostring(dc)..'"}')
thibaultcha commented 7 years ago

Hi!

Thanks for investigating this and sorry for the lack of response. I currently don't have time to look into it but will in the coming days or weeks, sorry!

swordkee commented 7 years ago

glibc-2.12-1.192.el6.x86_64 contos6.5

thibaultcha commented 7 years ago

Gave some thought to that, and I think that we should get it of this strlen call and instead, store the length of the string values we store into the C struct, just like the ngx_string type in the Nginx codebase.

I'm not sure when I'll be able to do that, but would welcome a contribution for it!

swordkee commented 7 years ago

local cdata = {} cdata.reconn_delay = reconn_delay cdata.unhealthy_at = unhealthy_at cdata.data_center = data_center cdata.release_version = release_version ok, err = self.shm:safe_set(_rec_key .. host, json.encode(cdata))

georgecao commented 7 years ago

@thibaultcha @swordkee In my local repo, I implement a serde interface to make this configurable. Currently support ffi and plain table concat. And use table concat by default, So the problem is not solved, it's just a workaround.

  return setmetatable({
    shm = shared[dict_name],
    dict_name = dict_name,
    prepared_ids = {},
    peers_opts = peers_opts,
    keyspace = opts.keyspace,
    default_port = opts.default_port or 9042,
    contact_points = opts.contact_points or { '127.0.0.1' },
    timeout_read = opts.timeout_read or 2000,
    timeout_connect = opts.timeout_connect or 1000,
    retry_on_timeout = opts.retry_on_timeout == nil and true or opts.retry_on_timeout,
    max_schema_consensus_wait = opts.max_schema_consensus_wait or 10000,
    lock_opts = lock_opts,
    logging = not opts.silent,
    lb_policy = opts.lb_policy
                or require('resty.cassandra.policies.lb.rr').new(),
    reconn_policy = opts.reconn_policy
                    or require('resty.cassandra.policies.reconnection.exp').new(1000, 60000),
    retry_policy = opts.retry_policy
                   or require('resty.cassandra.policies.retry.simple').new(3),
    serde = opts.serde
            or require('resty.cassandra.serde.ffi').new()
  }, _Cluster)
end

Use cjson is another option as @swordkee said.

And FFI struct has any advantages over table or JSON string?

swordkee commented 7 years ago

the cjson is of generality

Invalid message version. Got 4 but previous messages on this connection had version 3

local function check_peer_health(self, host, coordinator_options, retry) ... local peer, err = spawn_peer(host, self.default_port, keyspace, self.peers_opts) ngx.say(peer.protocol_version) ... end peer.protocol_version=4 True version =2.18

return setmetatable({ shm = shared[dict_name], ..... init = false, protocol_version = nil or opts.protocol_version,-- Force version number .... }, _Cluster) end

function _Cluster:refresh() .... local coordinator, err = first_coordinator(self) if not coordinator then return nil, err end

if self.protocol_version ~= nil and self.protocol_version ~= coordinator.protocol_version then
  coordinator.protocol_version = self.protocol_version
end

......

thibaultcha commented 7 years ago

This should be taken care of in #97, thanks for the report!