Closed architm21 closed 7 years ago
I am going to need more information if you want me to help you. Ideally, please provide a minimal, reproducible example with a set of instructions on how you deployed your cluster, a minimal set of operations executed on it via this driver, and what are the results you are expecting. You are encouraged to use a tool like ccm to make such an example easy to reproduce.
Otherwise, I will at least need:
Thanks
version of driver : 1.0.0 (latest from LuaRocks)
C : Cassandra 3.0.8.1293 | Native protocol v4
nodes : 3
code : https://github.com/architm21/thumnails/blob/master/cimage.lua
errors :
1. 2016/12/05 05:53:05 [error] 8287#0: 81196 [lua] cimage.lua:79: could not retrieve images:could not refresh cluster: could not set host status in shm: nil key,
I did not look thoroughly at your code because I don't have the time right now, but one thing I noticed is that you are creating a new Cluster instance on each request. That is not how this module is supposed to be used.
You are to instantiate a Cluster once in the lifetime of your workers and then use its methods during the request/response lifecycles. Your current approach is very harmful to performance and, most of all, leads to undefined behavior of this driver since it is not its intended use.
I am currently on mobile but I shall provide you with better usage examples, and update those in the documentation.
That would be very helpful. Waiting for the examples.
Ok, on a desktop now. The idea is that you instantiate a Cluster only once throughout the lifetime of your Nginx workers. That means instantiating it in the init or init_worker phases. For example:
init_worker_by_lua_block {
    local Cluster = require 'resty.cassandra.cluster'
    local cluster, err = Cluster.new {
        shm = 'cassandra', -- defined by the lua_shared_dict directive
        contact_points = {'192.168.8.33', '192.168.8.11','192.168.8.60'},
        keyspace = 'cordiant_images',
        connect_timeout = 1000,
        timeout_read = 1000,
    }
    if err then
        -- ...
    end
    -- declared as a global (not ideal)
    cluster_instance = cluster
}
server {
    listen 8080;
    # content_by_lua_block is only valid inside a location block
    location / {
        content_by_lua_block {
            local rows, err, cql_code = cluster_instance:execute("SELECT * FROM image_details WHERE id=?", {id})
            if err then
                -- ...
            end
        }
    }
}
Or better, without declaring a global variable:
# nginx.conf
init_by_lua_block {
    local foo = require "foo"
    foo.do_init()
}
server {
    listen 8080;
    # content_by_lua_block is only valid inside a location block
    location / {
        content_by_lua_block {
            local foo = require "foo"
            foo.do_content()
        }
    }
}
-- foo.lua
local Cluster = require 'resty.cassandra.cluster'
local cluster_instance
local _M = {}
function _M.do_init()
    local cluster, err = Cluster.new {
        shm = 'cassandra', -- defined by the lua_shared_dict directive
        contact_points = {'192.168.8.33', '192.168.8.11','192.168.8.60'},
        keyspace = 'cordiant_images',
        connect_timeout = 1000,
        timeout_read = 1000,
    }
    if err then
        -- ...
    end
    cluster_instance = cluster
end
function _M.do_content()
    local rows, err, cql_code = cluster_instance:execute("SELECT * FROM image_details WHERE id=?", {id})
    if err then
        -- ...
    end
end
return _M
Both of those solutions will only create one long-lived instance of the Cluster module, which will be much more efficient and is how this driver is supposed to be used.
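The single-instance idea can also be sketched in plain Lua, independent of Nginx. The snippet below is only an illustration and not part of the driver: `holder`, `get` and the `factory` argument are hypothetical names, and `factory` stands in for a function that wraps Cluster.new { ... }. Unlike the init-phase examples above, this variant creates the instance lazily on first use and then returns the cached one.

```lua
-- Hypothetical holder for one long-lived instance per Lua VM (per worker).
local holder = {}

local instance -- upvalue: survives across requests handled by this worker

-- factory: function returning (cluster, err); in OpenResty it would wrap
-- Cluster.new { ... } from resty.cassandra.cluster (assumption).
function holder.get(factory)
    if instance then
        return instance -- reuse, never re-create per request
    end
    local cluster, err = factory()
    if not cluster then
        return nil, err -- leave `instance` unset so a later call can retry
    end
    instance = cluster
    return instance
end
```

Calling `holder.get` twice with the same factory should invoke the factory once and hand back the same table both times.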
Thank you for the help. I am still getting the same error: do_content(): error getting rows : could not refresh cluster: could not set host status in shm: nil key,
If you want more help from me, I need you to provide me with a minimal and fully reproducible example of your use case, as originally asked for. I cannot know what is going on so far because I do not know which operations you are making on your cluster while your application is running.
Please provide an example in the form of a single Nginx config and/or Lua script without memcache (and with init scripts for your cluster), and using ccm to reproduce the operations you are making on your cluster.
Thank you.
And again: please provide full stack traces of the errors you are seeing.
Hi, did you use "ab" to test concurrency? I tried, and I only get 1000 QPS after following your suggestion above to create the new Cluster instance in the init phase.
Hi, I used "ab" to test concurrency; the QPS is very low and the load brings Cassandra down.
I would need more info than that. See previous comments on this issue.
2016/12/24 17:04:44 [error] 116424#0: 40493 lua tcp socket connect timed out, client: 10.10.121.103, server: , request: "GET /cassandra HTTP/1.1", host: "10.10.121.103:8800"
2016/12/24 17:04:44 [warn] 116424#0: 40493 [lua] cluster.lua:136: set_peer_down(): [lua-cassandra] setting host at 10.10.121.151 DOWN, client: 10.10.121.103, server: , request: "GET /cassandra HTTP/1.1", host: "10.10.121.103:8800"
2016/12/24 17:14:03 [error] 128874#0: 125 [lua] cass.lua:41: do_content(): cluster execute failed,err=could not refresh cluster: no host details for 10.10.121.138, client: 10.10.121.103, server: , request: "GET /cassandra HTTP/1.1", host: "10.10.121.103:8800"
2016/12/24 17:14:03 [notice] 128873#0: 181 [lua] test.lua:3: 333333333333, client: 10.10.121.103, server: , request: "GET /cassandra HTTP/1.1", host: "10.10.121.103:8800"
2016/12/24 17:14:03 [error] 128874#0: *125 [lua] test.lua:6: could not retrieve users: could not refresh cluster: no host details for 10.10.121.138, client: 10.10.121.103, server: , request: "GET /cassandra HTTP/1.1", host: "10.10.121.103:8800"
2016/12/24 17:14:08 [error] 128863#0: 3 [lua] cass.lua:41: do_content(): cluster execute failed,err=could not refresh cluster: failed to acquire refresh lock: timeout, client: 10.10.121.103, server: , request: "GET /cassandra HTTP/1.1", host: "10.10.121.103:8800"
2016/12/24 17:14:08 [error] 128863#0: 3 [lua] test.lua:6: could not retrieve users: could not refresh cluster: failed to acquire refresh lock: timeout, client: 10.10.121.103, server: , request: "GET /cassandra HTTP/1.1", host: "10.10.121.103:8800"
[error] 128880#0: 9 [lua] cass.lua:41: do_content(): cluster execute failed,err=could not refresh cluster: failed to acquire refresh lock: timeout, client: 10.10.121.103, server: , request: "GET /cassandra HTTP/1.1", host: "10.10.121.103:8800"
2016/12/24 17:14:08 [error] 128880#0: 9 [lua] test.lua:6: could not retrieve users: could not refresh cluster: failed to acquire refresh lock: timeout, client: 10.10.121.103, server: , request: "GET /cassandra HTTP/1.1", host: "10.10.121.103:8800"
2016/12/24 17:14:34 [error] 128884#0: 2338 lua tcp socket read timed out, client: 10.10.121.103, server: , request: "GET /cassandra HTTP/1.1", host: "10.10.121.103:8800"
2016/12/24 17:14:34 [warn] 128884#0: 2338 [lua] cluster.lua:136: set_peer_down(): [lua-cassandra] setting host at 10.10.121.149 DOWN, client: 10.10.121.103, server: , request: "GET /cassandra HTTP/1.1", host: "10.10.121.103:8800"
Hello, I used "ab" to test concurrency; the QPS is very low and it produces a lot of errors (the ones above). I also captured the flame graph below.
code:
--test.lua
local cass = require "cass"
local cjson = require "cjson"
local result, err, code = cass.do_content("select * from play_record.vod_play_record where partner='JS_CUCC' and mac ='00:19:f0:00:00:17' and chnname='动漫' and begintime = '2016-11-29 17:09:02';")
if not result then
    ngx.log(ngx.ERR, 'could not retrieve users: ', err)
    --return ngx.exit(500)
end
ngx.say(cjson.encode(result))

--cass.lua
local Cluster = require 'resty.cassandra.cluster'
local cluster_instance
local _M = {}

function _M.do_init()
    local pool = {}
    table.insert(pool, "10.10.121.153:9042")
    table.insert(pool, "10.10.121.152:9042")
    table.insert(pool, "10.10.121.149:9042")
    table.insert(pool, "10.10.121.151:9042")
    table.insert(pool, "10.10.121.148:9042")
    table.insert(pool, "10.10.121.139:9042")
    table.insert(pool, "10.10.121.138:9042")
    table.insert(pool, "10.10.121.122:9042")
    table.insert(pool, "10.10.121.121:9042")
    table.insert(pool, "10.10.121.120:9042")
    table.insert(pool, "10.10.121.119:9042")
    local cluster, err = Cluster.new {
        shm = 'cassandra', -- defined by the lua_shared_dict directive
        contact_points = pool,
        keyspace = 'play_record',
        connect_timeout = 5000,
        timeout_read = 10000,
    }
    if err then
        ngx.log(ngx.ERR, "create cluster ERR=", err)
        ngx.exit(500)
    end
    cluster_instance = cluster
end

function _M.do_content(_qstr)
    if type(_qstr) ~= "string" then
        ngx.log(ngx.ERR, "do_content args err")
    end
    local rows, err, cql_code = cluster_instance:execute(_qstr)
    if err then
        ngx.log(ngx.ERR, "cluster execute failed,err=", err)
    end
    return rows, err, cql_code
end

return _M

--nginx.conf
http{
    lua_shared_dict cassandra 10m;
    init_by_lua_block {
        local cass= require "cass"
        cass.do_init()
    }
}
server {
    listen 8800;
    default_type text/html;
    access_log logs/cassandra.log main;
    location ~ /cassandra {
        error_log logs/c_err.log debug;
        content_by_lua_file /usr/local/openresty/nginx/cassandra/test.lua;
    }
On mobile now, but I will review this once I have some time on my hands. FYI, I have achieved 10k QPS in my benchmarks with prepared statements on the 1.0 version of this driver.
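For reference, a prepared query could look roughly like the sketch below. It is a sketch under assumptions, not the benchmark code: `fetch_records` and `QUERY` are hypothetical names, the WHERE clause is only modeled on the query pasted above (the actual schema is unknown), and the `prepared = true` option is passed as the third argument of `execute` as the driver's README describes it; check it against the version you run.

```lua
-- Hypothetical query with bind markers instead of inlined literals.
local QUERY = "SELECT * FROM play_record.vod_play_record " ..
              "WHERE partner = ? AND mac = ?"

-- cluster: the long-lived resty.cassandra.cluster instance created at init.
-- args:    array of values for the bind markers, in order.
local function fetch_records(cluster, args)
    -- Third argument carries query options; `prepared = true` is assumed
    -- to make the driver prepare the statement once and reuse it.
    local rows, err, cql_code = cluster:execute(QUERY, args, { prepared = true })
    if not rows then
        return nil, "could not retrieve records: " .. tostring(err), cql_code
    end
    return rows
end
```

Keeping the query text constant and passing values separately is what lets the statement be prepared once and reused across requests.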
I also found that after I reload nginx, curl returns no data and nothing is written to the nginx log (curl: (52) Empty reply from server). I have to restart nginx.
FYI: the contact points do not expect the port number to be included, you should strip them out. Second: why are you specifying that many contact points? If all those nodes are part of the same cluster, this driver will retrieve them already. You only need to specify one or two contact points. I also hope all of those are part of the same Cassandra cluster.
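A minimal sketch of that cleanup, assuming the addresses are IPv4 or hostnames in "host:port" form. The helper name `to_contact_points` and its `max` parameter are hypothetical and not part of the driver:

```lua
-- Hypothetical helper: turn "host:port" strings into bare hosts and keep
-- only the first `max` entries. Assumes IPv4 addresses or hostnames
-- (a bare IPv6 address contains colons and is not handled here).
local function to_contact_points(addrs, max)
    local out = {}
    for _, addr in ipairs(addrs) do
        if #out >= max then
            break
        end
        -- capture everything before a trailing ":<digits>", if present
        local host = addr:match("^(.-):%d+$") or addr
        out[#out + 1] = host
    end
    return out
end

local contact_points = to_contact_points({
    "10.10.121.153:9042",
    "10.10.121.152:9042",
    "10.10.121.149:9042",
}, 2)
-- contact_points now holds the first two hosts, without ports
```

The resulting table can then be passed as `contact_points` to Cluster.new in place of the original `pool`.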
Have you ever used Cassandra with any of the Datastax drivers before? This driver is built with a similar approach. There is not much I can tell beyond that according to the bits you pasted... As long as you do not provide me with a minimalistic reproducible example (that would include the C* schema, and some test data, as well as OpenResty, Cassandra and lua-cassandra versions), I cannot help you further, nor do I have the time to.
The title of this issue is "issues when ip of cassandra cluster changed.", but so far, I was not given any reproducible example about such a use-case... It would be useful if you want me to do something about it, or explain why it might not be supported.
I am facing the same issue. I used a volume snapshot from one node and attached it to another node. Now I am getting "Nodes /X.X.X.X and /X.X.X.X have the same token 8256225600046861013." Also, I am unable to use nodetool; it gives the error "Failed to connect to '127.0.0.1:7199' - NoSuchObjectException: 'no such object in table'."
@atmesh I don't think this error is related to this driver, sorry.
When the IP of Cassandra changed, the following errors appeared in the error log:
1. could not refresh cluster: failed to acquire lock: timeout
2. attempt to index field 'shm' (a nil value)