thibaultcha / lua-cassandra

Pure Lua driver for Apache Cassandra
https://thibaultcha.github.io/lua-cassandra

Lock not released on error; add error logging #137

Closed chris-branch closed 3 years ago

chris-branch commented 4 years ago

I'm trying to track down the source of an error message that appears periodically in the logs for my Kong cluster (backed by Cassandra). The message looks like this:

connector.lua:269: [cassandra] failed to refresh cluster topology: timeout, context: ngx.timer

I believe this is coming from here:

https://github.com/Kong/kong/blob/master/kong/db/strategies/cassandra/connector.lua#L273

which ends up calling _Cluster:refresh() in lua-cassandra. While reviewing the code, I noticed that _Cluster:refresh() obtains a shared lock, but the function has several exit points that do not release it. In particular, there are 6 error returns that exit without calling lock:unlock(). Example:

https://github.com/thibaultcha/lua-cassandra/blob/master/lib/resty/cassandra/cluster.lua#L583

The lock will auto-release after the 60-second timeout, but any callers may block until that happens.
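One way to guarantee the release on every exit path is to route the locked section through a small wrapper. The following is only a sketch, not the library's code: `with_lock` and `new_stub_lock` are hypothetical names, and the stub merely imitates the `lock`/`unlock` methods of a lua-resty-lock instance so that the example runs in plain Lua.

```lua
-- Stub standing in for a lua-resty-lock instance (assumption: only the
-- lock/unlock methods matter for this sketch).
local function new_stub_lock()
  local self = { held = false }
  function self:lock(key)
    if self.held then return nil, "timeout" end
    self.held = true
    return 0 -- elapsed time spent waiting
  end
  function self:unlock()
    if not self.held then return nil, "unlocked" end
    self.held = false
    return 1
  end
  return self
end

-- Hypothetical helper: run fn while holding the lock and release the lock
-- on every exit path, including error returns and thrown errors.
local function with_lock(lock, key, fn, ...)
  local elapsed, lock_err = lock:lock(key)
  if not elapsed then
    return nil, "failed to acquire lock: " .. tostring(lock_err)
  end

  -- pcall catches thrown errors, so the unlock below always runs
  local pok, ok, err = pcall(fn, ...)

  local unlocked, unlock_err = lock:unlock()
  if not unlocked then
    return nil, "failed to release lock: " .. tostring(unlock_err)
  end

  if not pok then
    return nil, tostring(ok) -- ok holds the thrown error here
  end
  return ok, err
end

-- Usage: the callback fails early, yet the lock is released.
local lock = new_stub_lock()
local ok, err = with_lock(lock, "refresh", function()
  return nil, "timeout"
end)
```

With this shape, each early `return nil, err` inside the callback no longer needs its own unlock call, which removes the class of bug described above.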

Also, it would be helpful if _Cluster:refresh() could log a warning/error message for each of the early returns to help diagnose the root cause of any failures.
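To illustrate the logging request, here is a minimal sketch in which every early return passes through one helper that records the failing step. The names `fail`, `refresh_sketch`, and the step labels are hypothetical and do not come from lua-cassandra; the logger is injected so the example runs in plain Lua, whereas inside OpenResty it would presumably be a thin wrapper around `ngx.log`.

```lua
-- Hypothetical helper: log which step failed, then return the usual
-- nil, err pair so callers see the same result as before.
local function fail(log, step, err)
  log("[lua-cassandra] refresh failed at '" .. step .. "': " .. tostring(err))
  return nil, err
end

-- Hypothetical stand-in for a refresh-like function with several early returns.
local function refresh_sketch(log, steps)
  for _, step in ipairs(steps) do
    local ok, err = step.run()
    if not ok then
      return fail(log, step.name, err)
    end
  end
  return true
end

-- Usage: collect log messages in a table instead of writing to a real log.
local messages = {}
local function collect(msg)
  messages[#messages + 1] = msg
end

local ok, err = refresh_sketch(collect, {
  { name = "first step",  run = function() return true end },
  { name = "second step", run = function() return nil, "timeout" end },
})
```

Here the caller still receives `nil, "timeout"`, but the log now says which step produced it, which is the information missing from the "failed to refresh cluster topology: timeout" message.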