Closed ankit--sethi closed 4 years ago
The cassandra health checks were contributed in #2064 by @jdubois. I'd be interested to know if he has any suggestions.
I would be very surprised if there were a reason for this request to fail at a specific consistency level. Could you provide some documentation, or anything else to support this claim?
Here's one github issue discussing an almost identical problem.
Going by what they say -- which seems right based on my knowledge of the System tables -- some Consistency Levels can never successfully execute on system tables that use LocalStrategy.
The fix for this should be relatively straightforward -- ignore the default (or user-configured) Consistency Level set within CassandraOperations and explicitly set it to be ONE (or any of the workable values) for the healthcheck.
I'm very surprised that system tables cannot have a high consistency level... In that case, they seem to be a poor choice to check the cluster health - lowering the consistency level would make the cluster look healthy, when in fact you cannot read or write... So if that's correct we would need to create a specific table in the database schema, which wouldn't be easy to use for Spring Boot (because then people would need to create it, etc).
If nobody finds a good solution, I'll check with my friends from Datastax when I'm back from holidays, in about 1 month.
@jdubois I hope you enjoyed your holidays. Unfortunately, I don't think we've found a good solution for this one in your absence. If you have a moment could you please check with your friends at Datastax and see what they would recommend?
Indeed, let me call @bguedes @clun for help!!!
@clun @bguedes If you have a few minutes, we'd be really grateful for your recommendation here.
Hi team,
Thank you @wilkinsona for the poke, I missed the last one for some reason. The system keyspace, like any other keyspace, has some replication_factor attribute and default is maybe 1. So, if you add nodes later on and do not increase the replication factor you will hit some errors.
Try this:
ALTER KEYSPACE system
WITH REPLICATION= {'class' : 'NetworkTopologyStrategy',
'data_center_name_1' : 3, 'data_center_name' : 3};
I personally don't like the TWO and THREE CLs => they don't seem generic.
I would go with ALL to ensure that all nodes are up, and LOCAL_QUORUM for a more optimistic approach.
select * from system.local is still an efficient query I would say but why not relying on the driver itself ?
This is the same for any system-related keyspace. https://docs.datastax.com/en/security/6.7/security/secSystemKeyspace.html
Thanks very much, @clun. Unfortunately, we're not in a position to alter a keyspace and just have to rely upon what the user has configured.
select * from system.local is still an efficient query I would say but why not relying on the driver itself ?
This is intriguing. How would we go about relying on the driver itself? Is there something provided by the driver that we can call to determine Cassandra's health?
I was just made aware of this issue.
The system keyspace is a bit special. It has a replication factor of 1 and uses a special replication strategy called LocalStrategy. Basically this means that this keyspace is local to each node.
Concretely speaking, this means that querying that keyspace can only work with the following consistency levels: ONE, LOCAL_ONE, QUORUM, LOCAL_QUORUM, EACH_QUORUM, ALL. This is because the quorum for replication factor 1 is 1, so all the aforementioned levels are equivalent to ONE with RF 1.
However, the TWO and THREE consistency levels cannot be met on that keyspace. You would get the following error:
UnavailableException: Not enough replicas available for query at consistency THREE (3 required but only 1 alive)
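To make the arithmetic above concrete, here is a minimal, self-contained Java sketch that computes how many replicas each consistency level requires for a given replication factor. It is only an illustration: the class and method names (`ConsistencyCheck`, `requiredReplicas`, `canBeSatisfied`) are hypothetical and not part of the Cassandra driver or Spring Boot, and it assumes a single datacenter, so LOCAL_QUORUM and EACH_QUORUM are treated the same as QUORUM.

```java
import java.util.List;

public class ConsistencyCheck {

    /**
     * Replicas that must respond for the given consistency level.
     * Assumption: a single datacenter, so LOCAL_QUORUM and EACH_QUORUM
     * are computed like QUORUM (floor(RF / 2) + 1).
     */
    public static int requiredReplicas(String level, int replicationFactor) {
        switch (level) {
            case "ONE":
            case "LOCAL_ONE":
                return 1;
            case "TWO":
                return 2;
            case "THREE":
                return 3;
            case "QUORUM":
            case "LOCAL_QUORUM":
            case "EACH_QUORUM":
                return replicationFactor / 2 + 1; // integer division
            case "ALL":
                return replicationFactor;
            default:
                throw new IllegalArgumentException("Unknown consistency level: " + level);
        }
    }

    /** A level can only be met if it needs no more replicas than exist. */
    public static boolean canBeSatisfied(String level, int replicationFactor) {
        return requiredReplicas(level, replicationFactor) <= replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 1; // replication factor of the system keyspace, as described above
        List<String> levels = List.of("ONE", "LOCAL_ONE", "TWO", "THREE",
                "QUORUM", "LOCAL_QUORUM", "EACH_QUORUM", "ALL");
        for (String level : levels) {
            System.out.printf("%-12s needs %d replica(s), satisfiable with RF %d: %b%n",
                    level, requiredReplicas(level, rf), rf, canBeSatisfied(level, rf));
        }
    }
}
```

With a replication factor of 1, only TWO and THREE come out as unsatisfiable, which matches the UnavailableException shown above.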
As a consequence, queries to system.local MUST force the consistency level to ONE or LOCAL_ONE. I will see if my team can provide a fix for this quickly.
And as a side note: do not use THREE, use QUORUM or ALL.
Thanks so much @adutra !! Don't hesitate to ping me
Closing in favour of PR #20709
My team has recently decided to move from a default READ consistency level of LOCAL_QUORUM to THREE. After this change, the CassandraHealthIndicator can no longer execute the query below successfully. I'm wondering if there's a better test query that could work at all Consistency Levels?