Closed: elsmorian closed this issue 7 years ago

I'm running Reaper 0.6.1 from a Docker container, but when adding a cluster from the web interface it fails with:

Failed to establish JMX connection to 172.29.16.15:7199

Further down the trace it mentions:

Caused by: java.rmi.ConnectException: Connection refused to host: 127.0.1.1; nested exception is: …

I have checked connectivity from the Docker host to my Cassandra cluster and can telnet to the seed IP I'm giving it on port 7199 with no problem, so I'm not sure why it's trying to connect to 127.0.1.1. Have I missed some configuration somewhere?
Hello @elsmorian,

Perhaps Cassandra is in LOCAL_JMX mode?
In this sample Docker Compose setup, which should be committed in the next couple of days, if not today, I had to mount these two files:
./docker/cassandra/jmxremote.access:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/management/jmxremote.access
./docker/cassandra/jmxremote.password:/etc/cassandra/jmxremote.password
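For reference, here is a minimal sketch of how those mounts might look in a docker-compose.yml. The service name and image tag are assumptions; only the volume mappings come from the lines above:

services:
  cassandra:
    image: cassandra:2.2.8   # hypothetical image tag
    volumes:
      - ./docker/cassandra/jmxremote.access:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/management/jmxremote.access
      - ./docker/cassandra/jmxremote.password:/etc/cassandra/jmxremote.password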
The contents of both of those files can be found here:
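If that link is unavailable: both files follow the standard Java JMX format of one entry per line. The reaperUser name and changeme password below are placeholders, not the actual values from that repo:

jmxremote.access, which grants each JMX user an access level:
reaperUser readwrite

jmxremote.password, which pairs each user with its password:
reaperUser changeme

Also note that the JVM will refuse to enable remote JMX if jmxremote.password is readable by anyone other than its owner, so a chmod 400 on it is usually required.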
I also had to set this:
LOCAL_JMX=no
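That variable is consumed by cassandra-env.sh. Abridged from a typical 2.2-era cassandra-env.sh (exact lines vary by version): any value other than yes switches JMX from a localhost-only port to a remote, authenticated one:

if [ "$LOCAL_JMX" = "yes" ]; then
    JVM_OPTS="$JVM_OPTS -Dcassandra.jmx.local.port=$JMX_PORT"
else
    JVM_OPTS="$JVM_OPTS -Dcassandra.jmx.remote.port=$JMX_PORT"
    JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.rmi.port=$JMX_PORT"
    JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=true"
    JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password"
fi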
You can read more on why those settings were needed here:
Do let us know if that still doesn't fix your issue. Cheers!
Hi @joaquincasares, thanks for the very helpful reply! I had done the LOCAL_JMX=no part, but I had not added the Reaper user to the jmxremote.access file. Thanks for the information!
Now we are up and running. Just a quick question: we are running Cassandra 2.2.8 with the default number of vnodes (256?). What is a good starting point for the Segment Count for our first repairs via Reaper?
@elsmorian , that's great to hear! :)
A quick note: we've been recommending 32 vnodes lately to some of our larger customers, in an attempt to simplify internal Cassandra operations while keeping the benefits that vnodes provide. However, note that you cannot change the vnode count on a running data center. Instead, if you were experiencing issues with 256 vnodes per node, you'd spin up a new data center with fewer vnodes, migrate to it, then decommission the old data center with 256 vnodes (a sketch follows).
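A hedged sketch of that migration, assuming NetworkTopologyStrategy and hypothetical names my_ks, dc_old, and dc_new; treat it as an outline, not a runbook:

# 1. Bring up the new DC with num_tokens: 32 set in cassandra.yaml on each new node.
# 2. Add the new DC to replication, then stream existing data to it:
ALTER KEYSPACE my_ks WITH replication =
  {'class': 'NetworkTopologyStrategy', 'dc_old': 3, 'dc_new': 3};
nodetool rebuild -- dc_old      # run on every node in dc_new
# 3. Repoint clients at dc_new, then drop dc_old from replication:
ALTER KEYSPACE my_ks WITH replication =
  {'class': 'NetworkTopologyStrategy', 'dc_new': 3};
# 4. Retire the old nodes one at a time:
nodetool decommission           # run on each node in dc_old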
As far as Segment Counts go, I believe we would need a bit more info, like your number of nodes, replication factor, and data size per node.
@joaquincasares Many thanks for your response. What sort of issues would you attribute to having 256 vnodes?
Currently we are running a slightly odd setup: 7 nodes with RF=3 in one 'production' data center, and a single node in another data center that we use for testing. Per-node data load in the first DC is between 612.8 GB and 766.62 GB, and the single node in the other DC has a load of 1.32 TB.
@elsmorian Sure thing! :)
256 vnodes put a bit more complexity around all streaming-related tasks, since you now have 256 * num_of_nodes different streams (1792 with your 7-node DC).
Thanks for the info! @adejanovski / @michaelsembwever do you have a recommendation for his ideal Segment Count based on the above numbers?
The number of segments will vary for each cluster. The best starting point, IMHO, is one segment per vnode in the cluster (1792 in your case: 256 vnodes * 7 nodes), since that's the lowest number you'll be able to run: a single segment can never span multiple vnodes. Set a higher value only if segments fail to be repaired within 30 minutes.
You can easily see why having that many vnodes forces a lot of segments, each of which carries overhead. For example, there's the pause time related to intensity (by default at least 30s per segment), which multiplied by the number of segments gives at least 15h. 7 nodes with an RF of 3 allow 2 segments to be repaired at the same time, bringing the overall overhead down to 7h30. Using 32 vnodes would bring that down to ~1h50.
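Spelled out, the arithmetic above is:

segments          = 256 vnodes * 7 nodes  = 1792
total pause time  = 1792 * 30 s           = 53,760 s ≈ 15 h
parallelism       = floor(7 nodes / RF 3) = 2 segments at a time
effective pause   ≈ 15 h / 2              ≈ 7 h 30
with 32 vnodes    : 32 * 7 * 30 s         = 6,720 s  ≈ 1 h 52 of total pause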
By the way, the minimum pause time (which is the repair manager loop cycle) can be modified in the yaml config file:
repairManagerSchedulingIntervalSeconds: 30
If segments are very fast to process, it could be worth reducing that value, to 10s for example.
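For instance, in Reaper's yaml configuration file (cassandra-reaper.yaml in a typical install; the exact filename may differ):

repairManagerSchedulingIntervalSeconds: 10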
Closing the issue, and do not hesitate to discuss this on the Reaper ML.
Thank you both @joaquincasares @adejanovski for some excellent advice here, really helpful!