scylladb / scylla-jmx

Scylla JMX proxy
GNU Affero General Public License v3.0
29 stars 54 forks source link

nodetool repair st/et options broken #38

Open nyh opened 7 years ago

nyh commented 7 years ago

As explained in scylladb's commit f9ee74f5, scylladb expects the nodetool options "st" and "et" to generate the "startToken" and "endToken" REST API parameters, respectively. The scylladb code in repair.cc intersects this user-given range with the actual ranges held by the node.

scylla-jmx commit 4ed049739a2778b76b420d240da890b56f5e1e34 implemented new variants of repairRangeAsync() functions which in this case set the "ranges" parameter instead of the startToken/endToken parameters. This is incorrect. It should set the startToken/endToken parameters, as one pre-existing forceRepairRangeAsync() implementation does.

While at it we should:

  1. check that nothing else got broken in those new repair variants
  2. add a test for "-st"/"-et" to the dtest, so we can't break it again.
nyh commented 7 years ago

I wrote a -st/-et test and ran it against Cassandra, and it turns out that Apache Cassandra does NOT actually support the case I was complaining we no longer support! start/end token must be a subset of one "local range" (basically one vnode), and cannot be for example the entire range [-9223372036854775808, 9223372036854775807] - if you try to do that, you get the following error message:

INFO  [Thread-3] 2017-01-03 18:44:35,485 RepairRunnable.java:125 - Starting repair command #1, repairing keyspace ks with repair options (parallelism: parallel, primary range: false, incremental: false, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 1)
ERROR [Thread-3] 2017-01-03 18:44:35,508 RepairRunnable.java:172 - Repair failed:
java.lang.IllegalArgumentException: Requested range intersects a local range but is not fully contained in one; this would lead to imprecise repair
        at org.apache.cassandra.service.ActiveRepairService.getNeighbors(ActiveRepairService.java:204) ~[main/:na]
        at org.apache.cassandra.repair.RepairRunnable.runMayThrow(RepairRunnable.java:160) ~[main/:na]
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) [main/:na]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_111]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_111]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]

So our options now are:

  1. Forget the "improved" st/et options I wanted to provide and just support the weaker meaning Cassandra gave them. In this case, we should add the error message in case the user tried to do too much with st/et, the same error message which Cassandra prints above.
  2. Fix the st/et options, and/or simply the "ranges" option (we can drop the separate st/et JMX options) to be better than Cassandra and allow any token range - that will be intersected with the list of local ranges. I don't know who would use this improved option... also because I don't understand yet how people can use the weaker st/et option (how do they guess the correct token ranges?).