scylladb / scylla-jmx

Scylla JMX proxy
GNU Affero General Public License v3.0
28 stars 51 forks source link

Fix handling of table names with colon characters #227

Closed nyh closed 7 months ago

nyh commented 8 months ago

As noted in #226, nodetool breaks if a table name contains the colon character. This normally can't happen - CQL table names can only contain alphanumerics and underscores, and the DynamoDB API adds also dashes and dots - but neither allows a colon. But unfortunately, when Alternator has a secondary index (GSI or LSI), the name of the view does get a colon (it looks like "tablename:viewname"). And this confuses JMX, which addresses objects using MBean names (see https://docs.oracle.com/javase/8/docs/api/javax/management/ObjectName.html), where a name that contains a colon needs to be quoted.

This small series fixes two things: First, some internal list of ObjectName objects needs to quote the names it holds to allow the colon character. Second, we should allow the external client - namely, nodetool - to send us a quoted name.

To complete this series I already have patches to two other repositories which I'll send shortly:

  1. A patch to scylla-tools-java (i.e., nodetool) to use quoting when sending a table name with a colon to the JMX server.
  2. A dtest patch to verify that before these patches, neither "nodetool info" or "nodetool tablestats" work, and after these patches, both work.
nyh commented 8 months ago

I need help from the JMX or packaging gurus: How do we confirm that this patch doesn't break anything in the dtest suite? (it's supposed to fix a dtest which isn't yet in, but we need to confirm it doesn't break any existing test).

nyh commented 8 months ago

See also pull requests in other repositories:

  1. https://github.com/scylladb/scylla-tools-java/pull/355 - fix nodetool to be able to quote a table name with a colon in it (which the fix in the second patch in this PR can unquote).
  2. https://github.com/scylladb/scylla-dtest/pull/3738 - two dtests that reproduce the original bug, and check that it's fixed. This PR only fixes the nodetool info test, the nodetool pull request mentioned above is also needed for the nodetool tablestats test to work.
nyh commented 7 months ago

LGTM. Was surprised that you need checks before 'unquote()' or it might throw an exception, but apparently that's the API.

Even after beeing knee-deep in this code for many hours, I'm not sure I fully understood what is really happening here or why it was designed this way. Why does unquote() even need to exist? After you used quote() to create a valid ObjectName, why doesn't the getKeyProperty() method retrieve the original unquoted string - why does the caller need to call unquote()? Anyway, experiments shows that this is what happens - if you used quote() to create the ObjectName, the quote characters are saved and returned by GetKeyProperty() :-(

All I can say is that I'm happy we are planning to get of this Java code eventually...

denesb commented 7 months ago

Submodule update: scylladb/scylladb@282dc0ae6d2c0486c65d35628b5ed0b7a94f7279.

nyh commented 7 months ago

I need help from the JMX or packaging gurus: How do we confirm that this patch doesn't break anything in the dtest suite? (it's supposed to fix a dtest which isn't yet in, but we need to confirm it doesn't break any existing test).

@denesb did you do any such confirmation before merging this PR? I see now next promotion failing on nodetool tests https://jenkins.scylladb.com/job/scylla-master/job/next/6846/#showFailuresLink - might be related :-(

denesb commented 7 months ago

No, I just pushed it to next.

On Wed, 2023-11-22 at 15:18 +0000, nyh wrote:

I need help from the JMX or packaging gurus: How do we confirm that this patch doesn't break anything in the dtest suite? (it's supposed to fix a dtest which isn't yet in, but we need to confirm it doesn't break any existing test). @denesb did you do any such confirmation before merging this PR? I see now next promotion failing on nodetool tests https://jenkins.scylladb.com/job/scylla-master/job/next/6846/#showFailuresLink

  • might be related :-( — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>
nyh commented 7 months ago

I can reproduce a failure of

PYTHONPATH=../scylla-ccm pytest --cassandra-dir=$HOME/scylla/build/dev nodetool_additional_test.py::TestNodetool::test_get_sstable

I'll try to debug it. The curious thing is that obviously my patch did not kill en-masse all the nodetool tests, it just broke two specific tests. I'll need to debug now why.

Do you want to revert the Scylla patch that updates the submodule and leave the JMX patch in and I'll send a followup patch? Or do you want to revert the scylla-jmx patch as well?

Also I repeat my question for the JMX and CI gurus - don't we have CI runs for JMX patches? How do we prevent these cases from happening in the future?

denesb commented 7 months ago

Do you want to revert the Scylla patch that updates the submodule and leave the JMX patch in and I'll send a followup patch? Or do you want to revert the scylla-jmx patch as well?

The submodule update is is only in next, it can simply be dequeued.

Also I repeat my question for the JMX and CI gurus - don't we have CI runs for JMX patches? How do we prevent these cases from happening in the future?

There is no CI for tools/* submodules, we merge to master directly. Also, there is usually no CI for submodule updates, unless the maintainer thinks the submodule update is risky, and opens a PR for CI. As you can see, the maintainer will sometimes misjudge the situation.

nyh commented 7 months ago

Sorry about the mess. Please dequeue the Scylla submodule update, and I'll send a followup PR to this PR to fix the bug (I hope I can do this quickly, I'm debugging it now).

denesb commented 7 months ago

Sorry about the mess. Please dequeue the Scylla submodule update, and I'll send a followup PR to this PR to fix the bug (I hope I can do this quickly, I'm debugging it now).

Submodule update dequeued.

mykaul commented 7 months ago

Also I repeat my question for the JMX and CI gurus - don't we have CI runs for JMX patches? How do we prevent these cases from happening in the future?

I suspect we'll get rid of JMX sooner than we'll have tests for it :(

nyh commented 7 months ago

Sent a followup: https://github.com/scylladb/scylla-jmx/pull/230

fruch commented 7 months ago

I can reproduce a failure of

PYTHONPATH=../scylla-ccm pytest --cassandra-dir=$HOME/scylla/build/dev nodetool_additional_test.py::TestNodetool::test_get_sstable

I'll try to debug it. The curious thing is that obviously my patch did not kill en-masse all the nodetool tests, it just broke two specific tests. I'll need to debug now why.

Do you want to revert the Scylla patch that updates the submodule and leave the JMX patch in and I'll send a followup patch? Or do you want to revert the scylla-jmx patch as well?

Also I repeat my question for the JMX and CI gurus - don't we have CI runs for JMX patches? How do we prevent these cases from happening in the future?

There is a way to run all of dtest tests, but you first build using the byo a unified package and then give to a dtest job to run all tests.

Other option is to run just the gating do a PR to Scylla with submodule change and let it run, I do it multiple time for cqlsh submodule, and it's mostly enough

nyh commented 7 months ago

There is a way to run all of dtest tests, but you first build using the byo a unified package and then give to a dtest job to run all tests.

My problem is that because I don't do those things often, I have no idea how to do what you just said, and no idea where there is documentation now how to do it.

Other option is to run just the gating do a PR to Scylla with submodule change and let it run, I do it multiple time for cqlsh submodule, and it's mostly enough

Yes, this is a good thing to do before merging the submodule change. But in some sense it's a bit too late - it means we already merged the broken patch into scylla-jmx (for example), and yes, we won't have Scylla use the broken patch but it's still in the repository. But on second thought - maybe it's fine and we don't need to improve (and complicate) the process. We don't need to be fanatics on perfect code and bisectability on every small repository - maybe it's not a disaster that scylla-jmx got a buggy patch, and then a day later it got a second patch to fix it. Just as long as the buggy version isn't actually used in the Scylla parent module (and it isn't).

fruch commented 7 months ago

There is a way to run all of dtest tests, but you first build using the byo a unified package and then give to a dtest job to run all tests.

My problem is that because I don't do those things often, I have no idea how to do what you just said, and no idea where there is documentation now how to do it.

Other option is to run just the gating do a PR to Scylla with submodule change and let it run, I do it multiple time for cqlsh submodule, and it's mostly enough

Yes, this is a good thing to do before merging the submodule change. But in some sense it's a bit too late - it means we already merged the broken patch into scylla-jmx (for example), and yes, we won't have Scylla use the broken patch but it's still in the repository. But on second thought - maybe it's fine and we don't need to improve (and complicate) the process. We don't need to be fanatics on perfect code and bisectability on every small repository - maybe it's not a disaster that scylla-jmx got a buggy patch, and then a day later it got a second patch to fix it. Just as long as the buggy version isn't actually used in the Scylla parent module (and it isn't).

In some case I'm opening a PR pointing to my fork of the submodule, before merging, and opening a PR just for testing, and point it back to the actual repo/branch, when I confirmed it's working.

again maybe a bit convoluted and long, and can confuse the maintainers, but it get me a CI run ontop of my changes, with the least amount of effort. (but with not much control, since I can't change dtest/ccm in such run, or the exact test that would run)

denesb commented 7 months ago

Submodule update PR: https://github.com/scylladb/scylladb/pull/16146