Closed denesb closed 3 months ago
We need a full dtest run with this PR, I don't know how to do that.
This PR is a 6.0 release blocker, so please give it attention.
We need a full dtest run with this PR, I don't know how to do that.
running at https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/
Looks like I have to fix ccm's own tests too.
Looks like I have to fix ccm's own tests too.
@denesb see #566
@denesb is the next step to update the building system of scylla and then drop
tools/jmx
submodule?
New in v2:
tests/test_config.py
, to a version, which has the fully complete native nodetool.We need a full dtest run with this PR, I don't know how to do that.
running at https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/
This is just the gating (and seems like there is some work still needed, not sure if in this PR)
We should also do a run with enterprise
And a full run, to flush issues with non gating tests.
New in v2:
* Update package version in `tests/test_config.py`, to a version, which has the fully complete native nodetool.
but i think it would be great to continue using the cached binary as long as cache works.
We need a full dtest run with this PR, I don't know how to do that.
running at https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/
This is just the gating (and seems like there is some work still needed, not sure if in this PR)
We should also do a run with enterprise
And a full run, to flush issues with non gating tests.
ah, right. rescheduling at https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/212/
but i think it would be great to continue using the cached binary as long as cache works.
Not sure what do you mean, what cache?
I just discovered two new commands I didn't know before:
but i think it would be great to continue using the cached binary as long as cache works.
Not sure what do you mean, what cache?
IIUC, the github workflows try to reuse the cached binary. see https://github.com/scylladb/scylla-ccm/blob/903a2a65f70ebc9d1327c2f8af7159ef460664d7/.github/workflows/nix.yml#L26-L31 and https://github.com/scylladb/scylla-ccm/blob/903a2a65f70ebc9d1327c2f8af7159ef460664d7/.github/workflows/integration-tests.yml#L54-L72. in other words, i am trying to raise more attentions to #566 . i will apply the same change to integration-tests.yml
once i get positive feedbacks.
Analysis of failed gating dtests. I will soon prepare scylla-dtest.git
and scylla.git
branches to rerun this with.
Version we upgrade to doesn't have feature complete nodetool. Need to change the test to use a more recent version.
[FullDtest / full-split001 / compaction_test.TestCompaction.test_compaction_delete_tombstone_gc[immediate-SizeTieredCompactionStrategy]](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/compaction_test/TestCompaction/FullDtest___full_split001___test_compaction_delete_tombstone_gc_immediate_SizeTieredCompactionStrategy_/) [FullDtest / full-split001 / compaction_test.TestCompaction.test_compaction_delete_tombstone_gc[immediate-TimeWindowCompactionStrategy]](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/compaction_test/TestCompaction/
Flaky test (very possibly related to native nodetool somehow): https://github.com/scylladb/scylladb/issues/17947
FullDtest_fullsplit001test_compaction_delete_tombstone_gc_immediateTimeWindowCompactionStrategy/)
[FullDtest / full-split001 / nodetool_additional_test.TestNodetool.test_scrub_with_multi_nodes_expect_data_rebuild](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/nodetool_additional_test/TestNodetool/FullDtest___full_split001___test_scrub_with_multi_nodes_expect_data_rebuild/)
Missing getsstables
command.
[FullDtest / full-split001 / update_cluster_layout_tests.TestUpdateClusterLayout.test_change_node_ip](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/update_cluster_layout_tests/TestUpdateClusterLayout/FullDtest___full_split001___test_change_node_ip/)
Version we upgrade to doesn't have feature complete nodetool. Need to change the test to use a more recent version.
[FullDtest / full-split002 / compaction_test.TestCompaction.test_compaction_delete_tombstone_gc[immediate-TimeWindowCompactionStrategy]](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/compaction_test/TestCompaction/
Flaky test (very possibly related to native nodetool somehow): https://github.com/scylladb/scylladb/issues/17947
FullDtest_fullsplit002test_compaction_delete_tombstone_gc_immediateTimeWindowCompactionStrategy/)
[FullDtest / full-split000 / nodetool_additional_test.TestNodetool.test_get_sstable](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/nodetool_additional_test/TestNodetool/FullDtest___full_split000___test_get_sstable/)
Need to implement getsstable
command.
[FullDtest / full-split000 / nodetool_additional_test.TestNodetool.test_sstable_info](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/nodetool_additional_test/TestNodetool/FullDtest___full_split000___test_sstable_info/)
Need to implement sstableinfo
command.
[FullDtest / full-split000 / toppartitions_test.TestTopPartitions.test_one_keyspace_multiple_column_families_writes_sample_and_empty_reads_sample](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/toppartitions_test/TestTopPartitions/FullDtest___full_split000___test_one_keyspace_multiple_column_families_writes_sample_and_empty_reads_sample/) [FullDtest / full-split000 / toppartitions_test.TestTopPartitions.test_all_column_families_writes_only](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/toppartitions_test/TestTopPartitions/FullDtest___full_split000___test_all_column_families_writes_only/) [FullDtest / full-split000 / toppartitions_test.TestTopPartitions.test_writes_sample_for_10_partitions_with_100_op_and_empty_reads_sample](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/toppartitions_test/TestTopPartitions/FullDtest___full_split000___test_writes_sample_for_10_partitions_with_100_op_and_empty_reads_sample/) [FullDtest / full-split000 / toppartitions_test.TestTopPartitions.test_all_column_families_reads_only](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/toppartitions_test/TestTopPartitions/FullDtest___full_split000___test_all_column_families_reads_only/) [FullDtest / full-split000 / toppartitions_test.TestTopPartitions.test_reads_sample_for_10_partitions_with_100_op_and_empty_writes_sample](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/toppartitions_test/TestTopPartitions/FullDtest___full_split000___test_reads_sample_for_10_partitions_with_100_op_and_empty_writes_sample/) [FullDtest / full-split000 / toppartitions_test.TestTopPartitions.test_top_5_paritions_write_samplers_and_empty_read_sample](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/toppartitions_test/TestTopPartitions/FullDtest___full_split000___test_top_5_paritions_write_samplers_and_empty_read_sample/) [FullDtest / full-split000 / toppartitions_test.TestTopPartitions.test_top_5_paritions_for_read_samplers_and_empty_write_sample](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/toppartitions_test/TestTopPartitions/FullDtest___full_split000___test_top_5_paritions_for_read_samplers_and_empty_write_sample/) [FullDtest / full-split000 / toppartitions_test.TestTopPartitions.test_top_3_paritions_for_write_samplers_only](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/toppartitions_test/TestTopPartitions/FullDtest___full_split000___test_top_3_paritions_for_write_samplers_only/) [FullDtest / full-split000 / toppartitions_test.TestTopPartitions.test_top_3_paritions_for_read_samplers_only](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/toppartitions_test/TestTopPartitions/FullDtest___full_split000___test_top_3_paritions_for_read_samplers_only/) [FullDtest / full-split000 / toppartitions_test.TestTopPartitions.test_param_sampler_writes_and_capacity_size](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/toppartitions_test/TestTopPartitions/FullDtest___full_split000___test_param_sampler_writes_and_capacity_size/) [FullDtest / full-split000 / toppartitions_test.TestTopPartitions.test_param_sampler_read_and_capacity_size](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/toppartitions_test/TestTopPartitions/FullDtest___full_split000___test_param_sampler_read_and_capacity_size/) [FullDtest / full-split000 / toppartitions_test.TestTopPartitions.test_write_into_one_paritions_to_different_rows](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/toppartitions_test/TestTopPartitions/FullDtest___full_split000___test_write_into_one_paritions_to_different_rows/) [FullDtest / full-split000 / toppartitions_test.TestTopPartitions.test_one_keyspace_multiple_column_families_reads_sample_and_empty_writes_sample](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/toppartitions_test/TestTopPartitions/FullDtest___full_split000___test_one_keyspace_multiple_column_families_reads_sample_and_empty_writes_sample/)
tools/toppartitions.py
uses a regexp to search for running nodetool executable which doesn't match the native nodetool, regex needs minor adjustment
New in v3 - fix the tests:
get_get_scylla*version
.test_nodetool_timeout
I will have to roll back the removal of JMX, because it breaks upgrade tests. :(
New in v4:
Kicked off a new BYO: https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/213/
@denesb
we are waiting for https://github.com/scylladb/scylladb/pull/17979 to be merged first, right ?
we are waiting for scylladb/scylladb#17979 to be merged first, right ?
Yes.
New in v4:
has_native_nodetool()
.@tchaikov I'm confused as to why the integration is failing. I rebased on top of https://github.com/scylladb/scylla-ccm/pull/566. Do I also need to rebase on top of https://github.com/scylladb/scylla-ccm/pull/567?
@denesb hi Botond, i am sorry for this inconvenience. the test failure is indeed related the recent change to the nix workflow. in which, i set the SCYLLA_VERSION
env var, see https://github.com/scylladb/scylla-ccm/blob/22283e1f59aa640b5e301395ead9b3dcf3ec0ff2/.github/workflows/nix.yml#L37
and in the test, we respect this env var over the setting hardwired in the code, see https://github.com/scylladb/scylla-ccm/blob/22283e1f59aa640b5e301395ead9b3dcf3ec0ff2/tests/test_config.py#L8-L9 .
so, if you want to bump up the scylla version to be tested, i'd suggest update workflows/nix.yml
as well.
Ok, updated nix.yaml
.
It would be best to have ccm always use the relocatable package and install it, and just call nodetool without worrying if it's native or not.
It would be best to have ccm always use the relocatable package and install it, and just call nodetool without worrying if it's native or not.
It kinda has to know, because CCM is also the one taking care of scylla-jmx. And when using the native nodetool, you don't need JMX. Looking forward, having CCM "know" also allows us to drop all this JMX-related code at one point in the future.
also, i would prefer keeping the capability to test without reloc package. it's much more convenient for a faster development iteration to use the local build directory for testing.
Another advantage of CCM knowing about the native nodetool: currently (before this PR), ccm would pass the JMX port to the nodetool's -p
parameter, so we had to add another sneaky parameter, which looks like a Java def, which is internally recognized and translated to --rest-api-port
(see https://github.com/scylladb/scylladb/pull/17168/commits/c2425ca1354072ec733a369edd2402b33dadae2e). With CCM knowing which nodetool it is calling to, we can revert that commit, before release, so users do not have to wonder why there is a -p
and --rest-api-port
.
It would be best to have ccm always use the relocatable package and install it, and just call nodetool without worrying if it's native or not.
It kinda has to know, because CCM is also the one taking care of scylla-jmx. And when using the native nodetool, you don't need JMX. Looking forward, having CCM "know" also allows us to drop all this JMX-related code at one point in the future.
Yes. In any case we don't want to drop ccm capabilities that we already have.
All dependencies are in, kicked off a new BYO job: https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/220/
Ok, updated
nix.yaml
.
as #567 got merged, could you please update .github/workflows/integration-tests.yml
as well.once #569 gets merged, it'd be simpler to update these versions -- in a single file.
All dependencies are in, kicked off a new BYO job: https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/220/
Looks like I have to fill all the repo fields, I cannot leave them empty. Now a BYO which actually got beyond the checkout phase: https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/223/
Latest BYO:
FullDtest / full-split001 / nodetool_additional_test.TestNodetool.test_scrub_with_multi_nodes_expect_data_rebuild FullDtest / full-split001 / update_cluster_layout_tests.TestUpdateClusterLayout.test_change_node_ip FullDtest / full-split000 / nodetool_additional_test.TestNodetool.test_get_sstable FullDtest / full-split000 / nodetool_additional_test.TestNodetool.test_sstable_info
For some reason, BYO didn't pull the latest master, the sstableinfo
and getsstables
commands are still missing, even though they are present on master.
The 4th test -- update_cluster_layout_tests.TestUpdateClusterLayout.test_change_node_ip -- is something else, I have to look into it.
FullDtest / full-split001 / update_cluster_layout_tests.TestUpdateClusterLayout.test_change_node_ip
This is another test which fails because the native nodetool is too fast. Adding a 5s sleep makes the test pass.
FullDtest / full-split001 / update_cluster_layout_tests.TestUpdateClusterLayout.test_change_node_ip
This is another test which fails because the native nodetool is too fast. Adding a 5s sleep makes the test pass.
New BYO with build is running: https://jenkins.scylladb.com/job/scylla-master/job/byo/job/byo_build_tests_dtest/2165/
There is one more nodetool
command missing: checkAndRepairCdcStreams
.
Rebased again, @tchaikov please check that the workflow yamls look ok.
There is one more
nodetool
command missing:checkAndRepairCdcStreams
.
New BYO, including the two above PRs: https://jenkins.scylladb.com/job/scylla-master/job/byo/job/byo_build_tests_dtest/2166/
FullDtest / full-split001 / update_cluster_layout_tests.TestUpdateClusterLayout.test_change_node_ip
This is another test which fails because the native nodetool is too fast.
Where's the compatibility?!
FullDtest / full-split001 / update_cluster_layout_tests.TestUpdateClusterLayout.test_change_node_ip
This is another test which fails because the native nodetool is too fast.
Where's the compatibility?!
yeah, they should have shelp a few sleeps in the new code 🤣
It would be best to have ccm always use the relocatable package and install it, and just call nodetool without worrying if it's native or not.
I may have to go back to this, I'm hitting issues in upgrade tests and I ended up restoring the pre-PR code completely, just invoked in an else:
branch.
I will prepare a Scylla branch which installs the native nodetool as nodetool
and lets see if that works.
I may have to discard this PR completely. At least it served as a good integration test for finding the last wrinkles to iron out in the native nodetool.
I will report back when I have results.
New in v5:
With this, CCM should be prepared for the disappearance of the java tools. Lets see how dtests will take this (I tested with a small sample, including an upgrade test, which is tricky).
BYO (with java packages removed): https://jenkins.scylladb.com/job/scylla-master/job/byo/job/byo_build_tests_dtest/2168/
BYO (with java packages removed): https://jenkins.scylladb.com/job/scylla-master/job/byo/job/byo_build_tests_dtest/2168/
That's quite ambitious
There quite a lot of tests using Cassandra-stress, you'll need to supply an alternative or in CCM or in dtest
BYO (with java packages removed): https://jenkins.scylladb.com/job/scylla-master/job/byo/job/byo_build_tests_dtest/2168/
That's quite ambitious
There quite a lot of tests using Cassandra-stress, you'll need to supply an alternative or in CCM or in dtest
Yes, it was more for curiosity and also because dropping the java packages is one of our goals for 6.0 (but we of course want to keep c-s). There are 280 failed tests already.
I already kicked off another BYO run, which runs with the java tools still in (it is in the queue).
I will hack https://github.com/scylladb/scylladb/pull/17969 to install only cassandra-stress
, but nothing else, then adjust this PR to work with that. This is the agree-upon end-state: only cassandra-stress
is part of the unified package, jmx and the rest of the java-tools content is dropped.
I will hack https://github.com/scylladb/scylladb/pull/17969 to install only
cassandra-stress
, but nothing else, then adjust this PR to work with that. This is the agree-upon end-state: onlycassandra-stress
is part of the unified package, jmx and the rest of the java-tools content is dropped.
I thought @syuu1228 was doing some work on repacking c-s on itself own, maybe you want to sync with him
I will hack scylladb/scylladb#17969 to install only
cassandra-stress
, but nothing else, then adjust this PR to work with that. This is the agree-upon end-state: onlycassandra-stress
is part of the unified package, jmx and the rest of the java-tools content is dropped.I thought @syuu1228 was doing some work on repacking c-s on itself own, maybe you want to sync with him
Yes, I will. For now I just want to see what changes I need here in ccm, to keep cassandra-stress working, but not care about the other java tools.
v6: reduce the scope of the PR from preparing for the post-java world, to preparing to the post-jmx world.
Soon the jmx package will be dropped from the unified package, and the scylla.git repo. Prepare for this:
use_jmx
property, based on probing the reloc package for the presence of thejmx
directory, or thetools/jmx
directory in the scylla repo.Fixes: https://github.com/scylladb/scylla-dtest/issues/4093