Open nyh opened 6 months ago
I talked about this with @tgrabiec and it turns out that in some areas of the code, instead of fixing low-level algorithms to support tablets, we have two high-level algorithms:
keyspace::get_effective_replication_map()
function.To be honest, I don't understand when and why we even need the first algorithm, which supports only vnodes - isn't tablet a superset of it?
But even if we do, and even if it forces us to keep keyspace::get_effective_replication_map()
, we can still rename that function and rebuild, causing compilation errors and forcing us to consider each and every use of this function and may find more issues like https://github.com/scylladb/scylladb/issues/16598 and https://github.com/scylladb/scylladb/issues/16567 - instead of waiting until we eventually find these issues (hopefully all of them...) by running dtests and seeing them fail.
@pwrobelse please look into this.
Since we do have cases where the vnodes path requires keyspace::get_effective_replication_map
, I suggest we go with the renaming approach, and rename it, manually, to keyspace::get_vnode_effective_replication_map
to reflect what it returns and then audit all call sites to make sure they are indeed handling vnodes rather than tablets, and if not we should add a check if tablets are in use and call on_internal_instead since tablets handling ended up in a code path that's not ready for it.
Please open an issue or find an open issue in each such case and annotate the issue with a link to the internal error(s) that need to be addressed by it.
Please find the first part of analysis of the usage of keyspace::get_vnode_effective_replication_map
:
./api/storage_service.cc
-> get_natural_endpoints
that uses get_natural_endpoints()
.
Issue: https://github.com/scylladb/scylladb/issues/17313
./api/storage_service.cc -> get_effective_ownership
that uses effective_ownership()
.
Issue: https://github.com/scylladb/scylladb/issues/17342./api/storage_service.cc
-> get_range_to_endpoint_map
that uses get_range_to_address_map()
.
Issue: https://github.com/scylladb/scylladb/issues/17343./api/storage_service.cc
-> describe_ring
that uses describe_ring()
.
Issue: https://github.com/scylladb/scylladb/issues/16846./api/storage_service.cc
-> force_keyspace_cleanup
. It uses is_cleanup_allowed()
.
Issue: no issue - the code does not execute get_vnode_effective_replication_map()
when tablets are enabled.
Rationale: The following message is printed: api - Keyspace testing does not support cleanup
when this condition is met: rs.get_type() == locator::replication_strategy_type::local || !rs.is_vnode_based()
../api/storage_service.cc
-> repair_async
. It uses repair_start()
-> do_repair_start()
.
Issue: no issue - the code does not execute locator::make_global_effective_replication_map()
when tablets are enabled.
Rationale: The routine detects when tablets are used and in case they are, it takes a different branch of code../api/storage_service.cc
-> decommission
that uses decommission()
.
Issue: no issue - it does not seem to be needed.
Rationale: I run a small python test - it passed. From reading the code I can see that get_vnode_effective_replication_map()
is called only on keyspaces returned from get_non_local_vnode_based_strategy_keyspaces()
. It preserves only keyspaces for which the following condition holds rs.get_type() != locator::replication_strategy_type::local && rs.is_vnode_based()
../api/storage_service.cc
-> rebuild
that uses get_non_local_strategy_keyspaces_erms()
.
Issue: no issue - it does not seem to be needed.
Rationale: from reading the code I can see that get_non_local_strategy_keyspaces_erms()
calls get_vnode_effective_replication_map()
only on keyspaces for which the following condition holds: rs.get_type() != locator::replication_strategy_type::local && !rs.is_per_table()
.I have a few more places to verify in runtime, that are not related to storage service's API. I will post the second part soon.
API: ./api/storage_service.cc
-> load_new_ss_tables
that calls sstables_loader::load_new_sstables()
.
It may use replica::distributed_loader::process_upload_dir()
that calls get_keyspace_local_ranges()
that invokes get_vnode_effective_replication_map()
.
Issue: no issue - it does not seem to be needed.
Rationale: from the code and manual execution I can see, that distributed_loader::process_upload_dir()
is not called when tablets are enabled.
expiration_service::run()
-> scan_table()
-> token_ranges_owned_by_this_shard<>
-> get_vnode_effective_replication_map()
Issue: https://github.com/scylladb/scylladb/issues/16567
API: ./api/storage_service.cc
-> cleanup_all that uses code from ./compaction/task_manager_module.cc
-> global_cleanup_compaction_task_impl
-> shard_cleanup_keyspace_compaction_task_impl
-> table_cleanup_keyspace_compaction_task_impl
-> get_keyspace_local_ranges()
-> get_vnode_effective_replication_map()
.
Issue: no issue - it does not seem to be needed.
Rationale: global_cleanup_compaction_task_impl
performs early return and does not run shard_cleanup_keyspace_compaction_task_impl
when tablets are enabled.
if (replication_strategy.uses_tablets()) { // this keyspace does not support cleanup co_return; }
ser::storage_service_rpc_verbs::register_raft_topology_cmd
-> raft_topology_cmd_handler()
-> case node_state::decommissioning
-> unbootstrap()
-> get_non_local_strategy_keyspaces_erms()
-> get_vnode_effective_replication_map()
Issue: no issue - it does not seem to be needed.
Rationale: from reading the code I can see that get_non_local_strategy_keyspaces_erms()
calls get_vnode_effective_replication_map()
only on keyspaces for which the following condition holds:
rs.get_type() != locator::replication_strategy_type::local && !rs.is_per_table()
.
Linking another relevant issue: https://github.com/scylladb/scylladb/issues/16848
Moving to 6.1, assuming we'll complete it by then - not critical for 6.0.
@pwrobelse what remains to be done here?
@pwrobelse what remains to be done here?
@bhalevy - only one issue from the list remains open:
From the linked issue:
Alternator TTL doesn't work when tablets are enabled, because:
- It uses a per-keyspace replication map.
Issue https://github.com/scylladb/scylladb/issues/16598 is an example where a dtest, in some fairly esoteric conditions and with tablets enabled resulted in the error
The problem happened because the code had a call to:
Which is no longer allowed with tablets - not all tables in the same keyspace have the same replication, and you need to use
find_column_family(ks, cf).get_effective_replication_map()
instead.It is not good that we need to run a thousand dtests (currently hundreds of them fail...) to flush out all the places in the code that have such code which doesn't support tablets. So what I proposed in this patch is to remove the keyspace::get_effective_replication_map method, entirely. All code, regardless of whether it is run on tablets or not, should access the ERM through the table - not the keyspace. Any code that tries to access it through the keyspace will now fail at compile time, and must be fixed.
I realize that failing at run time was a good approach when we started working on tablets - and didn't want to fix everything at once. But now IS the time to finally fix everything - or at least make a definitive list of what still needs to be fixed. E.g., the code discovered by the aforementioned failed test (code in view_update_generator.cc) was code I didn't even know was broken until a dtest reached it.
CC @tgrabiec @mykaul