scylladb / scylladb

NoSQL data store using the seastar framework, compatible with Apache Cassandra
http://scylladb.com
GNU Affero General Public License v3.0
12.96k stars 1.24k forks source link

remove function `keyspace::get_effective_replication_map()` #16626

Open nyh opened 6 months ago

nyh commented 6 months ago

Issue https://github.com/scylladb/scylladb/issues/16598 is an example where a dtest, in some fairly esoteric conditions and with tablets enabled resulted in the error

Tried to obtain per-keyspace effective replication map of ks but it's per-table

The problem happened because the code had a call to:

    find_keyspace(ks).get_effective_replication_map()

Which is no longer allowed with tablets - not all tables in the same keyspace have the same replication, and you need to use find_column_family(ks, cf).get_effective_replication_map() instead.

It is not good that we need to run a thousand dtests (currently hundreds of them fail...) to flush out all the places in the code that have such code which doesn't support tablets. So what I proposed in this patch is to remove the keyspace::get_effective_replication_map method, entirely. All code, regardless of whether it is run on tablets or not, should access the ERM through the table - not the keyspace. Any code that tries to access it through the keyspace will now fail at compile time, and must be fixed.

I realize that failing at run time was a good approach when we started working on tablets - and didn't want to fix everything at once. But now IS the time to finally fix everything - or at least make a definitive list of what still needs to be fixed. E.g., the code discovered by the aforementioned failed test (code in view_update_generator.cc) was code I didn't even know was broken until a dtest reached it.

CC @tgrabiec @mykaul

nyh commented 6 months ago

I talked about this with @tgrabiec and it turns out that in some areas of the code, instead of fixing low-level algorithms to support tablets, we have two high-level algorithms:

  1. One algorithm supports only vnodes, and may assume that it has the same ERM for all tables in a keyspace. This algorithm continues to use the old keyspace::get_effective_replication_map() function.
  2. The second algorithm supports tablets, and doesn't use that function.

To be honest, I don't understand when and why we even need the first algorithm, which supports only vnodes - isn't tablet a superset of it?

But even if we do, and even if it forces us to keep keyspace::get_effective_replication_map(), we can still rename that function and rebuild, causing compilation errors and forcing us to consider each and every use of this function and may find more issues like https://github.com/scylladb/scylladb/issues/16598 and https://github.com/scylladb/scylladb/issues/16567 - instead of waiting until we eventually find these issues (hopefully all of them...) by running dtests and seeing them fail.

bhalevy commented 4 months ago

@pwrobelse please look into this. Since we do have cases where the vnodes path requires keyspace::get_effective_replication_map, I suggest we go with the renaming approach, and rename it, manually, to keyspace::get_vnode_effective_replication_map to reflect what it returns and then audit all call sites to make sure they are indeed handling vnodes rather than tablets, and if not we should add a check if tablets are in use and call on_internal_instead since tablets handling ended up in a code path that's not ready for it. Please open an issue or find an open issue in each such case and annotate the issue with a link to the internal error(s) that need to be addressed by it.

pwrobelse commented 4 months ago

Please find the first part of analysis of the usage of keyspace::get_vnode_effective_replication_map:

  1. API: ./api/storage_service.cc -> get_natural_endpoints that uses get_natural_endpoints(). Issue: https://github.com/scylladb/scylladb/issues/17313
    1. API: ./api/storage_service.cc -> get_effective_ownership that uses effective_ownership(). Issue: https://github.com/scylladb/scylladb/issues/17342
    2. API: ./api/storage_service.cc -> get_range_to_endpoint_map that uses get_range_to_address_map(). Issue: https://github.com/scylladb/scylladb/issues/17343
    3. API: ./api/storage_service.cc -> describe_ring that uses describe_ring(). Issue: https://github.com/scylladb/scylladb/issues/16846
    4. API: ./api/storage_service.cc -> force_keyspace_cleanup. It uses is_cleanup_allowed(). Issue: no issue - the code does not execute get_vnode_effective_replication_map() when tablets are enabled. Rationale: The following message is printed: api - Keyspace testing does not support cleanup when this condition is met: rs.get_type() == locator::replication_strategy_type::local || !rs.is_vnode_based().
    5. API: ./api/storage_service.cc -> repair_async. It uses repair_start() -> do_repair_start(). Issue: no issue - the code does not execute locator::make_global_effective_replication_map() when tablets are enabled. Rationale: The routine detects when tablets are used and in case they are, it takes a different branch of code.
    6. API: ./api/storage_service.cc -> decommission that uses decommission(). Issue: no issue - it does not seem to be needed. Rationale: I run a small python test - it passed. From reading the code I can see that get_vnode_effective_replication_map() is called only on keyspaces returned from get_non_local_vnode_based_strategy_keyspaces(). It preserves only keyspaces for which the following condition holds rs.get_type() != locator::replication_strategy_type::local && rs.is_vnode_based().
    7. API: ./api/storage_service.cc -> rebuild that uses get_non_local_strategy_keyspaces_erms(). Issue: no issue - it does not seem to be needed. Rationale: from reading the code I can see that get_non_local_strategy_keyspaces_erms() calls get_vnode_effective_replication_map() only on keyspaces for which the following condition holds: rs.get_type() != locator::replication_strategy_type::local && !rs.is_per_table().

I have a few more places to verify in runtime, that are not related to storage service's API. I will post the second part soon.

pwrobelse commented 4 months ago
  1. API: ./api/storage_service.cc -> load_new_ss_tables that calls sstables_loader::load_new_sstables(). It may use replica::distributed_loader::process_upload_dir() that calls get_keyspace_local_ranges() that invokes get_vnode_effective_replication_map(). Issue: no issue - it does not seem to be needed. Rationale: from the code and manual execution I can see, that distributed_loader::process_upload_dir() is not called when tablets are enabled.

    1. expiration_service::run() -> scan_table() -> token_ranges_owned_by_this_shard<> -> get_vnode_effective_replication_map() Issue: https://github.com/scylladb/scylladb/issues/16567

    2. API: ./api/storage_service.cc -> cleanup_all that uses code from ./compaction/task_manager_module.cc -> global_cleanup_compaction_task_impl -> shard_cleanup_keyspace_compaction_task_impl -> table_cleanup_keyspace_compaction_task_impl -> get_keyspace_local_ranges() -> get_vnode_effective_replication_map(). Issue: no issue - it does not seem to be needed.
      Rationale: global_cleanup_compaction_task_impl performs early return and does not run shard_cleanup_keyspace_compaction_task_impl when tablets are enabled. if (replication_strategy.uses_tablets()) { // this keyspace does not support cleanup co_return; }

    3. ser::storage_service_rpc_verbs::register_raft_topology_cmd -> raft_topology_cmd_handler() -> case node_state::decommissioning -> unbootstrap() -> get_non_local_strategy_keyspaces_erms() -> get_vnode_effective_replication_map() Issue: no issue - it does not seem to be needed. Rationale: from reading the code I can see that get_non_local_strategy_keyspaces_erms() calls get_vnode_effective_replication_map() only on keyspaces for which the following condition holds: rs.get_type() != locator::replication_strategy_type::local && !rs.is_per_table().

pwrobelse commented 4 months ago

Linking another relevant issue: https://github.com/scylladb/scylladb/issues/16848

mykaul commented 1 month ago

Moving to 6.1, assuming we'll complete it by then - not critical for 6.0.

bhalevy commented 1 month ago

@pwrobelse what remains to be done here?

pwrobelse commented 1 month ago

@pwrobelse what remains to be done here?

@bhalevy - only one issue from the list remains open:

From the linked issue:

Alternator TTL doesn't work when tablets are enabled, because:

  1. It uses a per-keyspace replication map.