scylladb / python-driver

ScyllaDB Python Driver, originally DataStax Python Driver for Apache Cassandra
https://python-driver.docs.scylladb.com
Apache License 2.0

Add `RackAwareRoundRobinPolicy` for host selection #332

Closed sylwiaszunejko closed 2 weeks ago

sylwiaszunejko commented 2 months ago

Added RackAwareRoundRobinPolicy, which prefers replicas in the local rack when picking a coordinator. It is similar to DCAwareRoundRobinPolicy, but takes an extra option (the local rack).
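
For illustration, a minimal sketch of the difference, assuming the new policy takes the local rack in addition to the local DC (the exact parameter names are inferred from this discussion, not a confirmed API):

from cassandra.policies import DCAwareRoundRobinPolicy, RackAwareRoundRobinPolicy

# Existing policy: prefer hosts in the given datacenter.
dc_policy = DCAwareRoundRobinPolicy(local_dc="dc1")

# New policy (sketch): additionally prefer hosts in the given rack of that
# datacenter when picking a coordinator; local_rack is the extra option.
rack_policy = RackAwareRoundRobinPolicy(local_dc="dc1", local_rack="rack1")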

Tests are added.

Fixes: #325

sylwiaszunejko commented 2 months ago

@Lorak-mmk @fruch ping

fruch commented 2 months ago

As for the integration test with multi-DC and multi-rack setups, I would recommend looking into the similar tests in dtest and cloning the needed ideas/helper functions from there.

fruch commented 2 months ago

> Maybe this new policy should be the default? I think you need to modify the default_lbp_factory function in cluster.py to do this (+ a comment in class ExecutionProfile's load_balancing_policy attribute and perhaps some other comments). @mykaul wdyt?

What's the reason? I.e., it would be doing the exact same thing, since no parameters are passed to it.

> In general please go over all occurrences of DCAwareRoundRobinPolicy in the driver code (and docs etc.) and see if they need to be replaced or complemented by RackAwareRoundRobinPolicy.

Again, I don't know about replacing, but each occurrence needs to be understood, to figure out if there's something to do with it.

sylwiaszunejko commented 1 month ago

v2:

mykaul commented 1 month ago

> Maybe this new policy should be the default? I think you need to modify the default_lbp_factory function in cluster.py to do this (+ a comment in class ExecutionProfile's load_balancing_policy attribute and perhaps some other comments). @mykaul wdyt?
>
> In general please go over all occurrences of DCAwareRoundRobinPolicy in the driver code (and docs etc.) and see if they need to be replaced or complemented by RackAwareRoundRobinPolicy.

Yes, I believe it should be the default. What is the default in other drivers?

sylwiaszunejko commented 1 month ago

> Maybe this new policy should be the default? I think you need to modify the default_lbp_factory function in cluster.py to do this (+ a comment in class ExecutionProfile's load_balancing_policy attribute and perhaps some other comments). @mykaul wdyt? In general please go over all occurrences of DCAwareRoundRobinPolicy in the driver code (and docs etc.) and see if they need to be replaced or complemented by RackAwareRoundRobinPolicy.
>
> Yes, I believe it should be the default. What is the default in other drivers?

@mykaul In the Rust driver, rack awareness is part of the default load balancing policy: there is a field for the preferred rack, and rack awareness is used only if that field is set (https://github.com/scylladb/scylla-rust-driver/blob/main/docs/source/load-balancing/default-policy.md#preferences). In java-driver 4.x the rack-awareness feature is optional; to enable it, the local rack has to be supplied through the configuration. The feature is disabled by default and, unlike the local datacenter, it will not be implicitly fetched from the provided contact points (https://github.com/scylladb/java-driver/tree/scylla-4.x/manual/core/load_balancing#rack-awareness).

mykaul commented 1 month ago

> In the Rust driver, rack awareness is part of the default load balancing policy: there is a field for the preferred rack, and rack awareness is used only if that field is set. https://github.com/scylladb/scylla-rust-driver/blob/main/docs/source/load-balancing/default-policy.md#preferences

This sounds like what we should support as well.

sylwiaszunejko commented 1 month ago

> In the Rust driver, rack awareness is part of the default load balancing policy: there is a field for the preferred rack, and rack awareness is used only if that field is set. https://github.com/scylladb/scylla-rust-driver/blob/main/docs/source/load-balancing/default-policy.md#preferences
>
> This sounds like what we should support as well.

We can pass RackAwareRoundRobinPolicy as a child policy to TokenAwarePolicy; isn't that equivalent to specifying such a field? In the Rust driver DC awareness is handled by a field, while in python-driver it is handled by a child policy.
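
As a sketch of that child-policy approach (again assuming local_dc/local_rack parameters, which are not a confirmed final API):

from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import RackAwareRoundRobinPolicy, TokenAwarePolicy

# Wrap the rack-aware policy in TokenAwarePolicy, the same way
# DCAwareRoundRobinPolicy is typically wrapped today. Passing local_dc and
# local_rack here plays the role of the Rust driver's "preferred rack" field.
profile = ExecutionProfile(
    load_balancing_policy=TokenAwarePolicy(
        RackAwareRoundRobinPolicy(local_dc="dc1", local_rack="rack1")
    )
)
cluster = Cluster(execution_profiles={EXEC_PROFILE_DEFAULT: profile})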

If we wanted to replace all occurrences of DCAware with RackAware it would not be straightforward. In this function, for example: https://github.com/scylladb/python-driver/blob/master/cassandra/cluster.py#L240-#L243 it would be hard to use the rack-aware instead of the DC-aware policy, because to be consistent with the Rust driver and java-driver 4.x we decided that local_dc and local_rack have to be specified explicitly.

Lorak-mmk commented 1 month ago

Now that we have decided not to do implicit dc/rack detection, it makes no sense IMO to make this policy the default. Without specifying dc/rack it won't do anything, and as @sylwiaszunejko mentioned, specifying them is equivalent to using it as a child policy of TokenAware.

sylwiaszunejko commented 1 month ago

I added some kind of pattern for the integration tests, but I am not really sure what the tests should look like. I could use some help with that @Lorak-mmk @fruch

sylwiaszunejko commented 1 month ago

I am not sure what is going on with the CI. I cannot download the full logs; it keeps failing every time. Locally all tests pass.

sylwiaszunejko commented 1 month ago

@Lorak-mmk @fruch ping

sylwiaszunejko commented 1 month ago

I am still fighting with the CI; this part of the code in policies.py seems to be problematic:

for host in islice(cycle(local_rack_live), pos, pos + len(local_rack_live)):
    yield host

Even though local_rack_live = [<Host: 127.0.0.2:9042 DC1>, <Host: 127.0.0.1:9042 DC1>], the first host is yielded twice, and because neither of those attempts results in a successful read (node2 is stopped), the driver ends up using a host outside the local rack. I am not sure why that happens.

sylwiaszunejko commented 1 month ago

> I am still fighting with the CI; this part of the code in policies.py seems to be problematic:
>
>     for host in islice(cycle(local_rack_live), pos, pos + len(local_rack_live)):
>         yield host
>
> Even though local_rack_live = [<Host: 127.0.0.2:9042 DC1>, <Host: 127.0.0.1:9042 DC1>], the first host is yielded twice, and because neither of those attempts results in a successful read (node2 is stopped), the driver ends up using a host outside the local rack. I am not sure why that happens.

I changed this slice logic to just iterating through the list and the issue stopped reproducing. @Lorak-mmk do you have any idea why?

sylwiaszunejko commented 3 weeks ago

@Lorak-mmk ping

sylwiaszunejko commented 3 weeks ago

It turns out that using the islice and cycle combination results in a broken iterator when there are concurrent modifications to the list. I checked it using this simple program:

import threading
import time
from itertools import cycle, islice

# Shared list among threads
local_rack_live = ['host1', 'host2', 'host3', 'host4', 'host5']

def remove_elements():
    """Thread function to remove elements from the list."""
    while local_rack_live:
        time.sleep(1)  # Simulate some delay
        removed_host = local_rack_live.pop(0)
        print(f"Removed: {removed_host}")

def iterate_and_yield():
    """Thread function to iterate through the list and yield elements."""
    pos = 0
    while local_rack_live:
        # Ensure pos is valid
        pos = (pos % len(local_rack_live)) if local_rack_live else 0
        for host in islice(cycle(local_rack_live), pos, pos + len(local_rack_live)):
            print(f"Yielding from list: {local_rack_live} host: {host}")
            time.sleep(1)  # Simulate some delay
        pos += 1

# Create threads
remove_thread = threading.Thread(target=remove_elements)
iterate_thread = threading.Thread(target=iterate_and_yield)

# Start threads
remove_thread.start()
iterate_thread.start()

# Wait for threads to complete
remove_thread.join()
iterate_thread.join()

The output looks like this:

Yielding from list: ['host1', 'host2', 'host3', 'host4', 'host5'] host: host1
Removed: host1
Yielding from list: ['host2', 'host3', 'host4', 'host5'] host: host3
Yielding from list: ['host2', 'host3', 'host4', 'host5'] host: host4
Removed: host2
Removed: host3
Yielding from list: ['host4', 'host5'] host: host1
Yielding from list: ['host4', 'host5'] host: host3
Removed: host4
Yielding from list: ['host5'] host: host5
Removed: host5

Clearly this doesn't work correctly, e.g. returning host1 when the list is [host4, host5]. Because of that we need to use a copy of local_rack_live to make sure everything is correct. In other policies there is the same islice and cycle logic and it is similarly broken; I will address that in a separate PR.
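
A minimal sketch of the copy-based fix described above (illustrative only, not the PR's exact code): snapshot the list before building the cycle/islice iterator, so concurrent mutations from other threads cannot skew the iteration.

from itertools import cycle, islice

def query_plan_over_snapshot(local_rack_live, pos):
    # Take a private copy: on_up / on_down may mutate the original list
    # while this generator is still being consumed.
    hosts = list(local_rack_live)
    if not hosts:
        return
    start = pos % len(hosts)
    for host in islice(cycle(hosts), start, start + len(hosts)):
        yield host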

sylwiaszunejko commented 2 weeks ago

> In other policies there is the same islice and cycle logic and it is similarly broken; I will address that in a separate PR.

I have to correct myself here. In other policies, such as DCAwareRoundRobinPolicy, the hosts are stored in a tuple, and when they change (host up/down) the value in the dictionary is reassigned, so the tuple we got from the dict before the change is unaffected, e.g. here: https://github.com/scylladb/python-driver/blob/master/cassandra/policies.py#L272. So my change from tuples to lists (mutating the list instead of creating a new one on host down/up) is what created the problem, even though the iteration logic is similar to the other policies.

Lorak-mmk commented 2 weeks ago

> In other policies there is the same islice and cycle logic and it is similarly broken; I will address that in a separate PR.
>
> I have to correct myself here. In other policies, such as DCAwareRoundRobinPolicy, the hosts are stored in a tuple, and when they change (host up/down) the value in the dictionary is reassigned, so the tuple we got from the dict before the change is unaffected, e.g. here: https://github.com/scylladb/python-driver/blob/master/cassandra/policies.py#L272. So my change from tuples to lists (mutating the list instead of creating a new one on host down/up) is what created the problem, even though the iteration logic is similar to the other policies.

This approach makes sense: copy in on_up / on_down because those are rare, so they can be expensive; don't copy in make_query_plan because we want it to be fast. Maybe that's why they used tuples - to prevent accidental modification.
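
A minimal sketch of that pattern with illustrative names (not the driver's actual code): the rare on_up / on_down callbacks rebuild and reassign an immutable tuple, while the hot make_query_plan path just reads whatever tuple is currently bound, so an iterator built from an older snapshot stays consistent.

from itertools import cycle, islice
from threading import Lock

class RackAwareHostsSketch:
    def __init__(self):
        self._lock = Lock()
        self._local_rack_live = ()   # always rebound to a new immutable tuple
        self._position = 0

    def on_up(self, host):
        # Rare event: it is fine to rebuild the tuple here.
        with self._lock:
            if host not in self._local_rack_live:
                self._local_rack_live = self._local_rack_live + (host,)

    def on_down(self, host):
        # Rare event: reassign a new tuple instead of mutating in place.
        with self._lock:
            self._local_rack_live = tuple(
                h for h in self._local_rack_live if h != host)

    def make_query_plan(self):
        # Hot path: no copy needed; the tuple grabbed here can never change,
        # even if on_up / on_down rebind the attribute concurrently.
        hosts = self._local_rack_live
        if not hosts:
            return
        start = self._position % len(hosts)
        self._position += 1
        for host in islice(cycle(hosts), start, start + len(hosts)):
            yield host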

sylwiaszunejko commented 2 weeks ago

> Maybe that's why they used tuples - to prevent accidental modification.

Maybe, but it was not that easy to deduce from the code before. So I guess the optimal solution is to return to using tuples and copy in on_up / on_down, do you agree? @Lorak-mmk

Lorak-mmk commented 2 weeks ago

> Maybe that's why they used tuples - to prevent accidental modification.
>
> Maybe, but it was not that easy to deduce from the code before. So I guess the optimal solution is to return to using tuples and copy in on_up / on_down, do you agree? @Lorak-mmk

I agree. Please also verify that the other policies are implemented correctly in this regard. It would also be beneficial to put a comment explaining all of this, maybe at the top of the file, so that the next person looking at this has an easier time understanding it.