scylladb / gocql

Package gocql implements a fast and robust ScyllaDB client for the Go programming language.
https://docs.scylladb.com/stable/using-scylla/drivers/cql-drivers/scylla-go-driver.html
BSD 3-Clause "New" or "Revised" License
169 stars 47 forks source link

Add option to control if remote replicas should be used #202

Closed sylwiaszunejko closed 1 week ago

sylwiaszunejko commented 2 weeks ago

Previously the driver would always failover to use replicas in the remote datacenters if there was no replicas available in local_dc when using DC/RackAwareRoundRobinPolicy. There was no control over it. This is not how the other driver handle it.

In python-driver there is an additional field in the DCAwareRoundRobinPolicy named used_hosts_per_remote_dc and it is 0 by default.

In java-driver 4.x there is also similar mechanism as in python-driver, there is a field usedHostsPerRemoteDc.

In rust driver there is permit_dc_failover option in the DefaultPolicy. If it is set to true driver can use replicas in dcs other than local. By default it is false.

This PR adds a field to dc/rackAwareRR to control if dc failover is permitted. To avoid breaking changes I decided not to change the default behavior.

Refs: #201 <- this issue is about wrong dc name in DCAwareRoundRobinPolicy, but the difference in gocql behavior in that case originates in handling remote replicas by default.

sylwiaszunejko commented 2 weeks ago

I send it as a draft to discuss it, because changing default behavior is not something to do lightly for sure. Also I went with the behavior similar to rust driver not java or python-driver, that also can be discussed.

dkropachev commented 2 weeks ago

I send it as a draft to discuss it, because changing default behavior is not something to do lightly for sure. Also I went with the behavior similar to rust driver not java or python-driver, that also can be discussed.

@sylwiaszunejko , thansk for PR, one question:

  1. Do you consider changing this option while policy is in use ?
sylwiaszunejko commented 2 weeks ago

I send it as a draft to discuss it, because changing default behavior is not something to do lightly for sure. Also I went with the behavior similar to rust driver not java or python-driver, that also can be discussed.

@sylwiaszunejko , thansk for PR, one question:

1. Do you consider changing this option while policy is in use ?

My intention was to specify it only at the beginning as in other drivers. I wanted to add this option to the constructor, but in golang there is no option to specify default value for an argument and I didn't want to do API breaking changes by adding extra parameter in https://github.com/scylladb/gocql/blob/master/policies.go#L960

dkropachev commented 2 weeks ago

@sylwiaszunejko , thansk for PR, one question:

1. Do you consider changing this option while policy is in use ?

My intention was to specify it only at the beginning as in other drivers. I wanted to add this option to the constructor, but in golang there is no option to specify default value for an argument and I didn't want to do API breaking changes by adding extra parameter in https://github.com/scylladb/gocql/blob/master/policies.go#L960

Let's than implement it in a different way that won't polute HostSelectionPolicy and won't suggest posibility to change it on a run, let's do something like this:

policy := DCAwareRoundRobinPolicy("local", HostPolicyOptionEnableDCFailover)
sylwiaszunejko commented 2 weeks ago

Let's than implement it in a different way that won't polute HostSelectionPolicy and won't suggest posibility to change on a run, let's do something like this:

policy := DCAwareRoundRobinPolicy("local", HostPolicyOptionEnableDCFailover)

I wanted to do it that way initially, but I was wondering if changing DCAwareRoundRobinPolicy definition is what we want.

dkropachev commented 2 weeks ago

Let's than implement it in a different way that won't polute HostSelectionPolicy and won't suggest posibility to change on a run, let's do something like this:

policy := DCAwareRoundRobinPolicy("local", HostPolicyOptionEnableDCFailover)

I wanted to do it that way initially, but I was wondering if changing DCAwareRoundRobinPolicy definition is what we want.

It won't be breaking change, let me show you.

sylwiaszunejko commented 2 weeks ago

@dkropachev What is your opinion about changing the default behavior regarding using remote hosts?

sylwiaszunejko commented 2 weeks ago

@dkropachev something went wrong with your push, there are unnecessary commits, if I should fix something please let me know, I will fix it

dkropachev commented 2 weeks ago

@dkropachev What is your opinion about changing the default behavior regarding using remote hosts?

I would classify it as breaking change, it is better to avoid that by defaulting to old behavior.

dkropachev commented 2 weeks ago

@dkropachev something went wrong with your push, there are unnecessary commits, if I should fix something please let me know, I will fix it

I am sorry for doing it, could you please push your version of the branch again

sylwiaszunejko commented 2 weeks ago

v2:

sylwiaszunejko commented 2 weeks ago

I added a simple test

wprzytula commented 2 weeks ago

Please re-request review from me once it gets stable and not draft.

roydahan commented 2 weeks ago

The default behavior is something we should consider. On one hand, I understand that changing the behavior under the users feet may cause confusions and potentially fail their application. (But then they may realize their queries are going to the wrong DC and hurting their performance).

On the other hand, just introducing the option "quietly", existing users won't notice it and it requires them to actively change it. (In theory the failure that the user had, they could also just change the DC name).

I wonder what other maintainers think about it. @fruch / @Lorak-mmk / @wprzytula ?

wprzytula commented 2 weeks ago

The default behavior is something we should consider. On one hand, I understand that changing the behavior under the users feet may cause confusions and potentially fail their application. (But then they may realize their queries are going to the wrong DC and hurting their performance).

On the other hand, just introducing the option "quietly", existing users won't notice it and it requires them to actively change it. (In theory the failure that the user had, they could also just change the DC name).

I wonder what other maintainers think about it. @fruch / @Lorak-mmk / @wprzytula ?

This is how I see this. We need to decide which is, in general, our higher priority in gocql:

  1. backward-compatibility, API stability etc. - being mature and stable driver with no changes unless absolutely necessary, or
  2. ensuring the most sensible default behaviour of the driver, as well as promoting good practices among the users.

Having made this decision, it will guide us in multiple similar dilemmas: to break with good intentions or to preserve old behaviour. Having chosen 1., we would avoid major releases and API-breaking changes, as well as we would preserve old defaults. With 2., we would admit that we are actually developing gocql, not only maintaining it, and so we would issue next major releases.

wprzytula commented 2 weeks ago

My personal opinion would be to keep gocql a maintained-only driver, with no breaking changes, and instead develop the Scylla Go driver, which has good design from the very beginning (inspired by the Rust driver) and would promote good practices.

mykaul commented 2 weeks ago

My personal opinion would be to keep gocql a maintained-only driver, with no breaking changes, and instead develop the Scylla Go driver, which has good design from the very beginning (inspired by the Rust driver) and would promote good practices.

You mean https://github.com/scylladb/scylla-go-driver ?

mykaul commented 2 weeks ago

The default behavior is something we should consider. On one hand, I understand that changing the behavior under the users feet may cause confusions and potentially fail their application. (But then they may realize their queries are going to the wrong DC and hurting their performance). On the other hand, just introducing the option "quietly", existing users won't notice it and it requires them to actively change it. (In theory the failure that the user had, they could also just change the DC name). I wonder what other maintainers think about it. @fruch / @Lorak-mmk / @wprzytula ?

This is how I see this. We need to decide which is, in general, our higher priority in gocql:

  1. backward-compatibility, API stability etc. - being mature and stable driver with no changes unless absolutely necessary, or
  2. ensuring the most sensible default behaviour of the driver, as well as promoting good practices among the users.

I'm in favor of no. 2. Don't punish users with bad default choices in the spirit of 'backward compat.' And it's not an API breaking change. It's a minor item that will improve their lives - especially when there's a failure. The default should be focus on maintaining availability in the face of failures.

Lorak-mmk commented 2 weeks ago

The default behavior is something we should consider. On one hand, I understand that changing the behavior under the users feet may cause confusions and potentially fail their application. (But then they may realize their queries are going to the wrong DC and hurting their performance). On the other hand, just introducing the option "quietly", existing users won't notice it and it requires them to actively change it. (In theory the failure that the user had, they could also just change the DC name). I wonder what other maintainers think about it. @fruch / @Lorak-mmk / @wprzytula ?

This is how I see this. We need to decide which is, in general, our higher priority in gocql:

  1. backward-compatibility, API stability etc. - being mature and stable driver with no changes unless absolutely necessary, or
  2. ensuring the most sensible default behaviour of the driver, as well as promoting good practices among the users.

I'm in favor of no. 2. Don't punish users with bad default choices in the spirit of 'backward compat.' And it's not an API breaking change. It's a minor item that will improve their lives - especially when there's a failure. The default should be focus on maintaining availability in the face of failures.

Availability? I thought the discussion was about changing the default to prevent sending requests to other DCs, as is the default in other drivers? Isn't that decreasing availability?

dkropachev commented 2 weeks ago

My personal opinion would be to keep gocql a maintained-only driver, with no breaking changes, and instead develop the Scylla Go driver, which has good design from the very beginning (inspired by the Rust driver) and would promote good practices.

I know that upstream maintainers were also thinking of v2 and looked at scylladb/scylla-go-driver as a base for it. It would be amazing if we manage to connect with them and work on the project together.

Lorak-mmk commented 2 weeks ago

My personal opinion would be to keep gocql a maintained-only driver, with no breaking changes, and instead develop the Scylla Go driver, which has good design from the very beginning (inspired by the Rust driver) and would promote good practices.

I know that upstream maintainers were also thinking of v2 and looked at scylladb/scylla-go-driver as a base for it. It would be amazing if we manage to connect with them and work on the project together.

It would defnitely be good to cooperate instead of having forks.

mykaul commented 2 weeks ago

The default behavior is something we should consider. On one hand, I understand that changing the behavior under the users feet may cause confusions and potentially fail their application. (But then they may realize their queries are going to the wrong DC and hurting their performance). On the other hand, just introducing the option "quietly", existing users won't notice it and it requires them to actively change it. (In theory the failure that the user had, they could also just change the DC name). I wonder what other maintainers think about it. @fruch / @Lorak-mmk / @wprzytula ?

This is how I see this. We need to decide which is, in general, our higher priority in gocql:

  1. backward-compatibility, API stability etc. - being mature and stable driver with no changes unless absolutely necessary, or
  2. ensuring the most sensible default behaviour of the driver, as well as promoting good practices among the users.

I'm in favor of no. 2. Don't punish users with bad default choices in the spirit of 'backward compat.' And it's not an API breaking change. It's a minor item that will improve their lives - especially when there's a failure. The default should be focus on maintaining availability in the face of failures.

Availability? I thought the discussion was about changing the default to prevent sending requests to other DCs, as is the default in other drivers? Isn't that decreasing availability?

Previously, there was no control for it. Now there is, and we are considering what the default should be. I'm in favor of ALLOWING sending requests to other DCs. I think the default in other drivers is wrong.

sylwiaszunejko commented 2 weeks ago

Previously, there was no control for it. Now there is, and we are considering what the default should be. I'm in favor of ALLOWING sending requests to other DCs. I think the default in other drivers is wrong.

@mykaul That's exactly what the code in this PR does so far. The default behavior is the same as it was without controlling mechanism, but we have the control available. And this option is in fact in line with @wprzytula approach of not doing breaking changes.

roydahan commented 2 weeks ago

Just to make sure, if we leave the default as is, the user would have the exact same issue as they had when pointing the wrong DC name without ability to tell.

sylwiaszunejko commented 2 weeks ago

Just to make sure, if we leave the default as is, the user would have the exact same issue as they had when pointing the wrong DC name without ability to tell.

True, to fail if the wrong DC is provided they would have to use HostPolicyOptionDisableDCFailover