scylladb / gocql

Package gocql implements a fast and robust ScyllaDB client for the Go programming language.
https://docs.scylladb.com/stable/using-scylla/drivers/cql-drivers/scylla-go-driver.html
BSD 3-Clause "New" or "Revised" License
189 stars 59 forks source link

Check if localHost is not zero token node #333

Closed sylwiaszunejko closed 1 week ago

sylwiaszunejko commented 1 week ago

To ensure that driver handles zero-token nodes properly we need to make sure that following scenarios work as intended: datacenter2 is a zero-token datacenter target host - host you feed to NewCluster target dc - dc name you feed to DCAwareRoundRobinPolicy

  1. target host = any host from datacenter1, target dc = datacenter1. It should succeed, you should be able to execute queries
  2. target host = any host from datacenter2, target dc = datacenter1. It should succeed, you should be able to execute queries
  3. target host = any host from datacenter1, target dc = datacenter2. It should fail with error
  4. target host = any host from datacenter2, target dc = datacenter2. It should fail with error

This PR is needed for scenario 4 to work properly.

sylwiaszunejko commented 1 week ago

I would like to add a unit test to test all scenarios, but I have some issues with that. The test should look like this:

hosts := [...]*HostInfo{
    {hostId: "0", connectAddress: net.ParseIP("10.0.0.1"), dataCenter: "datacenter1", tokens: []string{"0", "1"}},
    {hostId: "1", connectAddress: net.ParseIP("10.0.0.2"), dataCenter: "datacenter2", tokens: []string{}},
}
cluster := NewCluster(string(hosts[0].connectAddress)) // 0 or 1 depenting on the actual scenerio we want to test
cluster.PoolConfig.HostSelectionPolicy = DCAwareRoundRobinPolicy("datacenter2") // datacenter1 or datacenter2

session, err := cluster.CreateSession()
assertEqual(t, "err is nil", nil, err)
session.Close()

This would work as a integration test, but not as a unit test because we need to initialize Session. The key factor to test is IsOperational function of policy https://github.com/scylladb/gocql/blob/aa6c8dcc7e568c5168d97adf2ee581bc907c646c/policies.go#L1027-L1046 and whether or not GetHosts() returns only non zero-token nodes. To do that I need to to somehow mock querySystemLocal/Peers, but use the real GetHosts() and getClusterPeerInfo(localHost) of ringDescriber function to test if the code properly omits zero-token nodes. I am not sure how to do that. @roydahan @dkropachev If you have any suggestions how to test it using unit tests that would be great, or maybe it is acceptable to have integration test?

sylwiaszunejko commented 1 week ago

Thanks, it is perfect. Btw, looking at this PR, made me think about control connection, I wonder if it is good idea to make control connection to reconnect from zero token node to a regular one when it is possible ? WDYT ?

Say there is token aware policy with round robin as fallback (or dc aware), cc is landed on zero token dc first, is it possible that we will loose events from scylla if it stays there ? would it be better to force it to reconnect when other hosts available ?

Do you mean modifying controlConn connect function to first try to connect regular hosts and then zero-token ones? https://github.com/scylladb/gocql/blob/master/control.go#L226-L264 I am not sure I understand why would we lose events if cc is on zero-token node.

dkropachev commented 1 week ago

Thanks, it is perfect. Btw, looking at this PR, made me think about control connection, I wonder if it is good idea to make control connection to reconnect from zero token node to a regular one when it is possible ? WDYT ? Say there is token aware policy with round robin as fallback (or dc aware), cc is landed on zero token dc first, is it possible that we will loose events from scylla if it stays there ? would it be better to force it to reconnect when other hosts available ?

Do you mean modifying controlConn connect function to first try to connect regular hosts and then zero-token ones? https://github.com/scylladb/gocql/blob/master/control.go#L226-L264 I am not sure I understand why would we lose events if cc is on zero-token node.

this one too, but I am more about actively drop current connection when non zero-token nodes available.

sylwiaszunejko commented 1 week ago

Thanks, it is perfect. Btw, looking at this PR, made me think about control connection, I wonder if it is good idea to make control connection to reconnect from zero token node to a regular one when it is possible ? WDYT ? Say there is token aware policy with round robin as fallback (or dc aware), cc is landed on zero token dc first, is it possible that we will loose events from scylla if it stays there ? would it be better to force it to reconnect when other hosts available ?

Do you mean modifying controlConn connect function to first try to connect regular hosts and then zero-token ones? https://github.com/scylladb/gocql/blob/master/control.go#L226-L264 I am not sure I understand why would we lose events if cc is on zero-token node.

this one too, but I am more about actively drop current connection when non zero-token nodes available.

Do you have specific scenario in mind when this would be helpful?

dkropachev commented 1 week ago

Thanks, it is perfect. Btw, looking at this PR, made me think about control connection, I wonder if it is good idea to make control connection to reconnect from zero token node to a regular one when it is possible ? WDYT ? Say there is token aware policy with round robin as fallback (or dc aware), cc is landed on zero token dc first, is it possible that we will loose events from scylla if it stays there ? would it be better to force it to reconnect when other hosts available ?

Do you mean modifying controlConn connect function to first try to connect regular hosts and then zero-token ones? https://github.com/scylladb/gocql/blob/master/control.go#L226-L264 I am not sure I understand why would we lose events if cc is on zero-token node.

this one too, but I am more about actively drop current connection when non zero-token nodes available.

Do you have specific scenario in mind when this would be helpful?

Only if zero-token nodes are not getting some server events, which is unlikely, but we need to make sure it is not happening, I am worry about SCHEMA_CHANGE.