Closed mykaul closed 1 year ago
We can (should?) also change:
SUPPORTED_VERSIONS = (66, 65, 6, 5, 4, 3, 2, 1)
To what we actually support.
We can change the order in which protocols are tried, I'll try to get to it soon.
However, I wanted to note that this doesn't really matter if there aren't many connections (like when using clqsh). Performance of connecting will matter if there are a lot of connections made at the same time.
And it that case, protocol_version
should be explicitly set when constructing Cluster
object instead of relying on driver guessing the version - if our tests or clients don't do it, they should be instructed to.
If someone for some reason needs connection setup performance while using cqlsh - it has a --protocol-version
parameter which should be used in that case.
We can (should?) also change:
SUPPORTED_VERSIONS = (66, 65, 6, 5, 4, 3, 2, 1)
To what we actually support.
We only test protocol v4 in CI, but I think all the others should be supported by driver (I may be wrong here) - which ones do you think should be removed from this list?
We (ScyllaDB) do not support version >4, but I reckon you raise an interesting argument - what if someone wants to use our driver against both ScyllaDB and some DSE version... I don't think we test or should focus on the usecase. I prefer faster connection out of the box.
We can change the order in which protocols are tried, I'll try to get to it soon. However, I wanted to note that this doesn't really matter if there aren't many connections (like when using clqsh). Performance of connecting will matter if there are a lot of connections made at the same time. And it that case,
protocol_version
should be explicitly set when constructingCluster
object instead of relying on driver guessing the version - if our tests or clients don't do it, they should be instructed to. If someone for some reason needs connection setup performance while using cqlsh - it has a--protocol-version
parameter which should be used in that case.
Yes, I was lazy so used cqlsh, but the issue is in the Python driver (and perhaps others?) and I'm looking at the impact when hundreds or thousands of connection attempts are made. The argument about specifying the exact protocol version is valid, but assumes someone takes this into consideration in the first place.
An easy solution is put the version that Scylla supports (4) as the first one in the long list.
But this change may break a guarantee made in the driver documentation https://docs.datastax.com/en/developer/python-driver/3.25/api/cassandra/cluster/:
protocol_version = 66 - The maximum version of the native protocol to use. If not set in the constructor, the driver will automatically downgrade version based on a negotiation with the server, but it is most efficient to set this to the maximum supported by your version of Cassandra. Setting this will also prevent conflicting versions negotiated if your cluster is upgraded.
This text suggests that the driver defaults to trying version 66 and then downgrading. If it starts with 4, it will never try anything higher. This text suggest that it's the user's responsibility to set the protocol_version
parameter in the cluster (and as @Lorak-mmk said, we indeed do this in test/cql-pytest/util.py and other places).
We could, of course, change this documentation to explain we actually always use 4, and if you want a higher protocol, you should set it manually.
By the way, we have Scylla issues on how Scylla handles any protocol versions except 4 incorrectly :( See https://github.com/scylladb/scylladb/issues/9791 and https://github.com/scylladb/scylladb/issues/11899
This text suggests that the driver defaults to trying version 66 and then downgrading. If it starts with 4, it will never try anything higher. This text suggest that it's the user's responsibility to set the
protocol_version
parameter in the cluster (and as @Lorak-mmk said, we indeed do this in test/cql-pytest/util.py and other places).
The above is not our driver's text. There's a good open question about ScyllaDB driver support for Cassandra latest versions. I assume 'not tested' (certainly not for their enterprise versions!) is an accurate statement, though in reality we are probably fine...
The above is not our driver's text.
There's a separate argument why we maintain our own driver with random improvements (and not just Scylla-specific features), and whether Scylla users must use the Scylla verision of the driver for every language (do we even have a Scylla version of every language?). I guess this issue is not the place to hold this discussion.
There's a good open question about ScyllaDB driver support for Cassandra latest versions. I assume 'not tested' (certainly not for their enterprise versions!) is an accurate statement, though in reality we are probably fine...
I don't think anybody will seriously consider using Scylla's driver for running against Cassandra, so the answer to this question is not important.
The above is not our driver's text. There's a good open question about ScyllaDB driver support for Cassandra latest versions. I assume 'not tested' (certainly not for their enterprise versions!) is an accurate statement, though in reality we are probably fine...
I don't think anybody will seriously consider using Scylla's driver for running against Cassandra, so the answer to this question is not important.
Please see: https://github.com/scylladb/python-driver/blob/master/docs/upgrading.rst I'll paste the relevant fragment below:
Since 3.21.0, scylla-driver fully supports DataStax products. dse-driver and
dse-graph users should now migrate to scylla-driver to benefit from latest bug fixes
and new features. The upgrade to this new unified driver version is straightforward
with no major API changes.
Changing the default version of the protocol would break backwards compatibility (because our docs say what version is the default, as @nyh noticed) and it would also break above promise.
There's a separate argument why we maintain our own driver with random improvements (and not just Scylla-specific features), and whether Scylla users must use the Scylla verision of the driver for every language (do we even have a Scylla version of every language?). I guess this issue is not the place to hold this discussion.
We don't - most notably, we don't have C# and Node.js drivers.
@Lorak-mmk
the docs mostly copy-paste from upstream with search and replace for the driver name, we did removed lots of parts, this part can be removed as well.
I don't think we need to support or test DSE versions, if we are compatible with cassandra it's probably enough for our users.
Also using CQL v4, should be working with cassandra and DSE,
starting with the version we support best, sounds o.k., especially since user would be able to select anything else they might need, if then need any of the advanced features of those versions.
If we are ok with not supporting DSE and dropping this promise from docs, then awesome - we can easily drop default protocol version to V5, and potentially remove a lot of DSE-specific code from the driver in the future. That would decrease failed attempts from 3 to 1 when connecting to Scylla without specifying protocol version. Dropping to V4 would still be a breaking change (as we want to be compatible with Cassandra), wouldn't it? Unless there are no new features there.
@tzach , @annastuchlik - please see https://github.com/scylladb/python-driver/issues/244#issuecomment-1649731361 - this is probably a bug in our documentation.
If we are ok with not supporting DSE and dropping this promise from docs, then awesome - we can easily drop default protocol version to V5, and potentially remove a lot of DSE-specific code from the driver in the future. That would decrease failed attempts from 3 to 1 when connecting to Scylla without specifying protocol version. Dropping to V4 would still be a breaking change (as we want to be compatible with Cassandra), wouldn't it? Unless there are no new features there.
The downside is that syncing with upstream changes are going to be more challenging - either you resolve conflicts, or you (re)apply patches after syncing (the latter is easier?) - but it becomes a process. I assume we have the former today, in some cases, anyway.
This issue affects control connection, not any of subsequent connections. So if your client connects to a cluster which has 8 nodes, 32 shards each, then the control connection will have 3 failures, as described, but the rest of shard connections (256) should connect at first try. Initially after reading description I thought that each connection is affected, so I'm noting it here in case anyone else misunderstands.
This issue affects control connection, not any of subsequent connections. So if your client connects to a cluster which has 8 nodes, 32 shards each, then the control connection will have 3 failures, as described, but the rest of shard connections (256) should connect at first try. Initially after reading description I thought that each connection is affected, so I'm noting it here in case anyone else misunderstands.
Isn't it per node given (the control connection)? Or once an initial connection is made, for the very 1st time, everything else works with the correct version, across nodes and shards? (I've tried with cqlsh against a single shard, single node docker instance...)
(The change in cqlsh is https://github.com/scylladb/scylla-cqlsh/pull/45 )
After reading the code I'm pretty sure it's once per Cluster
object - so only one connection, not one per node, and if you create multiple Session
objects from one Cluster
, only the first one will encounter this issue.
I'll verify this in Wireshark later.
After reading the code I'm pretty sure it's once per
Cluster
object - so only one connection, not one per node, and if you create multipleSession
objects from oneCluster
, only the first one will encounter this issue. I'll verify this in Wireshark later.
If that's the case, it is really benign and not worth fixing (as it'll introduce conflict with upstream on every rebase and I've offered a fix for cqlsh already)
Not worth handling, per the above discussion - closing.
Looking at a pcap, the ScyllaDB Python driver tries to connect with versions we do not support - 66 (DSE_V2), 65 (DSE_V1), 5. That means 3 connections that are responded with an ERROR, until we succeed to connect. Each connection is of course the whole 3 way TCP conn, then closed, etc... cc @fruch (I've seen it in cqlsh, now I understand it's due to the Python driver defaults)