scylladb / scylla-migrator

Migrate data extract using Spark to Scylla, normally from Cassandra
Apache License 2.0

The migrator does not take advantage of the ScyllaDB driver #163

Open julienrf opened 1 week ago

julienrf commented 1 week ago

We communicate with ScyllaDB via the spark-cassandra-connector, which uses the Apache Cassandra driver under the hood (version 4.13 at the time of writing).

This prevents us from taking advantage of features specific to the ScyllaDB driver, such as shard awareness.
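
For context, shard awareness means the driver sends each request directly to the CPU shard that owns the partition, instead of to an arbitrary connection on the node. Below is a minimal sketch of the token-to-shard mapping, as I understand it from the ScyllaDB protocol extension documentation. The function and parameter names are hypothetical; in practice the shard count and the "ignore MSB" value are advertised by each node, and the numbers used here are only illustrative.

```python
def shard_of(token: int, nr_shards: int, ignore_msb: int) -> int:
    """Map a signed 64-bit Murmur3 token to a shard index (illustrative sketch)."""
    # Shift the signed token into the unsigned 64-bit range.
    biased = (token + 2**63) % 2**64
    # Drop the most significant bits that the node asks us to ignore.
    biased = (biased << ignore_msb) % 2**64
    # Scale the 64-bit value onto [0, nr_shards).
    return (biased * nr_shards) >> 64


if __name__ == "__main__":
    # Assumed values: 4 shards per node, 12 ignored bits.
    for token in (-2**63, 0, 2**63 - 1):
        print(token, "->", shard_of(token, nr_shards=4, ignore_msb=12))
```

A driver that is not shard aware never does this computation, so the node has to forward the request to the right shard internally.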

We should consider swapping the Cassandra driver for the ScyllaDB driver. We would probably have to change the spark-cassandra-connector itself, though.

tarzanek commented 1 week ago

So we'd first need to merge into our fork ( https://github.com/scylladb/spark-cassandra-connector/ ) and release Spark connectors built on the shard-aware driver, e.g. https://github.com/tarzanek/spark-cassandra-connector/tree/v3.0.0-scylla

Up to 3.1, I checked and made sure we have a matching driver version in https://github.com/scylladb/java-driver/tags (the 3.0 example above is a bad one, but we have the matching version now; back then it was missing). R&D is also releasing this, and AFAIK only Spark 3.5.1 upgrades the driver (and even then to a version that should already be among the released ones).

That said, the simple connector patch above might not be everything. Special extensions will need changes similar to https://github.com/scylladb/java-driver/pull/156 to be usable from, or leveraged in, RDDs (BYPASS CACHE being the most important one).

For all of the above we'd need some QA, which I think is the biggest blocker now. Technically the first step should be doable, so if we released it now, it would be without support.