scylladb / scylla-migrator

Migrates data to Scylla using Spark, typically from Cassandra or Parquet files, or alternatively from DynamoDB to Scylla Alternator.
https://migrator.docs.scylladb.com/stable/
Apache License 2.0

ERROR migrator: Caught error while writing the DataFrame. Will create a savepoint before exiting java.lang.ArithmeticException: / by zero #50

Open tarzanek opened 3 years ago

tarzanek commented 3 years ago

I tried migrating from a Scylla compose.io container cloud to a private Scylla Cloud cluster (with a private VPC), and it seems the 4.5 driver inside the latest migrator has issues. I saw:

21/04/23 17:47:21 WARN DefaultMetadata: [s0] Unexpected error while refreshing token map, keeping previous version (IllegalArgumentException: Unsupported replication strategy: SimpleStrategy)
21/04/23 17:47:26 WARN ChannelPool: [s0|/ZZZ:17208]  Error while opening new channel (ConnectTimeoutException: connection timed out: ZZZ/ZZZ:17208)
21/04/23 17:47:26 WARN ChannelPool: [s0|/XXX:17208]  Error while opening new channel (ConnectTimeoutException: connection timed out: XXX/XXX:17208)
21/04/23 17:47:27 WARN DefaultMetadata: [s0] Unexpected error while refreshing token map, keeping previous version (IllegalArgumentException: Unsupported replication strategy: SimpleStrategy)

and later

21/04/23 17:47:29 WARN DefaultMetadata: [s0] Unexpected error while refreshing token map, keeping previous version (IllegalArgumentException: Unsupported replication strategy: SimpleStrategy)
21/04/23 17:47:29 ERROR migrator: Caught error while writing the DataFrame. Will create a savepoint before exiting
java.lang.ArithmeticException: / by zero
        at com.datastax.spark.connector.rdd.partitioner.CassandraPartitionGenerator.partitions(CassandraPartitionGenerator.scala:82)
        at com.datastax.spark.connector.rdd.CassandraTableScanRDD.getPartitions(CassandraTableScanRDD.scala:277)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
        at org.apache.spark.sql.execution.SQLExecutionRDD.getPartitions(SQLExecutionRDD.scala:44)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
        at com.datastax.spark.connector.RDDFunctions.saveToCassandra(RDDFunctions.scala:37)
        at com.scylladb.migrator.writers.Scylla$.writeDataframe(Scylla.scala:65)
        at com.scylladb.migrator.Migrator$.main(Migrator.scala:83)
        at com.scylladb.migrator.Migrator.main(Migrator.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

So connecting to / writing to Scylla Cloud has an issue with the latest master.
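
A likely reading of the stack trace, hedged: the repeated "Unsupported replication strategy: SimpleStrategy" warnings suggest the driver never manages to build a token map for the source keyspace, so the connector ends up with token ranges that carry no replica endpoints, and the partition generation in CassandraPartitionGenerator divides by that zero endpoint count. A minimal Scala sketch of that failure mode, assuming partitions are grouped per endpoint; TokenRange and both functions below are illustrative, not the connector's actual code:

    // Sketch (not the connector source) of how an empty token map can
    // surface as "/ by zero" in partition generation: ranges are grouped
    // per replica endpoint, and the endpoint count comes from the driver's
    // token map metadata.
    case class TokenRange(start: Long, end: Long, replicas: Set[String])

    def rangesPerEndpoint(ranges: Seq[TokenRange]): Int = {
      val endpointCount = ranges.flatMap(_.replicas).toSet.size
      ranges.size / endpointCount // ArithmeticException when endpointCount == 0
    }

    // A defensive variant would fail with a readable message instead:
    def rangesPerEndpointSafe(ranges: Seq[TokenRange]): Int = {
      val endpointCount = ranges.flatMap(_.replicas).toSet.size
      require(endpointCount > 0, "no replica endpoints: token map metadata unavailable")
      ranges.size / endpointCount
    }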

tarzanek commented 3 years ago

The source keyspace used SimpleStrategy replication; the target used NetworkTopologyStrategy.
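
Given that mismatch, one possible (untested) workaround is to switch the source keyspace to NetworkTopologyStrategy before running the migrator, so the 4.x driver can compute a token map. A sketch using the 4.x Java driver; the contact point, datacenter name, replication factor, and my_keyspace are placeholders:

    import java.net.InetSocketAddress
    import com.datastax.oss.driver.api.core.CqlSession

    val session = CqlSession.builder()
      .addContactPoint(new InetSocketAddress("source-node.example.com", 9042))
      .withLocalDatacenter("datacenter1") // placeholder datacenter name
      .build()

    // Re-point the keyspace at NetworkTopologyStrategy; pick a replication
    // factor per datacenter that matches the existing SimpleStrategy RF.
    session.execute(
      """ALTER KEYSPACE my_keyspace
        |WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 3}""".stripMargin)

    session.close()

The same ALTER can of course be run from cqlsh; the point is only that the strategy change happens before the migrator computes its partitions.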

I actually used master plus the SSL support from https://github.com/scylladb/scylla-migrator/tree/ssl_config

FWIW, the connection between the same clusters works with https://github.com/tarzanek/scylla-migrator/tree/ssl-token-range-total, so 71c37b32a2b8ea9c125a55a541f21c38aadf7ab6 was still working, while the new version, which uses the 4.5 driver and the 2.5 connector, doesn't.
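
To double-check which keyspaces on the source still use SimpleStrategy before re-running either branch, the replication settings can be read straight from system_schema.keyspaces. A sketch, again assuming the 4.x Java driver and Scala 2.13 converters, with the same placeholder connection details:

    import java.net.InetSocketAddress
    import com.datastax.oss.driver.api.core.CqlSession
    import scala.jdk.CollectionConverters._

    val session = CqlSession.builder()
      .addContactPoint(new InetSocketAddress("source-node.example.com", 9042))
      .withLocalDatacenter("datacenter1")
      .build()

    // Print each keyspace with its replication class so SimpleStrategy
    // keyspaces are easy to spot.
    session.execute("SELECT keyspace_name, replication FROM system_schema.keyspaces")
      .asScala
      .foreach { row =>
        val replication = row.getMap("replication", classOf[String], classOf[String]).asScala
        println(s"${row.getString("keyspace_name")} -> ${replication.getOrElse("class", "?")}")
      }

    session.close()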