Closed archcode01 closed 1 year ago
config.yaml:
# Example configuration for migrating from Cassandra:
source:
  type: cassandra
  host: cassandra.cassandra.svc.cluster.local
  port: 9042
  # Optional; if not specified, None will be used
  localDC: datacenter1
  credentials:
    username: cassandra
    password: test123
  # SSL as per https://github.com/scylladb/spark-cassandra-connector/blob/master/doc/reference.md#cassandra-ssl-connection-options
  sslOptions:
    clientAuthEnabled: false
    enabled: true
    # All below are optional! (Generally just trustStorePassword and trustStorePath are needed.)
    trustStorePassword: test123
    trustStorePath: /etc/config/tls/cassandra/client/truststore
    # trustStoreType: JKS
    # keyStorePassword: <keyStorePwd>
    # keyStorePath: <keyStorePath>
    # keyStoreType: JKS
    enabledAlgorithms:
      - TLS_RSA_WITH_AES_128_CBC_SHA
      - TLS_RSA_WITH_AES_256_CBC_SHA
    # protocol: TLS
  keyspace: janusgraph
  table: system_properties
  # Consistency level for the source connection.
  # Options are: LOCAL_ONE, ONE, LOCAL_QUORUM, QUORUM.
  # The connector driver default is LOCAL_ONE; our recommendation is LOCAL_QUORUM.
  # If using ONE or LOCAL_ONE, ensure the source system is fully repaired.
  consistencyLevel: LOCAL_QUORUM
  # Preserve TTLs and WRITETIMEs of cells in the source database. Note that this
  # option is *incompatible* with copying tables that contain collections (lists, maps, sets).
  preserveTimestamps: true
  # Number of splits to use - this should be at minimum the number of cores
  # available in the Spark cluster, and optimally more; more splits lead
  # to more fine-grained resumes. Aim for 8 * (Spark cores).
  splitCount: 256
  # Number of connections to open to Cassandra when copying
  connections: 8
  # Number of rows to fetch in each read
  fetchSize: 1000
  # Optional condition to filter which source table data will be migrated
  # where: race_start_date = '2015-05-27' AND race_end_date = '2015-05-27'
# Example for loading from Parquet:
# source:
# type: parquet
# path: s3a://bucket-name/path/to/parquet-directory
# # Optional AWS access/secret key for loading from S3.
# # This section can be left out if running on EC2 instances that have instance profiles with the
# # appropriate permissions. Assuming roles is not supported currently.
# credentials:
# accessKey:
# secretKey:
# Example for loading from DynamoDB:
# source:
# type: dynamodb
# table: <table name>
# # Optional - load from a custom endpoint:
# endpoint:
# # Specify the hostname without a protocol
# host: <host>
# port: <port>
#
# # Optional - specify the region:
# # region: <region>
#
# # Optional - static credentials:
# credentials:
# accessKey: <user>
# secretKey: <pass>
#
# # below controls split factor
# scanSegments: 1
#
# # throttling settings, set based on your capacity (or wanted capacity)
# readThroughput: 1
#
# # The value of dynamodb.throughput.read.percent can be between 0.1 and 1.5, inclusively.
# # 0.5 represents the default read rate, meaning that the job will attempt to consume half of the read capacity of the table.
# # If you increase the value above 0.5, spark will increase the request rate; decreasing the value below 0.5 decreases the read request rate.
# # (The actual read rate will vary, depending on factors such as whether there is a uniform key distribution in the DynamoDB table.)
# throughputReadPercent: 1.0
#
# # how many tasks per executor?
# maxMapTasks: 1
#
# # When transferring DynamoDB sources to DynamoDB targets (such as other DynamoDB tables or Alternator tables),
# # the migrator supports transferring live changes occurring on the source table after transferring an initial
# # snapshot. This is done using DynamoDB streams and incurs additional charges due to the Kinesis streams created.
# # Enable this flag to transfer live changes after transferring an initial snapshot. The migrator will continue
# # replicating changes endlessly; it must be stopped manually.
# #
# # NOTE: For the migration to be performed losslessly, the initial snapshot transfer must complete within 24 hours.
# # Otherwise, some captured changes may be lost due to the retention period of the table's stream.
# #
# # NOTE2: The migrator does not currently delete the created Dynamo stream. Delete it manually after ending the
# # migrator run.
# streamChanges: false
# Configuration for the database you're copying into
target:
  type: scylla
  host: simple-cluster-client.cassandra.cluster.svc.local
  port: 9042
  # Optional; if not specified, None will be used
  localDC: datacenter1
  credentials:
    username: cassandra
    password: testcassandra123
  # SSL as per https://github.com/scylladb/spark-cassandra-connector/blob/master/doc/reference.md#cassandra-ssl-connection-options
  # sslOptions:
  #   clientAuthEnabled: false
  #   enabled: false
  #   # All below are optional! (Generally just trustStorePassword and trustStorePath are needed.)
  #   trustStorePassword: <pass>
  #   trustStorePath: <path>
  #   trustStoreType: JKS
  #   keyStorePassword: <pass>
  #   keyStorePath: <path>
  #   keyStoreType: JKS
  #   enabledAlgorithms:
  #     - TLS_RSA_WITH_AES_128_CBC_SHA
  #     - TLS_RSA_WITH_AES_256_CBC_SHA
  #   protocol: TLS
  # NOTE: The destination table must have the same schema as the source table.
  # If you'd like to rename columns, that's OK - see the renames parameter below.
  keyspace: janusgraph
  table: system_properties
  # Consistency level for the target connection.
  # Options are: LOCAL_ONE, ONE, LOCAL_QUORUM, QUORUM.
  # The connector driver default is LOCAL_QUORUM.
  consistencyLevel: LOCAL_QUORUM
  # Number of connections to open to Scylla when copying
  connections: 16
  # Spark pads decimals with zeros appropriate to their scale. This causes values
  # like '3.5' to be copied as '3.5000000000...' to the target. There's currently
  # no good way to preserve the original value, so this flag can strip trailing
  # zeros on decimal values before they are written.
  stripTrailingZerosForDecimals: false
  # If we cannot preserve timestamps (i.e. preserveTimestamps == false), the writer
  # can enforce a single TTL and/or WRITETIME for ALL written records. Such a
  # WRITETIME can, for example, be set to a time BEFORE starting dual writes,
  # which makes the migration safe from overwriting dual writes - even for
  # collections. ALL rows written will get the same TTL, WRITETIME, or both
  # (you can uncomment just one of them, both, or neither).
  # TTL in seconds (the sample 7776000 is 90 days)
  # writeTTLInS: 7776000
  # Write time in microseconds (the sample 1640998861000 is Saturday, January 1, 2022 2:01:01 AM GMT+01:00)
  # writeWritetimestampInuS: 1640998861000
# Example for loading into a DynamoDB target (for example, Scylla's Alternator):
# target:
# type: dynamodb
# table: <table name>
# # Optional - write to a custom endpoint:
# endpoint:
# # If writing to Scylla Alternator, prefix the hostname with 'http://'.
# host: <host>
# port: <port>
#
# # Optional - specify the region:
# # region: <region>
#
# # Optional - static credentials:
# credentials:
# accessKey: <user>
# secretKey: <pass>
#
# # Split factor for reading/writing. This is required for Scylla targets.
# scanSegments: 1
#
# # throttling settings, set based on your capacity (or wanted capacity)
# readThroughput: 1
#
# # The value of dynamodb.throughput.read.percent can be between 0.1 and 1.5, inclusively.
# # 0.5 represents the default read rate, meaning that the job will attempt to consume half of the read capacity of the table.
# # If you increase the value above 0.5, spark will increase the request rate; decreasing the value below 0.5 decreases the read request rate.
# # (The actual read rate will vary, depending on factors such as whether there is a uniform key distribution in the DynamoDB table.)
# throughputReadPercent: 1.0
#
# # how many tasks per executor?
# maxMapTasks: 1
# Savepoints are configuration files (like this one), saved by the migrator as it
# runs. Their purpose is to skip token ranges that have already been copied. This
# configuration only applies when copying from Cassandra/Scylla.
savepoints:
  # Where should savepoint configurations be stored? This is a path on the host running
  # the Spark driver - usually the Spark master.
  path: /app/savepoints
  # Interval at which savepoints will be created
  intervalSeconds: 300
# Column renaming configuration. If you'd like to rename any columns, specify them like so:
# - from: source_column_name
#   to: dest_column_name
renames: []
# Which token ranges to skip. You shouldn't need to fill this in normally; the migrator
# will create a savepoint file with this filled in.
skipTokenRanges: []
# Configuration section for running the validator. The validator is run manually (see the README)
# and currently only supports comparing a Cassandra source to a Scylla target.
validation:
  # Should WRITETIMEs and TTLs be compared?
  compareTimestamps: true
  # What difference should we allow between TTLs?
  ttlToleranceMillis: 60000
  # What difference should we allow between WRITETIMEs?
  writetimeToleranceMillis: 1000
  # How many differences to fetch and print
  failuresToFetch: 100
  # What difference should we allow between floating-point numbers?
  floatingPointTolerance: 0.001
  # What difference (in ms) should we allow between timestamps?
  timestampMsTolerance: 0
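As a side note on the writeWritetimestampInuS comment in the target section: the epoch value for a given wall-clock time can be derived with GNU date. A minimal sketch, assuming GNU coreutils and using the timestamp from the sample comment:

```shell
# Epoch seconds for Saturday, January 1, 2022 02:01:01 GMT+01:00
# (i.e. 2022-01-01 01:01:01 UTC)
secs=$(date -u -d '2022-01-01T01:01:01Z' +%s)
echo "$secs"                 # 1640998861

# Since the key is expressed in microseconds, scale accordingly:
echo $((secs * 1000000))     # 1640998861000000
```

Whatever value you settle on, double-check the unit against the sample in the comment before running a real migration.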
This bug was reported by mistake. There is no problem with the library; it works as expected. The problem was that the Java version was not compatible with the trust store. After upgrading to a compatible Java version, the problem no longer occurred. Apologies for creating this bug; if there is a way to delete it, it can be deleted.
Just a note: the Java exception thrown in this case says the keystore is not in a valid format, even though the failure actually involves the truststore. That misleading message comes from the Java code.
Hi, we are planning to use the migrator to migrate from Cassandra to ScyllaDB. Cassandra is deployed in our dev environment as a 3-pod cluster, and Scylla is likewise a 3-pod cluster deployed using the Scylla Operator. The Cassandra setup uses SSL and expects truststore details in client connection requests. The truststore is stored in a Kubernetes secret and mounted on the pods as a volume; it is used successfully by all other clients connecting to Cassandra from the same environment.

The Spark setup is on the same Kubernetes cluster, deployed using a Helm chart, and worked perfectly fine for another of our use cases. The scylla-migrator code was built as per the documentation and the jar was copied onto the Spark master pod. The truststore secret is also mounted on the Spark master and is available at a local path on the pod.
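Since the root cause turned out to be a Java/truststore incompatibility, one quick sanity check is to inspect the store's on-disk format before involving Spark at all. A sketch, assuming the truststore path and password from the config above:

```shell
# A JKS truststore starts with the magic bytes fe ed fe ed; a PKCS12
# store instead starts with an ASN.1 SEQUENCE (0x30). A mismatch
# between the file format and what the JRE expects can surface as an
# "Invalid keystore format" error at connection time.
STORE=/etc/config/tls/cassandra/client/truststore  # path from config.yaml
head -c 4 "$STORE" | od -An -tx1

# If keytool from the same JRE that runs Spark can list the store,
# that Java version can read the format and password:
# keytool -list -keystore "$STORE" -storepass test123 -storetype JKS
```

Running the keytool check with the exact JRE used by the Spark workers is what exposes version incompatibilities like the one described here.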
Below is the config.yaml that we use:
After copying the jar and the config.yaml file, we submit the Spark job as per the documentation, and then we get the exception below.
./spark-submit --class com.scylladb.migrator.Migrator --master spark://myspark-master-svc:7077 --conf spark.scylla.config=/tmp/config.yaml /tmp/scylla-migrator-assembly-0.0.1.jar
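For sizing, the splitCount: 256 in the config follows the "8 * (Spark cores)" guidance from the comments; the arithmetic below assumes a hypothetical 32-core Spark cluster for illustration:

```shell
SPARK_CORES=32                    # total cores across the Spark workers (assumed)
SPLIT_COUNT=$((8 * SPARK_CORES))
echo "$SPLIT_COUNT"               # 256 - matches splitCount in config.yaml
```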
Exception