scylladb / scylla-migrator

Migrate data to ScyllaDB using Spark, typically from Cassandra or Parquet files, or alternatively from DynamoDB to Scylla Alternator.
https://migrator.docs.scylladb.com/stable/
Apache License 2.0

How to pass arguments that are not mapped in the yaml conf file? #1

Open gnumoreno opened 5 years ago

gnumoreno commented 5 years ago

@iravid

For arguments to the Spark connector, like username and password (authentication):

Can we still do this?

spark-submit --class com.scylladb.migrator.Migrator \
  --master spark://<spark-master-hostname>:7077 \
  --conf spark.cassandra.auth.username="cassandra" \
  --conf spark.cassandra.auth.password="cassandra" \
  --conf spark.scylla.config=<path to config.yaml> \
  <path to scylla-migrator-assembly-0.0.1.jar>

Or does the scala migrator code have to be changed so these parameters can be passed via the yaml conf file?

iravid commented 5 years ago

Ah sorry, the Migrator would need to support this explicitly. I’ll add support for that in a day or two.

tarzanek commented 4 years ago

Same issue, hit again by Moreno:

We are using the Spark migrator, but I would like to pass the keyspace and table as parameters instead of setting them inside config.yaml, like this:

spark-submit --class com.scylladb.migrator.Migrator \
  --master spark://sparkmaster:7077 \
  --conf spark.scylla.config=./config.yaml \
  --conf spark.scylla.source.keyspace="src" \
  --conf spark.scylla.source.table="scylladatatable" \
  --conf spark.scylla.dest.keyspace="dst" \
  --conf spark.scylla.dest.table="scylladatatable" \
  ./target/scala-2.11/scylla-migrator-assembly-0.0.1.jar

We get this error:

2020-02-03 20:49:53 INFO  StandaloneSchedulerBackend:54 - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
Exception in thread "main" DecodingFailure(Attempt to decode value on failed cursor, List(DownField(keyspace), DownField(source)))

It works fine if we set these values inside config.yaml. Maybe it is not spark.scylla.source.keyspace="src" anymore?
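
For reference, a rough sketch of the working form inside config.yaml (field names inferred from the error message and the --conf keys above, not from the actual schema):

# Hypothetical excerpt of config.yaml
source:
  keyspace: src
  table: scylladatatable
dest:
  keyspace: dst
  table: scylladatatable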

julienrf commented 3 months ago

To support this feature, one approach could be to transform the YAML blob just before we “parse” it into the data type MigratorConfig. The transformation would replace keys in the YAML document based on keys supplied in the Spark configuration. For instance, --conf spark.scylla.config.source.table=MyCustomTableName would replace the key source.table in the provided YAML document. That is, all the keys prefixed with spark.scylla.config. would override the content of the YAML document.
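
A minimal sketch of that transformation, assuming circe-yaml is used to parse the document (the DecodingFailure above comes from circe); names such as ConfigOverrides and applyOverrides are illustrative, not the actual migrator API:

import io.circe.{Json, JsonObject}
import io.circe.yaml.parser
import org.apache.spark.SparkConf

object ConfigOverrides {
  // Illustrative sketch: every Spark property starting with this prefix
  // overrides the corresponding dotted key in the YAML document,
  // e.g. spark.scylla.config.source.table=MyCustomTableName.
  val Prefix = "spark.scylla.config."

  // Set a dotted path such as List("source", "table") to `value`,
  // creating intermediate objects as needed.
  private def setPath(json: Json, path: List[String], value: Json): Json =
    path match {
      case Nil => value
      case key :: rest =>
        val obj = json.asObject.getOrElse(JsonObject.empty)
        val child = obj(key).getOrElse(Json.obj())
        Json.fromJsonObject(obj.add(key, setPath(child, rest, value)))
    }

  // Parse the YAML text, then fold every matching Spark property into it.
  def applyOverrides(yamlText: String, conf: SparkConf): Either[Throwable, Json] =
    parser.parse(yamlText).map { base =>
      conf.getAll
        .collect { case (k, v) if k.startsWith(Prefix) =>
          k.stripPrefix(Prefix).split('.').toList -> Json.fromString(v)
        }
        .foldLeft(base) { case (acc, (path, value)) => setPath(acc, path, value) }
    }
}

The resulting Json would then be decoded into MigratorConfig as before. Note that this sketch turns every override into a YAML string, so numeric or boolean settings would need additional handling.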