scylladb / scylla-migrator

Migrate data to Scylla using Spark, normally from Cassandra or Parquet files; alternatively, from DynamoDB to Scylla Alternator.
https://migrator.docs.scylladb.com/stable/
Apache License 2.0

Migrate data from another, different table #52

Open phenriqueabr opened 3 years ago

phenriqueabr commented 3 years ago

Guys,

I need to migrate all data from one table to another, but there are two differences, plus one requirement:

1- There's one more column on the destination table (compensation_group text).
2- This new column (compensation_group) is a clustering key on the destination table.
3- I need the value "default" in this new column on the destination table.
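In other words, each origin row should be copied as-is, with the new clustering column set to a constant. A minimal Python sketch of that row-level transformation, assuming rows are modeled as plain dicts; the helper `with_default_column` and the sample columns `employee_id` and `amount` are hypothetical (the real schemas are only in the attached files), and this is not how scylla-migrator implements it:

```python
from typing import Any, Dict, Iterable, Iterator

Row = Dict[str, Any]


def with_default_column(rows: Iterable[Row], column: str, value: Any) -> Iterator[Row]:
    """Yield a copy of each source row with `column` set to `value`.

    Hypothetical helper: it models the origin -> destination mapping only,
    not scylla-migrator's actual implementation.
    """
    for row in rows:
        out = dict(row)      # copy so the source row is left untouched
        out[column] = value  # the new clustering-key column gets the constant
        yield out


# Hypothetical origin rows; the real columns are in origin_table.txt.
source = [
    {"employee_id": 1, "amount": 100},
    {"employee_id": 2, "amount": 250},
]
migrated = list(with_default_column(source, "compensation_group", "default"))
print(migrated)
```

In a Spark job the same idea would typically be expressed by adding a literal-valued column to the DataFrame (for example `withColumn` with `lit("default")`) before writing it.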

Attachments: origin_table.txt, destine_table.txt

phenriqueabr commented 3 years ago

Error:

21/05/19 16:39:59 INFO migrator: Created a savepoint config at /data/migrator/savepoints/savepoint_1621453199.yaml due to schedule. Ranges added: Set()
21/05/19 16:40:01 ERROR migrator: Caught error while writing the DataFrame. Will create a savepoint before exiting
java.lang.IllegalArgumentException: Some primary key columns are missing in RDD or have not been selected: compensation_group

tarzanek commented 3 years ago

Hello @phenriqueabr, there is a column mapping that can be specified, but let's see whether it can be used here (have you tried it?). I will simulate your use case and let you know; if the mapping doesn't work for you, we will most likely need to add support for this use case to the codebase.

Check https://github.com/scylladb/scylla-migrator/blob/master/config.yaml.example#L148
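Assuming the linked section is the column-rename list (pairs of a source name and a destination name), a rename only changes what an existing column is called. A minimal Python model of that idea (the `apply_renames` helper and the sample columns are hypothetical, not the migrator's code) shows that renaming can redirect an existing column but can never produce a `compensation_group` value that the source row does not contain:

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def apply_renames(row: Row, renames: List[Tuple[str, str]]) -> Row:
    """Return a copy of `row` with columns renamed per (from, to) pairs.

    Hypothetical model of a column-rename mapping: each pair only changes
    a column's name; no new column is ever created.
    """
    mapping = dict(renames)
    return {mapping.get(name, name): value for name, value in row.items()}


row = {"employee_id": 1, "amount": 100}
renamed = apply_renames(row, [("amount", "total_amount")])
print(renamed)                          # {'employee_id': 1, 'total_amount': 100}
print("compensation_group" in renamed)  # False: a rename cannot add a column
```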

phenriqueabr commented 3 years ago

This rename function is for when you have the same number of columns and just want to redirect one column in the import, right?

In my case, the destination table has the same columns as the origin, but with one more column! And that new column was added to the clustering key. And, if possible, this new column needs "default" as its value on all rows inserted into the destination table (if that's not possible, it could be null or empty).

tarzanek commented 3 years ago

If the number of columns doesn't match, that's OK: the extra column would just be empty (after all, this is NoSQL with thin rows, so if you don't specify a value, nothing is stored for it). BUT the problem here is that the column is part of the key, so it has to be initialized.
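The distinction between regular and key columns can be modeled as a pre-write check, done per row here: a regular column may simply be left out, but every primary-key column (partition and clustering) needs a value. A minimal Python sketch; the helpers and the key layout (`employee_id` as partition key) are hypothetical, and the message only imitates the `IllegalArgumentException` quoted earlier in the thread rather than reproducing the Spark Cassandra Connector's logic:

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def missing_key_columns(row: Row, key_columns: Sequence[str]) -> List[str]:
    """Return the primary-key columns that are absent or None in `row`."""
    return [c for c in key_columns if row.get(c) is None]


def check_row(row: Row, key_columns: Sequence[str]) -> None:
    """Raise if any key column lacks a value; regular columns are not checked."""
    missing = missing_key_columns(row, key_columns)
    if missing:
        raise ValueError("Some primary key columns are missing: " + ", ".join(missing))


# Hypothetical destination key: partition key employee_id,
# clustering key compensation_group.
KEY = ["employee_id", "compensation_group"]

source_row = {"employee_id": 1}  # 'amount' omitted: fine, it is a regular column
try:
    check_row(source_row, KEY)
except ValueError as err:
    print(err)  # Some primary key columns are missing: compensation_group

# Initializing the key column with a constant makes the row writable.
check_row({**source_row, "compensation_group": "default"}, KEY)
```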

A materialized view (MV) could be a workaround, but I assume you will later need values other than "default" there too.

phenriqueabr commented 3 years ago

I understood what you said.

I know it's strange for this new column to have "default" as its value, but we need to start with some value; new rows will have different values, and there will be updates to fix all these "default" values.

An MV could be an alternative, but it won't solve a future problem: they need to use this new column in their queries.

phenriqueabr commented 3 years ago

Any news?

tarzanek commented 3 years ago

So my idea is this: https://github.com/tarzanek/scylla-migrator/commit/49c6cbdde2868fc1921f192ec4dd21e2257d6f8b

(or as a branch: https://github.com/tarzanek/scylla-migrator/tree/defaults )

@phenriqueabr will you be able to test above?

tarzanek commented 2 years ago

I force-pushed https://github.com/tarzanek/scylla-migrator/commits/defaults, so the patch is now https://github.com/tarzanek/scylla-migrator/commit/fea8d99c3dd58c462ba6695a52ebc96bef739724, with proper schema adjustments, and it has been tested.

So it should cover this use case.