scylladb / scylla-migrator

Migrate data extracted using Spark to Scylla, normally from Cassandra
Apache License 2.0
54 stars, 34 forks

Remove fork of spark-streaming-kinesis-asl #128

Closed: julienrf closed this 2 months ago

julienrf commented 2 months ago

We replace our fork with a smaller module that defines utility classes specific to our needs, based on the existing spark-streaming-kinesis-asl classes.

Contrary to what is described in this SO answer, it is not directly possible to use Spark Streaming with DynamoDB Streams. The existing KinesisReceiver implementation cannot work as-is with DynamoDB Streams. It needs the changes we previously applied in our fork, which are described here and here.

Ideally, we would upstream our KinesisDynamoDBReceiver class to the Spark project, but it would take time for it to be merged and released anyway.

The PR is split into three commits. The third one performs the actual changes, whereas the first two are essentially preparatory work. Notably, the second commit adds a verbatim copy of the classes from the original spark-streaming-kinesis-asl module, which can serve as a point of reference if we later want to compare our diff with the original implementation. To review this PR, I recommend looking at the first commit and then, separately, at the third commit.

Fixes #119.

tarzanek commented 2 months ago

@julienrf can you resolve the conflict please? I will merge after that; it looks good to me.

julienrf commented 2 months ago

I resolved the conflict.

tarzanek commented 2 months ago

thnx, merging