Kinesis Application State in Dynamo DB

The kinesis-sql library does not use KCL for most of its work. KCL and the Structured Streaming APIs do not fit together well. KCL also carries the extra cost of DynamoDB, which can be avoided for Structured Streaming.

At the same time, while designing kinesis-sql, we knew that users would want a more reliable shard progress committer. Hence the module provides the option of a pluggable committer. To use another committer, we can do the following:

Implement the MetadataCommitter trait: https://github.com/qubole/kinesis-sql/blob/master/src/main/scala/org/apache/spark/sql/kinesis/MetadataCommitter.scala
The following code only considers HDFS a valid committer type. We can change it to also accept a new DynamoDB-based committer.

private def metadataCommitter: MetadataCommitter[ShardInfo] = {
  metaDataCommitterType.toLowerCase(Locale.ROOT) match {
    case "hdfs" =>
      new HDFSMetadataCommitter[ShardInfo](metaDataCommitterPath, hadoopConf(sqlContext))
    case _ => throw new IllegalArgumentException("only HDFS is supported")
  }
}

private def metaDataCommitterType: String = {
  sourceOptions.getOrElse("executor.metadata.committer", "hdfs").toString
}
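
To make the two steps above concrete, here is a rough, self-contained sketch of what a pluggable committer and the extended type dispatch could look like. Everything in it is an assumption for illustration: the trait is a simplified local stand-in for the library's MetadataCommitter (check the linked file for the real signatures), ShardInfo is a reduced stand-in for the library's class, InMemoryStandInCommitter and committerFor are hypothetical names, and a TrieMap takes the place of a DynamoDB table so the snippet runs without AWS. A real implementation would replace the map operations with DynamoDB calls.

```scala
import java.util.Locale
import scala.collection.concurrent.TrieMap

// Hypothetical, simplified stand-in for the library's MetadataCommitter trait.
// The real signatures may differ; see the linked MetadataCommitter.scala.
trait MetadataCommitter[T <: AnyRef] {
  def add(batchId: Long, shardId: String, metadata: T): Boolean
  def get(batchId: Long): Seq[T]
  def purge(numVersionsToRetain: Int): Unit
  def delete(batchId: Long): Boolean
}

// Hypothetical reduced stand-in for the library's ShardInfo.
final case class ShardInfo(shardId: String, sequenceNumber: String)

// Hypothetical committer keyed by (batchId, shardId). The TrieMap stands in
// for a DynamoDB table, so no AWS calls are made here.
class InMemoryStandInCommitter[T <: AnyRef] extends MetadataCommitter[T] {
  private val table = TrieMap.empty[(Long, String), T]

  // Behaves like a conditional put: succeeds only if this (batch, shard)
  // pair has not been committed before.
  override def add(batchId: Long, shardId: String, metadata: T): Boolean =
    table.putIfAbsent((batchId, shardId), metadata).isEmpty

  // Returns all metadata committed for a batch, ordered by shard id.
  override def get(batchId: Long): Seq[T] =
    table.toSeq.filter(_._1._1 == batchId).sortBy(_._1._2).map(_._2)

  // Keeps only the newest numVersionsToRetain batches.
  override def purge(numVersionsToRetain: Int): Unit = {
    val batchIds = table.keys.map(_._1).toSeq.distinct.sorted
    batchIds.dropRight(numVersionsToRetain).foreach(id => delete(id))
  }

  // Removes every shard entry of a batch; true if anything was removed.
  override def delete(batchId: Long): Boolean = {
    val keys = table.keys.filter(_._1 == batchId).toList
    keys.foreach(k => table.remove(k))
    keys.nonEmpty
  }
}

// Sketch of the dispatch with an extra "dynamodb" branch. The existing "hdfs"
// branch is omitted here only because HDFSMetadataCommitter is not available
// in this standalone snippet; in the library it would stay as it is.
def committerFor(sourceOptions: Map[String, String]): MetadataCommitter[ShardInfo] = {
  val committerType = sourceOptions.getOrElse("executor.metadata.committer", "hdfs")
  committerType.toLowerCase(Locale.ROOT) match {
    case "dynamodb" => new InMemoryStandInCommitter[ShardInfo]
    case other =>
      throw new IllegalArgumentException(s"unsupported committer type: $other")
  }
}
```

The conditional add is the part that matters most for reliability: with DynamoDB this would presumably map to a put guarded by a condition that the item does not exist yet, so that two attempts of the same batch cannot both commit progress for the same shard.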

I am relying on the community to implement the DynamoDB committer. cc @VikramBPurohit

Kinesis Application State in Dynamo DB #22