Open · konobey opened this issue 4 years ago
@konobey Did you figure it out?
@Kiollpt I'm sorry about this error in the code. Kafka only accepts key-value records, so the data must be converted into the right shape before writing to Kafka. This can be done, for example, by converting the records to JSON format.
This should do the trick:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
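// Re-shape each record into a (key, value) pair: the sensor id becomes the key,
// and the full record serialized as JSON becomes the value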
val kvStream = sensorValues.select(
  $"sensorId".cast(StringType) as "key",
  to_json(struct("*")) as "value"
)
Then, the kvStream can be written to Kafka:
val query = kvStream.writeStream.format("kafka")
.queryName("kafkaWriter")
.outputMode("append")
.option("kafka.bootstrap.servers", kafkaBootstrapServer) // comma-separated list of host:port
.option("topic", targetTopic)
.option("checkpointLocation", workDir+"/generator-checkpoint")
.option("failOnDataLoss", "false") // use this option when testing
.start()
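Note that start() returns a StreamingQuery handle; in a standalone application you would typically call query.awaitTermination() to keep it alive, while in a notebook the query simply keeps running in the background.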
@maasg Thank you for the method.
Regarding the corresponding code in Ch9, it would be like this:
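// Parse the Kafka value column back into SensorData, deriving the schema from the case class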
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.functions._

val kafkaSchema = Encoders.product[SensorData].schema
val iotData = rawData
  .select(from_json($"value".cast("string"), kafkaSchema) as "record")
  .select("record.*").as[SensorData]
@maasg Thanks! Could you fix the code in the notebook, please?
Hello!
The code in the notebook kafka-sensor-data-generator.snb.ipynb doesn't work because sensorValues is of type Dataset[SensorData], but there should be a value attribute of type String combining all the attributes of a Dataset[SensorData] row.
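For context, the Kafka sink requires the output rows to carry a value column of string or binary type (plus an optional key), whereas a Dataset[SensorData] exposes the case class fields as-is. A minimal sketch of the mismatch, assuming SensorData is shaped roughly like the book's example (the actual definition lives in the notebook):

// Assumed shape, for illustration only; check the notebook for the real definition
case class SensorData(sensorId: Int, timestamp: Long, value: Double)

// A Dataset[SensorData] has columns (sensorId, timestamp, value: Double).
// The Kafka sink rejects it because value is a Double rather than a String/Binary;
// the select(... to_json ...) projection above produces the required shape.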