As an SDP demonstrator I would like to be able to show Spark jobs interacting with external data and other operators such as Kafka, HBase, etc. This could cover the following (by way of example):
- Construct public URLs from job arguments.
- Write a Spark job (e.g. in Python) that downloads this data, performs some basic parsing, and writes the results to a sink such as a Kafka topic.
- Verify the data in Kafka (Druid ingestion job? kcat? Provectus Kafka UI?)
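The steps above can be sketched as a small job. Everything here is illustrative: the URL template, dataset arguments, topic name, and broker address are placeholder assumptions, and the download/parse stages are shown as plain Python helpers (a PySpark variant would do the final step with `df.write.format("kafka")` instead of a producer client).

```python
# Hypothetical demo job: build a public URL from job arguments, download a
# small CSV, parse it, and publish the rows to a Kafka topic as JSON.
import csv
import io
import json
import sys
import urllib.request


def build_url(dataset: str, year: int) -> str:
    """Construct a public URL from job arguments (illustrative template)."""
    return f"https://example.org/data/{dataset}/{year}.csv"


def parse_csv(text: str) -> list:
    """Basic parsing: turn CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def run(dataset: str, year: int, topic: str = "demo-topic",
        brokers: str = "kafka:9092") -> None:
    """Download, parse, and write each record to Kafka."""
    with urllib.request.urlopen(build_url(dataset, year)) as resp:
        rows = parse_csv(resp.read().decode("utf-8"))
    # kafka-python is one client option for the write step; the broker
    # address and topic are placeholders for whatever the demo exposes.
    from kafka import KafkaProducer  # pip install kafka-python
    producer = KafkaProducer(
        bootstrap_servers=brokers,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    for row in rows:
        producer.send(topic, row)
    producer.flush()


if __name__ == "__main__":
    run(sys.argv[1], int(sys.argv[2]))
```

For the verification step, the topic contents could then be spot-checked from the command line with something like `kcat -b kafka:9092 -t demo-topic -C -e`.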