As an SDP demonstrator I would like to be able to show Spark jobs interacting with external data and other operators such as Kafka, HBase, etc. This could cover the following (by way of example):
- Construct public URLs from job arguments.
- Write a Spark job (e.g. in Python) that downloads this data, performs some basic parsing, and writes the results to a sink such as a Kafka topic.
- Verify the data in Kafka (Druid ingestion job? kcat? Provectus Kafka UI?)
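The steps above can be sketched as a small job. Everything here is illustrative: the URL template, dataset arguments, topic name, and broker address are placeholder assumptions, and the download/parse stages are shown as plain Python helpers (a PySpark variant would do the final step with `df.write.format("kafka")` instead of a producer client).

```python
# Hypothetical demo job: build a public URL from job arguments, download a
# small CSV, parse it, and publish the rows to a Kafka topic as JSON.
import csv
import io
import json
import sys
import urllib.request


def build_url(dataset: str, year: int) -> str:
    """Construct a public URL from job arguments (illustrative template)."""
    return f"https://example.org/data/{dataset}/{year}.csv"


def parse_csv(text: str) -> list:
    """Basic parsing: turn CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def run(dataset: str, year: int, topic: str = "demo-topic",
        brokers: str = "kafka:9092") -> None:
    """Download, parse, and write each record to Kafka."""
    with urllib.request.urlopen(build_url(dataset, year)) as resp:
        rows = parse_csv(resp.read().decode("utf-8"))
    # kafka-python is one client option for the write step; the broker
    # address and topic are placeholders for whatever the demo exposes.
    from kafka import KafkaProducer  # pip install kafka-python
    producer = KafkaProducer(
        bootstrap_servers=brokers,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    for row in rows:
        producer.send(topic, row)
    producer.flush()


if __name__ == "__main__":
    run(sys.argv[1], int(sys.argv[2]))
```

For the verification step, the topic contents could then be spot-checked from the command line with something like `kcat -b kafka:9092 -t demo-topic -C -e`.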