hortonworks-spark / shc

The Apache Spark - Apache HBase Connector is a library that lets Spark access HBase tables as an external data source or sink.
Apache License 2.0

unable to pull the data from kafka topic using spark structured streaming #307

Closed. harshini98 closed this issue 5 years ago.

harshini98 commented 5 years ago

from pyspark.sql import SparkSession
from pyspark.conf import SparkConf
from pyspark.sql.functions import *
from pyspark.sql.streaming import *

conf = SparkConf()

# Create the Spark session on YARN
spark = SparkSession \
    .builder \
    .appName("Get Country Traffic") \
    .master("yarn") \
    .getOrCreate()

spark.sparkContext.setLogLevel("DEBUG")

spark.conf.set("spark.sql.shuffle.partitions", "2")

# Read the Kafka topic as a streaming DataFrame, casting key and value to strings
lines = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "wn01.itversity.com:6667, wn02.itversity.com:6667, wn03.itversity.com:6667") \
    .option("subscribe", "retail-multis") \
    .load() \
    .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

# Write each micro-batch to the console once every 60 seconds
lines \
    .writeStream \
    .queryName("country_traffic") \
    .format("console") \
    .trigger(processingTime='60 seconds') \
    .start()