@jingyanjingyan, you can't access ADLS from a WASB HDInsight cluster.
Switched to an ADLS cluster and got the same error; sent a mail to rui.
Synced offline with @konjac; we found that sstream is not supported by HDInsight out of the box, which means the sstream-related jars have to be specified in the submission argument spark.jars.
@jingyanjingyan Can you retry submitting this job to the HDInsight cluster with the sstream lib specified in spark.jars?
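As an aside (not from this thread): a minimal Scala sketch to double-check on the cluster side that the structurestreamforspark jar actually arrived with the submission before calling spark.read.format("sstream"). The object and app names here are made up for illustration; listJars() is the standard SparkContext API that reports every jar registered with the context (spark.jars, --jars and the tool's "Referenced Jars" field all end up there).

import org.apache.spark.sql.SparkSession

object CheckSStreamJar {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("CheckSStreamJar").getOrCreate()
    // Print the jars the SparkContext received from the submission;
    // the sstream jar should appear in this list if spark.jars was set correctly.
    spark.sparkContext.listJars().foreach(println)
    spark.stop()
  }
}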
Submitted to the spark2-1adls cluster with parameter "Referenced Jars = adl://devtooltelemetryadls.azuredatalakestore.net/tmp/structurestreamforspark_2.11-1.0.10.jar", but the job still failed.
Retried with the config below and it finally succeeded.
Cluster: Spark23-hdi4-yan (Spark version >= 2.3.1)
Referenced Jars: wasbs://spark23-hdi4-yan-2018-12-25t02-43-25-251z@zhwespk21.blob.core.windows.net/sstream/lib/structurestreamforspark_2.11-1.1.3.jar
Class file:
package sample

import org.apache.spark.sql.SparkSession

object ReadSStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("ReadSStreamDemo").getOrCreate()
    // Read the structured stream (.ss) file with the sstream data source.
    val streamDf = spark.read.format("sstream").load("wasbs://spark23-hdi4-yan-2018-12-25t02-43-25-251z@zhwespk21.blob.core.windows.net/sstream/input/input.ss")
    streamDf.createOrReplaceTempView("streamView")
    // Write the row count to a timestamped output folder.
    val timestamp = System.currentTimeMillis.toString
    spark.sql("SELECT COUNT(*) FROM streamView").rdd.saveAsTextFile(s"wasbs://spark23-hdi4-yan-2018-12-25t02-43-25-251z@zhwespk21.blob.core.windows.net/sstream/output/$timestamp")
  }
}
Remember to use sstream rather than sstreaminterop2 as the read format, since the latter is designed for the Windows platform while our cluster runs on Linux.
Verified as fixed with the market build on 1/11/2019.
Build: dev 860
Repro Steps: