Closed anna-geller closed 1 year ago
You can use this docker-compose file to start a local Spark cluster and reproduce the issue (run docker compose up):
version: "3"
services:
  spark-iceberg:
    image: tabulario/spark-iceberg
    container_name: spark-iceberg
    ports:
      - 8888:8888
      - 8082:8080 # to avoid port conflict with Kestra
      - 10000:10000
      - 10001:10001
The Spark master listens on port 7077, so you should expose that port in the docker-compose file and use it in the task. Even with that, I didn't succeed in connecting to the master.
Thanks for reproducing. Let's focus on Athena then; we can keep this issue open for now 👍
You need to both expose port 7077 and, if your task uses the Docker runner, set networkMode to host. Doing both will make it work.
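For reference, a minimal sketch of the reproducer compose file with the master port exposed. The only change from the file posted earlier is the added 7077:7077 mapping; it assumes the Spark master in the tabulario/spark-iceberg image listens on 7077 inside the container, as described in this thread.

```yaml
version: "3"
services:
  spark-iceberg:
    image: tabulario/spark-iceberg
    container_name: spark-iceberg
    ports:
      - 8888:8888
      - 8082:8080 # to avoid port conflict with Kestra
      - 7077:7077 # added: Spark master port (assumed to be 7077 in this image)
      - 10000:10000
      - 10001:10001
```

With this mapping, a task whose container uses networkMode: host can reach the master at spark://localhost:7077.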
The following flow works once the docker-compose file exposes the master port 7077.
id: spark-submit
namespace: dev
tasks:
  - id: spark
    type: io.kestra.plugin.spark.PythonSubmit
    runner: DOCKER
    warningOnStdErr: false
    dockerOptions:
      image: tabulario/spark-iceberg
      networkMode: host
      user: root
      entryPoint:
        - /bin/sh
        - -c
    master: spark://localhost:7077
    mainScript: |
      from pyspark.sql import SparkSession
      spark = SparkSession.builder.appName("HelloWorldApp").getOrCreate()
      df = spark.createDataFrame([('Hello World',)], ['greeting'])
      df.show()
      spark.stop()
works indeed! <3 thx so much
Issue description
Simple reproducers
DOCKER runner
Error: