Closed by MrPowers 4 months ago
@MrPowers I'm kinda surprised we both didn't realize this.
What we needed to do when running the examples against a local Spark Connect server was to run the sbin/start-connect-server.sh command from the repo directory, not from the $SPARK_HOME directory. So the full command should have been:
$ $SPARK_HOME/sbin/start-connect-server.sh --packages "org.apache.spark:spark-connect_2.12:3.5.1,io.delta:delta-spark_2.12:3.0.0" \
--conf "spark.driver.extraJavaOptions=-Divy.cache.dir=/tmp -Divy.home=/tmp" \
--conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
--conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
The $SPARK_HOME
environment variable does need to be set for that script to work.
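Once the server is up, the examples talk to it as Spark Connect clients. A minimal sketch of the client side (assuming the server is listening on 15002, the default Spark Connect port; the dataset path is a hypothetical placeholder, not one of the repo's actual files). Note that relative paths are resolved by the server process, which is exactly why the server has to be started from the repo directory:

```python
from pyspark.sql import SparkSession

# Connect to the local Spark Connect server (15002 is the default port).
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

# Relative paths are resolved in the server's working directory, so this
# only finds the data if the server was started from the repo root.
df = spark.read.format("delta").load("datasets/some_table")  # hypothetical path
df.show()
```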
In PR #44, I copied all the existing example datasets into a datasets/ folder and updated docker-compose.yml to mount that same directory as a volume. With those two changes, the examples should work whether you run them against a local Spark Connect server or connect to the Docker container.
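For reference, the volume mapping in docker-compose.yml looks roughly like this. This is only a sketch: the service name, image, and container-side path are assumptions, not copied from the PR:

```yaml
services:
  spark:                          # hypothetical service name
    image: apache/spark:3.5.1     # hypothetical image
    volumes:
      # Mount the repo's datasets/ folder into the container so the
      # examples can use the same relative paths in both workflows.
      - ./datasets:/opt/spark/work-dir/datasets
```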
The examples currently use paths that are for the Docker workflow.
It would be cool if the examples could also work with non-Docker setups (e.g. when I manually spin up Spark Connect on localhost).
Perhaps we could check all those data files into this repo, so these examples work out of the box with both Docker and a localhost Spark Connect server.