n0k0m3 / pyspark-notebook-deltalake-docker

Jupyter Notebook Docker with Spark and DeltaLake support

PySpark Notebook with DeltaLake for production

This repo tries to replicate the Databricks runtime and combine it with the feature-rich jupyter/docker-stacks images.

Base image: rapidsai/rapidsai:22.02-cuda11.5-runtime-ubuntu20.04-py3.8

Additional packages:

Planning:

Starting Docker

Generate environment variables

See .env.template for a template of the required environment variables, or adapt and run the following lines:

echo "JUPYTER_PATH=<path-to-notebook-directory>" > .env
echo "NB_UID=$(id -u)" >> .env
echo "NB_GID=$(id -g)" >> .env

To find <path-to-notebook-directory>, run pwd inside the notebook directory.
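The steps above can be combined into one small script, so the notebook path does not have to be pasted by hand. This is a minimal sketch, not part of the repo: NOTEBOOK_DIR is a hypothetical variable introduced here for illustration, and it falls back to the current directory when unset. It writes the same three variables as the echo lines above.

```shell
# Sketch: generate .env for docker-compose.
# NOTEBOOK_DIR is a hypothetical variable (not defined by the repo);
# when unset, the current directory is used as the notebook directory.
set -eu

NOTEBOOK_DIR="${NOTEBOOK_DIR:-$PWD}"

# Resolve to an absolute path, equivalent to running pwd inside that directory.
JUPYTER_PATH="$(cd "$NOTEBOOK_DIR" && pwd)"

# Write all three variables in one go, overwriting any existing .env.
{
  echo "JUPYTER_PATH=${JUPYTER_PATH}"
  echo "NB_UID=$(id -u)"
  echo "NB_GID=$(id -g)"
} > .env

# Show the result for a quick visual check.
cat .env
```

Run it from the directory that holds docker-compose.yml, for example with NOTEBOOK_DIR=~/notebooks set beforehand (an example path), since docker-compose reads .env from the directory it is started in.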

Docker Compose

docker-compose up -d
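Because -d starts the containers in the background, nothing is printed about whether Jupyter actually came up. The following sketch shows one way to check, assuming it is run from the directory containing docker-compose.yml and .env. COMPOSE_STATUS is a hypothetical helper variable used only in this example.

```shell
# Sketch: inspect the stack after `docker-compose up -d`.
# COMPOSE_STATUS is a hypothetical variable used only for this example.
if command -v docker-compose >/dev/null 2>&1; then
  # List the services defined in docker-compose.yml and their current state.
  docker-compose ps || true
  # Print the most recent log lines; Jupyter normally logs its access URL at startup.
  docker-compose logs --tail=50 || true
  COMPOSE_STATUS="checked"
else
  echo "docker-compose not found on PATH" >&2
  COMPOSE_STATUS="missing"
fi
echo "status: ${COMPOSE_STATUS}"
```

To stop and remove the containers again, docker-compose down can be run from the same directory.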