smohler / lernspark

Have you ever wondered what the f#*k Apache Spark, Docker, CI/CD, and modern big data architectures are? Me too!
MIT License

Sandbox Notebook Pipeline Designer #12

Closed smohler closed 5 months ago

smohler commented 6 months ago

Notebook Pipeline Sandbox

Data analysis starts with playing around with some data. To emulate this, we want to provide the user with a notebook and some generated sample data.

Usage

Ideally the user should be able to run something like this at any time:

$ lernspark-play

This should open up a Jupyter notebook that is pre-populated with some cells. lernspark-play should also generate a small sample dataset. This data will be stored locally on the machine, not pulled from the cloud.
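As a rough illustration of the "generate a small local dataset" step, here is a minimal Python sketch. Nothing in it comes from the repo: the function name `generate_sample_data`, the file name `sample.csv`, and the columns `id`, `category`, `value` are all assumptions made up for the example, and the real generator may look quite different.

```python
import csv
import random
from pathlib import Path


def generate_sample_data(out_dir="data", rows=100, seed=0):
    """Write a small synthetic CSV to a local directory and return its path.

    The file name and columns are hypothetical placeholders.
    """
    rng = random.Random(seed)  # seeded so repeated runs produce the same sample
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)  # local only, no cloud access
    csv_file = out_path / "sample.csv"  # hypothetical file name
    with csv_file.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["id", "category", "value"])  # hypothetical columns
        for i in range(rows):
            category = rng.choice(["a", "b", "c"])
            value = round(rng.uniform(0, 100), 2)
            writer.writerow([i, category, value])
    return csv_file


if __name__ == "__main__":
    print(generate_sample_data())
```

Seeding the generator keeps the sandbox reproducible, so two users running the command would see the same rows in their notebooks.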

After the user has played around for a while, the idea is that they save the pipeline.sql file from their session. The entire Apache Spark container application will be based on this pipeline.sql file.
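One possible way to get a pipeline.sql out of a notebook session is sketched below in Python. It assumes, purely for illustration, that SQL cells in sandbox.ipynb start with a `%%sql` cell magic; the issue does not say how SQL is written in the notebook, and the function name `export_pipeline` is hypothetical. The sketch relies only on the fact that an .ipynb file is JSON with a top-level `cells` list.

```python
import json
from pathlib import Path


def export_pipeline(notebook_path="sandbox.ipynb", out_path="pipeline.sql", magic="%%sql"):
    """Collect the bodies of SQL cells from a notebook into one .sql file.

    Assumption: SQL cells are code cells whose first line starts with `magic`.
    Returns the number of SQL cells written.
    """
    nb = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    statements = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be a string or a list of lines
            source = "".join(source)
        lines = source.splitlines()
        if lines and lines[0].strip().startswith(magic):
            body = "\n".join(lines[1:]).strip()  # drop the magic line itself
            if body:
                statements.append(body)
    Path(out_path).write_text("\n\n".join(statements) + "\n", encoding="utf-8")
    return len(statements)
```

Cells are written in notebook order, so the resulting file reflects the order in which the user laid out the pipeline rather than the order in which cells were executed.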

Requirements

  1. lernspark-play.sh
  2. sandbox.ipynb
  3. data/[main|sql|model].rs
smohler commented 5 months ago

Closed on #14