Notebook Pipeline Sandbox
Data analysis starts with playing around with some data.
To emulate this, we want to provide the user with a notebook
and some generated sample data.
Usage
Ideally, the user should be able to run the following at any time:
$ lernspark-play
This should open a Jupyter notebook pre-populated with some starter cells.
The lernspark-play command should also generate a small sample dataset.
This data is stored locally on the user's machine rather than fetched from the cloud.
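Below is a minimal sketch of what lernspark-play might do, assuming a Python implementation. The file names (sample.csv, sandbox.ipynb), the columns, the starter cells, and the play() function are all hypothetical. The notebook is written directly as nbformat 4 JSON so that only the standard library is needed.

```python
import csv
import json
import random
import subprocess
from pathlib import Path

# Hypothetical starter cells; the actual lernspark-play template is not defined yet.
STARTER_CELLS = [
    ("markdown", "# lernspark sandbox\nExplore sample.csv, then save your SQL as pipeline.sql."),
    ("code", "import csv\nwith open('sample.csv', newline='') as f:\n    rows = list(csv.DictReader(f))\nrows[:5]"),
]


def generate_sample_data(path: Path, n_rows: int = 100, seed: int = 0) -> None:
    """Write a small synthetic CSV to local disk; nothing is fetched from the cloud."""
    rng = random.Random(seed)  # fixed seed: every run produces the same sample
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["order_id", "customer", "amount"])
        for order_id in range(n_rows):
            customer = rng.choice(["alice", "bob", "carol"])
            amount = round(rng.uniform(1, 500), 2)
            writer.writerow([order_id, customer, amount])


def write_starter_notebook(path: Path) -> None:
    """Write a minimal nbformat 4 notebook pre-populated with STARTER_CELLS."""
    cells = []
    for cell_type, source in STARTER_CELLS:
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            # code cells must carry these two keys in nbformat 4
            cell["execution_count"] = None
            cell["outputs"] = []
        cells.append(cell)
    notebook = {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}
    path.write_text(json.dumps(notebook, indent=1))


def play(workdir: Path, launch: bool = True) -> Path:
    """Generate the sample data and notebook in workdir, then optionally open Jupyter."""
    workdir.mkdir(parents=True, exist_ok=True)
    generate_sample_data(workdir / "sample.csv")
    notebook_path = workdir / "sandbox.ipynb"
    write_starter_notebook(notebook_path)
    if launch:
        # requires the `jupyter` command to be installed and on PATH;
        # run from workdir so the starter cell finds sample.csv
        subprocess.run(["jupyter", "notebook", notebook_path.name], cwd=workdir, check=True)
    return notebook_path
```

A console-script entry point named lernspark-play could simply call play(Path.cwd()). Passing launch=False only writes the two files, which is handy for testing without starting Jupyter.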
After the user has experimented for a while, the idea is that they save the
pipeline.sql file from their session. This pipeline.sql file is the basis
for the entire Apache Spark container application.
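One way the save step could work is sketched below, assuming that SQL cells in the notebook begin with a %%sql cell magic. That marker is a convention chosen here for illustration, not something this design fixes. export_pipeline() is a hypothetical helper that copies those cells, in notebook order, into pipeline.sql.

```python
import json
from pathlib import Path

SQL_MAGIC = "%%sql"  # assumed marker for SQL cells; adjust to whatever the template uses


def _cell_source(cell: dict) -> str:
    # nbformat allows "source" to be either a string or a list of strings
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def export_pipeline(notebook_path: Path, output_path: Path) -> list[str]:
    """Collect %%sql code cells, in notebook order, into a single pipeline.sql file."""
    notebook = json.loads(notebook_path.read_text())
    statements = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        first_line, _, body = _cell_source(cell).strip().partition("\n")
        parts = first_line.split()
        if not parts or parts[0] != SQL_MAGIC:
            continue  # ordinary Python cell, not part of the pipeline
        body = body.strip()
        if body:
            # normalize so every exported block ends with exactly one semicolon
            statements.append(body.rstrip(";") + ";")
    output_path.write_text("\n\n".join(statements) + "\n")
    return statements
```

Exporting plain SQL rather than the notebook itself means the container application would only need to read and run SQL statements, not execute arbitrary notebook code.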
Requirements