Getting started steps

Hi!

I was working with the examine_synthetic_data notebook to get started, and I wanted to clarify/point out a few things that could potentially be improved:

The guide does not mention that it seems to work only on Linux (or Unix/macOS), given its use of environment variables (not a problem, but probably worth clarifying).
Perhaps also mention that you need to supply the environment variables $PROJECT_DIR and $PYTHON_PATH.
The first line in build_dataset.py points to a default Python location, which is not what most people would be using if they work in virtual environments. This could also be a mistake on my end.
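
To illustrate the second point, something like the check below near the top of the notebook would have made the requirement obvious to me. This is only a rough sketch: the helper name `missing_env_vars` is made up for this example and is not part of the package, and I am assuming the two variables named above are the only ones that need to be set.

```python
import os

# Variable names taken from the points above; the real list may differ.
REQUIRED_VARS = ("PROJECT_DIR", "PYTHON_PATH")


def missing_env_vars(required=REQUIRED_VARS, environ=None):
    """Return the names in `required` that are unset or empty.

    Hypothetical helper for illustration only. `environ` defaults to
    os.environ; a plain dict can be passed in for testing.
    """
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


# Example use at the top of a notebook or script.
missing = missing_env_vars()
if missing:
    print("Please set these environment variables first: " + ", ".join(missing))
else:
    print("All required environment variables are set.")
```

On the third point, if the first line is a hard-coded interpreter path, a `#!/usr/bin/env python` shebang is the usual way to pick up whichever interpreter is active in the current virtual environment.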

Perhaps consider splitting the Jupyter notebook (and the Read the Docs pages) into a multi-part tutorial and/or making the notebook a little less verbose, which could help users get started. This might especially help interdisciplinary folks. I was a bit overwhelmed, and that made getting started daunting. I would also be happy to help with this once I am more familiar with the pipeline.

The package looks great so far, and it is clear you have put a lot of thought into everything :).

mmcdermott / EventStreamGPT

Getting started steps #96