rjurney / Agile_Data_Code_2

Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
http://bit.ly/agile_data_science
MIT License
456 stars 307 forks

Provide a working Dockerfile #63

Closed: peterroelants closed 3 years ago

peterroelants commented 6 years ago

Would it be possible to provide a working Dockerfile? One appears to exist, but it looks outdated compared to the bootstrap.sh install. A Dockerfile would make the environment reproducible both on EC2 and locally, with no need for a Vagrant VM.

I tried installing the environment on EC2 with the provided scripts, but some components failed to install and no logs were created.

rjurney commented 6 years ago

I can do this, but I have to work out how to make storage work with Docker. Without external storage, Docker runs out of space and dies while PySpark is running. I need external storage, and I don't yet know how to configure it. I will look into this.

In the meantime I recommend EC2. Sorry.

rjurney commented 5 years ago

I am thinking of doing this now that computers more often come with 32 GB of RAM. I still need to solve the problem of setting up an external storage volume.
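
One possible direction for the external storage problem, offered only as a sketch and not as the setup this repository uses: bind-mount a host directory into the container with Docker's `-v` flag and point Spark's scratch space at it through the `SPARK_LOCAL_DIRS` environment variable. The helper name `build_docker_run_command`, the image name `agile_data_science:latest`, the host path `~/ads_storage`, and the mount point `/data` are all hypothetical. The helper only assembles the command so it can be inspected before anything is run.

```python
import shlex
from pathlib import Path


def build_docker_run_command(host_dir, image, container_dir="/data"):
    """Return an argv list for `docker run` with a host directory bind-mounted.

    Hypothetical helper: it builds the command but does not execute it.
    """
    # Docker bind mounts expect an absolute host path.
    host_path = Path(host_dir).expanduser().resolve()
    return [
        "docker", "run", "--rm", "-it",
        # Bind mount: files written under container_dir land on the host disk
        # instead of in the container's writable layer.
        "-v", f"{host_path}:{container_dir}",
        # Assumption: Spark honors SPARK_LOCAL_DIRS for shuffle and spill files.
        "-e", f"SPARK_LOCAL_DIRS={container_dir}",
        image,
    ]


if __name__ == "__main__":
    # The image name below is a placeholder, not an image published by this repo.
    cmd = build_docker_run_command("~/ads_storage", "agile_data_science:latest")
    print(shlex.join(cmd))
```

Whether this is enough depends on where the space actually runs out. If the image itself or Docker's own storage area is what fills up, a bind mount for Spark's scratch files would not help on its own.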