manifoldai / docker-cookiecutter-data-science

A fork of the cookiecutter-data-science leveraging Docker for local development.
http://drivendata.github.io/cookiecutter-data-science/
MIT License

Support for git and anaconda #2

Open rorcde opened 6 years ago

rorcde commented 6 years ago

Hi everyone,

I am exploring the Docker image and the cookiecutter template, and I noticed that the image does not come with git. This is inconvenient because many (bleeding-edge) libraries are not published on PyPI, only in their git repositories. This means one needs to manually install git via the terminal, then clone the repository and install the library. This goes against the project's spirit of automation and having everything self-contained. In addition, PyCharm does not seem to recognise libraries installed manually via pip and constantly asks to download them again.
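For illustration, the manual step described above can be sketched in Python. This is only a rough sketch, not part of the template: the helper names `git_available` and `pip_git_requirement` and the repository URL are hypothetical. It relies on pip's `git+<url>@<ref>` requirement syntax, which only works when a `git` executable is on the PATH inside the image.

```python
import shutil


def git_available() -> bool:
    """Return True if a `git` executable can be found on PATH."""
    return shutil.which("git") is not None


def pip_git_requirement(repo_url: str, ref: str = "") -> str:
    """Build a pip VCS requirement string, e.g. git+https://host/org/repo.git@ref."""
    requirement = f"git+{repo_url}"
    if ref:
        # Optional branch, tag, or commit to install from.
        requirement += f"@{ref}"
    return requirement


if __name__ == "__main__":
    # Hypothetical repository URL, used only for illustration.
    req = pip_git_requirement(
        "https://github.com/example-org/example-lib.git", ref="main"
    )
    if git_available():
        print(f"pip install {req}")
    else:
        print("git not found on PATH; it must be installed in the image first")
```

If git were baked into the image, a requirement string of this form could live in the project's dependency file instead of being installed by hand.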

Further, not everyone likes pip to manage their libraries. It would be great to have conda as an option too.

Looking forward to seeing further improvements!

davidrpugh commented 6 years ago

Perhaps consider a different base image. Rather than using a vanilla Ubuntu base image, you could use the Jupyter team's data-science-notebook image. You would need to make a few additions (TensorFlow being the most obvious).

rkoppula commented 5 years ago

Thanks for the feedback. Agreed, git should be available; we will add it in the next release. We value the precision with which one can track library dependencies using pipenv, but we understand Anaconda's popularity. We will discuss this internally and see if and how we can add it.

davidrpugh commented 5 years ago

Perhaps it reflects my own bias towards conda, but in my experience conda envs are as precise at tracking dependencies as pip envs. Have you come across an example of something you track using pip that cannot be tracked using conda?

davidrpugh commented 5 years ago

I would also add the following blog post as strong evidence that shifting to conda for package management would be a good idea. It shows a significant CPU performance boost when installing with conda, which comes from using the MKL linear algebra libraries.

Also, using conda simplifies installing the GPU version of TensorFlow.