Closed lakshmanok closed 6 years ago
We may want to consider conda as the way to get all of our packages and dependencies (including Python), i.e. go beyond just having it be present in the image.
@nikhilk Has this gotten any traction? I want to install fastparquet, but it fails with pip.
I would like to work on this pull request
@umang-sh Please feel free.
Note that this is likely a significant change/cleanup to our Docker image, starting with how Python is set up, as we'll want to use Python via miniconda.
Here is my example of a minimal (but old) miniconda setup - https://github.com/nikhilk/containers/blob/master/ipython/Dockerfile
@nikhilk @yebrahim @chmeyers I am taking this up and starting work on it; I will comment in case I need any help/info :) This is gonna be a major change indeed.
@nikhilk @chmeyers @yebrahim
Hi Guys,
A few questions arose while I was working on this:
1. All the Python dependencies are in setup.py in pydatalab, right?
2. With this change, we want to move those dependencies into the Dockerfile via conda, so setup.py may or may not be used, since conda will take care of them?
Also, let me know what other approaches you foresee.
Thanks, Umang
Setup.py is about installing the library anywhere, so it should continue to have the required set of dependencies.
I believe we'll need to completely redo how the docker image is built to use conda instead of what is done right now.
Adding reference to Stack Overflow Q - https://stackoverflow.com/questions/47025059/install-conda-package-from-google-datalab
Thanks for the reference, I found this thread this way. I would also be interested to see this as a feature. Is someone still working on a PR for this?
@Holisticnature I am working on the PR for this :)
@nikhilk From what I understand of the current Docker image flow, the build.sh files in the different directories, together with prepare.sh and run.sh, are the crucial files for the build. Among these, I only see pip used in run.sh to install pydatalab, and no other Python dependency installed anywhere in the flow. Correct me if I am wrong. If we build the Dockerfile with conda and integrate it properly with the current flow, that would work, right? Or is there any other script file we need to consider as well? Once this is clear, I will start building the Dockerfile with the correct flow.
Thanks :)
Just released, https://research.google.com/colaboratory/unregistered.html
I have not tested it yet, but would anyone know if it allows installing packages, and if so, via pip or conda?
Moving to conda entails redoing the apt-get and pip install steps in the Dockerfile (e.g. https://github.com/googledatalab/datalab/blob/master/containers/base/Dockerfile). To get a sense of what I am alluding to, look at the definitions in https://github.com/nikhilk/containers, where conda is used to install various packages.
@yiga2: Google Colaboratory supports package installation via pip, not conda.
Thanks Max. Well then, hope the effort of switching/adding conda to datalab could extend to Colaboratory...
AFAIAC I am a happy camper as fastparquet is now part of pandas 0.21 so 'pippable'
I would appreciate Colaboratory support for conda as well. I am finding that pip gets me 95% of what I need, but, for example, some of the support libraries in the UDST toolkit don't make it.
As some broader context, Microsoft Azure Notebooks run Anaconda, and Jupyter "strongly recommend[s] installing Python and Jupyter using the Anaconda Distribution" on its install page.
Hi guys, sorry for the long delay.
I have made a sample Dockerfile with this change. Please have a look at it here:
https://github.com/umang-sh/containers
@nikhilk @yebrahim Thanks
Hello!
@umang-sh Thank you for working on this! I'm a new member of the Datalab team, and I've been looking at actually migrating away from pip to conda for a little while now. I have a close-to-functional branch, but sorting out all of the dependency and environment issues for both Python 2 and 3 looks like it will require some fairly significant changes to the Dockerfile, as well as some minor changes to other scripts involved in the build process. I'm trying to fit all of this within a single refactor commit. To that end, I'm going to reassign this issue to myself.
Closing this issue as conda support should be working as of #1923
Awesome!! Is there documentation on this yet? I haven't tried it, but would be interested in knowing how to install a package from conda, e.g. https://stackoverflow.com/q/47025059.
Checking in here, as the help page for Adding Python libraries to a Cloud Datalab instance doesn't mention conda.
This worked, though it was pretty slow:
!conda install -c ospc taxcalc --yes
--yes is needed to bypass the prompt asking to install dependencies.
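For anyone who would rather call this from Python code than use the ! shell escape, here is a minimal sketch of the same command run through subprocess. The helper names conda_install_command and conda_install are hypothetical (they are not part of Datalab or conda), and the sketch assumes that conda is on the PATH of the notebook kernel's environment.

```python
import shutil
import subprocess


def conda_install_command(package, channel=None):
    """Build the argument list for a non-interactive `conda install`.

    Hypothetical helper for illustration only.
    """
    # --yes skips the confirmation prompt, which a notebook cell cannot answer.
    cmd = ["conda", "install", "--yes"]
    if channel:
        # -c selects the channel to install from (e.g. "ospc" in the comment above).
        cmd += ["-c", channel]
    cmd.append(package)
    return cmd


def conda_install(package, channel=None):
    """Run the install, raising if conda is missing or the command fails."""
    # Assumption: conda is available on PATH inside the running container.
    if shutil.which("conda") is None:
        raise RuntimeError("conda was not found on PATH")
    return subprocess.run(conda_install_command(package, channel), check=True)
```

Calling conda_install("taxcalc", channel="ospc") should be equivalent to the notebook command above, and is likely to be just as slow, since the time goes into conda resolving and downloading dependencies.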
I'm late, but thanks for checking that, @MaxGhenis.
Will there be documentation on using conda with datalab sometime? It would be good to know what the limitations are, such as whether (and how) we can run conda commands in a docker container shell instead of jupyter notebook.
Edit: I found out how to access the Docker shell using this link: Working With Notebooks. Certain commands still don't work, such as conda update conda, which yields an HTTP error.
A number of scientific packages are not installable from PyPI and are instead installed using conda/miniconda. It would be very helpful if conda were present in the Docker image by default.
See also: https://github.com/GoogleCloudPlatform/dataproc-initialization-actions/tree/master/conda