readthedocs / readthedocs.org

The source code that powers readthedocs.org
https://readthedocs.org/
MIT License

Support conda for builds #857

Closed mrocklin closed 8 years ago

mrocklin commented 9 years ago

Read the Docs Conda Support

This will add the ability to generate documentation with conda environments on Read the Docs. This is mainly useful for libraries with large C dependencies, including many packages in the Scientific Python ecosystem.

Task List

You will be able to specify a conda environment.yml file, and Read the Docs will install these dependencies in your build environment.

Considerations

Read the Docs will keep separate virtualenv & conda directories:

Users will be able to define a way to install packages for a project:

Read the Docs will need to change its build code so that we don't hard-code virtualenv paths. We'll need to vary our environment creation, as well as bin paths for executables, based on the backend environment.

The other main thing is that we'll also need to install Sphinx & other build dependencies into the conda environment. We will continue to use pip for this, and it should be transparent, other than using the pip executable in the conda environment instead of the virtualenv.
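A minimal sketch of what "not hard-coding virtualenv paths" could look like, assuming a Linux build server. The helper names `create_env_command` and `pip_install_command` are hypothetical, not Read the Docs code; only the `virtualenv` and `conda create` flags shown are real CLI options.

```python
from pathlib import PurePosixPath
from typing import List


def create_env_command(backend: str, env_path: PurePosixPath,
                       python_version: str) -> List[str]:
    """Build the command that creates an empty environment at env_path (hypothetical helper)."""
    if backend == "virtualenv":
        # virtualenv picks the interpreter via -p/--python
        return ["virtualenv", "--python", f"python{python_version}", str(env_path)]
    if backend == "conda":
        # conda treats the Python runtime as just another package
        return ["conda", "create", "--yes", "--prefix", str(env_path),
                f"python={python_version}"]
    raise ValueError(f"unknown backend: {backend!r}")


def pip_install_command(env_path: PurePosixPath, packages: List[str]) -> List[str]:
    """Use the pip that lives inside the environment, whichever backend created it."""
    # On Linux both backends put executables under <env>/bin, so only the
    # environment root has to vary between them.
    return [str(env_path / "bin" / "pip"), "install", *packages]
```

With this split, installing Sphinx looks identical for both backends; only the environment root and the creation command differ.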

It should also be noted that miniconda has a different install process for Python 2 and 3, and its developers recommend installing it from their bash scripts instead of pip. I hope that we will be able to use pip, as it will simplify our installation and won't require updating a bash script on version upgrades. We will have to see if we hit issues in testing.

Cleanup

Read the Docs will manage conda environment deletion on the removal of a project or version.
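Since a conda environment created with a prefix is just a directory, cleanup can be sketched as removing that directory when its project or version goes away. This is a hypothetical helper, assuming one environment directory per version; a real implementation might prefer `conda env remove` instead.

```python
import shutil
import tempfile
from pathlib import Path


def delete_environment(env_path: Path) -> bool:
    """Remove a version's environment directory; return True if anything was deleted."""
    if env_path.is_dir():
        shutil.rmtree(env_path)
        return True
    return False


if __name__ == "__main__":
    # Demo on a throwaway directory standing in for an environment prefix.
    scratch = Path(tempfile.mkdtemp())
    print(delete_environment(scratch), delete_environment(scratch))
```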

Documentation

We will need to add information about conda support to our documentation. We might want to add a topic guide around installing requirements, along with adding a specific reference for how to use & enable conda support.

Sponsorship

This work is being funded by Clinical Graphics -- many thanks for their support of Open Source.

ericholscher commented 9 years ago

Is there a reason you can't install conda-like packages with pip? I believe pip is the default installer for Python, and is likely the only thing we will support.

mrocklin commented 9 years ago

Pip tends not to work well with packages that have non-Python dependencies. This includes a lot of the Numeric/Scientific Python stack (e.g. Pandas.) This community is a decently sized chunk of the Python ecosystem, and would definitely appreciate better support from RTD (which, btw, I <3, thanks!)

In regards to default installers you may find it interesting that conda was recently blessed and brought in under the Python Packaging Authority (cc @ncoghlan).

ncoghlan commented 9 years ago

conda is actually still its own org (since it isn't Python specific), but we do recommend it when Python folks need a cross platform package manager that can handle the Python runtime and arbitrary external dependencies. By contrast, the PyPA toolset focuses specifically on Python packages (including C extensions), including playing nice with redistributors like conda and the Linux distros.

ericholscher commented 9 years ago

Interesting. So it would be a way to install binaries onto the build server? Does it work with virtualenvs, or is it installing things system wide?

mrocklin commented 9 years ago

Exactly.

It has its own system for virtual environments that relies on linking many packages into environments. Interestingly, because it packages binaries, Python itself is just a package, so it's easy to do things like quickly spin up Python 2 and Python 3 environments for simultaneous testing. Many who use it (myself included) vastly prefer it to virtualenv, but that's subjective.

@asmeurer might have more information.

ncoghlan commented 9 years ago

pip/virtualenv & conda work at different levels.

The PyPA tools are designed to work within a larger system that provides the Python runtime and any external dependencies. That may be a Linux distro package manager, something like homebrew on Mac OS X, or just downloading and running a binary installer from python.org.

By contrast, conda is such a larger system - rather than being Python specific, it's a full "cross platform platform", designed to manage arbitrary binaries, including Python runtimes and external dependencies. This means it doesn't integrate with other environments the way pip can, but it also means it can be used to manage components where pip will fail completely.

ncoghlan commented 9 years ago

Oh, and to answer the "is conda system wide" question, no it isn't. It's designed to be run as an ordinary user, creating installation environments in their own directory space without needing root access.

shoyer commented 9 years ago

:+1: this would be a fantastic addition.

I figured out how to make the baseline scientific packages (numpy, scipy) and pip work together to build everything I need for my docs, but it was a real guessing game to figure out which combinations of pip packages could safely be installed together.

I actually tagged and released v0.1 of my package before I realized that I would not be able to build it from scratch on RTD, because it only worked when I changed the requirements.txt file incrementally. This is obviously somewhat unfortunate and is actually pretty typical for the problems that arise when using pip to install scientific python libraries.

tritemio commented 9 years ago

I agree it would be a great addition. As a matter of fact, I'm struggling with installing numpy to build the docs for my project. @shoyer how did you solve the problem?

shoyer commented 9 years ago

@tritemio The trick is to give the virtualenv in which you build your docs access to the global site-packages directory -- see Advanced Settings > Use system packages. RTD has numpy 1.8, scipy and matplotlib installed system wide. I set up my conf.py to print out the versions when building the docs: https://github.com/xray/xray/blob/v0.2/doc/conf.py
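The linked conf.py isn't reproduced here; this is a hypothetical stand-in showing the same idea of logging which system-wide versions a build actually picked up. The helper name `dependency_versions` is an assumption, and it tolerates missing packages instead of failing the build.

```python
import importlib


def dependency_versions(names):
    """Map each module name to its __version__ ("unknown" if absent), or None if not importable."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


# Printed output ends up in the Read the Docs build log.
for name, version in dependency_versions(["numpy", "scipy", "matplotlib"]).items():
    print(f"{name}: {version or 'not installed'}")
```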

As for testing, to ensure that you can build your docs from scratch in a new virtualenv (each version of the docs gets its own virtualenv), try deleting the build environment: http://read-the-docs.readthedocs.org/en/latest/builds.html#deleting-a-stale-or-broken-build-environment

tritemio commented 9 years ago

@shoyer thanks! Your suggestions are narrowing down my problems, hope to fix them soon ...

faph commented 9 years ago

+1

I'd love to have conda support in RTD. In the same way that Travis CI does it. http://conda.pydata.org/docs/travis.html

tswicegood commented 9 years ago

FWIW, if RTD used buildpacks like Heroku and Cloud Foundry, there's a conda-buildpack that can detect a Conda environment.yml file and spin that up. That spec supports creating environments that have both Conda and pip packages in them. If nothing else, its conda_compile script might serve as a good reference if someone wants to take a stab at implementing this in RTD.

zgrnk commented 9 years ago

I am currently working on a project which uses numba and am trying to upload the project onto RTD. This cannot currently be done, as it requires the llvm compiler. Is it possible for RTD either to install llvm and include it in the system packages under Advanced Settings, or to add support for conda?

ncoghlan commented 9 years ago

Using buildpacks or a container tech like Docker to do builds would require fundamentally redesigning the way ReadTheDocs works. On the other hand, that might not be a bad idea at some point, especially as Docker-based public cloud services with a grants program for open source projects come online.

Disclosure: I work for Red Hat, OpenShift Online has a grants program that includes open source projects in its scope, our next generation architecture is based on Docker & Kubernetes, and I personally believe that hosting a valuable service like RTFD would be a great way for us to support the community. So while porting to the current OpenShift architecture likely wouldn't make sense, porting to Docker/Kubernetes would open up both Google Container Engine and a future version of OpenShift Online as hosting options.

astrojuanlu commented 9 years ago

:+1: for the addition of conda, I am interested in hosting a package depending on numba too.

pitrou commented 9 years ago

While conda isn't supported, I've tried at least to disable the setup.py running and I'm having this weird error: https://github.com/rtfd/readthedocs.org/issues/1240

cdeil commented 9 years ago

:+1: I know a few packages that need a more recent scipy or matplotlib for their docs build and pip install on readthedocs fails ... conda would be a very nice solution!

chebee7i commented 9 years ago

Practically, won't this be necessary if you want to build older versions of your documentation that require older versions of NumPy, matplotlib, possibly with different APIs from whatever version of NumPy is installed system-wide?

shoyer commented 9 years ago

@chebee7i practically speaking, matplotlib and numpy have strong backwards compatibility guarantees, so I'm not too worried about API changes for them. Though matplotlib has been talking about a 2.0 release with a new default colormap...

chebee7i commented 9 years ago

@shoyer but functionality does change between versions and this can cause the documentation to be wrong, especially if you rely on buildtime-generated documentation through the sphinxext IPython directive. And I am thinking much more generally than matplotlib and numpy, but even extending to just pandas reveals very recent backwards compatibility changes.

jakirkham commented 8 years ago

+1 for conda support.

Carreau commented 8 years ago

+1, I'm currently trying to update old docs that use Python, R and rpy2, and I can almost trivially get everything working in a conda 2.7 env.

jakirkham commented 8 years ago

Am I completely mistaken or does readthedocs allow us to use docker ( http://read-the-docs.readthedocs.org/en/latest/development/buildenvironments.html#configuration ) ( http://read-the-docs.readthedocs.org/en/latest/api/doc_builder.html#readthedocs.doc_builder.environments.DockerEnvironment )? If so, there are pre-existing images for miniconda and miniconda3 ( https://hub.docker.com/u/continuumio/ ), which could be used here.

ericholscher commented 8 years ago

Went ahead and updated the description here to flesh out the needed work. If folks have any thoughts, feel free to comment here. I should be working to add conda support in the next few weeks.

ankostis commented 8 years ago

+1

Just make sure you provide conda through some proxy, to speed up downloads and avoid extra charges.

shoyer commented 8 years ago

I am so psyched that this is finally happening! Let me know if you need any help testing this out.

A few notes:

Korijn commented 8 years ago
  • [ ] Munge name property of environment.yml to proper RTD value

Turns out you can override the name in the command using the -n parameter. See the create command help here:

```
C:\Development\Projects>conda env create -h
usage: conda-env-script.py create [-h] [-f FILE] [-n NAME] [-q] [--force]
                                  [--json]
                                  [remote_definition]

Create an environment based on an environment file

Options:

positional arguments:
  remote_definition     remote environment definition / IPython notebook

optional arguments:
  -h, --help            Show this help message and exit.
  -f FILE, --file FILE  environment definition file (default: environment.yml)
  -n NAME, --name NAME  environment definition
  -q, --quiet
  --force               force creation of environment (removing a previously
                        existing environment of the same name.
  --json                Report all output as json. Suitable for using conda
                        programmatically.

examples:
    conda env create
    conda env create -n name
    conda env create vader/deathstar
    conda env create -f=/path/to/environment.yml
```

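
Using the -n and -f options from the help text above, the "munge the name property" task can be sketched as simply passing an RTD-chosen name on the command line. The helper `env_create_command` is hypothetical; the assumption is that RTD names the environment after the version being built.

```python
from typing import List


def env_create_command(env_name: str, env_file: str = "environment.yml",
                       force: bool = False) -> List[str]:
    """Build `conda env create`, overriding the file's `name:` via -n (hypothetical helper)."""
    cmd = ["conda", "env", "create", "-n", env_name, "-f", env_file]
    if force:
        # Per the help text, --force removes an existing environment of the same name.
        cmd.append("--force")
    return cmd


print(env_create_command("latest", force=True))
```

Because the name comes from the command line, whatever `name:` a user puts in environment.yml no longer matters to where the environment is created.
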
ericholscher commented 8 years ago

@Korijn Great -- thanks for the clarification. I will update the ticket.

ericholscher commented 8 years ago

I have a basic implementation working, and I'd love to test this against a repo that someone is using in the wild. @Korijn do you have a good repo that has an environment file checked into it?

faph commented 8 years ago

If you're still looking for test repos, feel free to pull from https://github.com/OpenHydrology/floodestimation. Does the environment.yml need to have sphinx etc in it?

ericholscher commented 8 years ago

@faph Great, thanks. I don't think that it will need Sphinx specified. It will use the same order of operations that we support currently:

So the plan is to run conda env update inside of our environment, where we have already installed a base set of packages. This means you can specify a different version of Sphinx, etc. but we will have a default set that is used if they aren't specified.
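The planned order of operations can be sketched as a list of commands: a base environment with RTD's default packages first, then the user's environment.yml applied on top with `conda env update`. The helper name and the exact base package list are assumptions for illustration.

```python
from typing import List


def build_commands(env_name: str, env_file: str,
                   base_packages: List[str]) -> List[List[str]]:
    """Hypothetical build order: defaults first, user's file last."""
    return [
        # 1. Create the named environment with just Python.
        ["conda", "create", "--yes", "--name", env_name, "python"],
        # 2. Install RTD's default build dependencies (Sphinx etc.).
        ["conda", "install", "--yes", "--name", env_name, *base_packages],
        # 3. Apply the user's environment.yml; versions pinned there are
        #    meant to take precedence over the defaults from step 2.
        ["conda", "env", "update", "--name", env_name, "--file", env_file],
    ]


for cmd in build_commands("latest", "environment.yml", ["sphinx"]):
    print(" ".join(cmd))
```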

faph commented 8 years ago

Sounds sensible, thanks.

ericholscher commented 8 years ago

@faph hmm, I'm getting a Error: Invalid package specification: appdirs 1.4* from conda now.

As far as I can tell, the format for the environment file isn't documented anywhere :/ I'm using conda 3.18.3, which I believe is up to date. Does that work locally for you?

shoyer commented 8 years ago

There's some documentation for environment.yml here: https://github.com/conda/conda-env#environment-file-example

It looks like @faph provided you with an invalid file -- it's missing some equals signs. This version works:

```yaml
name: env

channels:
- https://conda.anaconda.org/openhydrology

dependencies:
- python
- appdirs=1.4*
- sqlalchemy=0.9*
- numpy=1.9*
- scipy>=0.16
- lmoments3>=1.0.2
```

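
The difference between the invalid entry (`appdirs 1.4*`) and the working one (`appdirs=1.4*`) can be illustrated with a small parser that accepts only the `name`, `name=ver` and comparison forms used in the file above. This is a simplified sketch, not conda's own spec grammar, and `parse_dependency` is a hypothetical name.

```python
import re
from typing import Tuple

# Package name, then an optional constraint that must start with =, <, > or !
_SPEC = re.compile(r"^([A-Za-z0-9_.\-]+)([=<>!].*)?$")


def parse_dependency(spec: str) -> Tuple[str, str]:
    """Split 'numpy=1.9*' into ('numpy', '=1.9*'); reject anything else."""
    match = _SPEC.match(spec.strip())
    if match is None:
        raise ValueError(f"unrecognised dependency spec: {spec!r}")
    return match.group(1), match.group(2) or ""


print(parse_dependency("scipy>=0.16"))
```
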
faph commented 8 years ago

Sorry! Thanks. I always get this wrong between meta.yml, environment.yml and conda install x!

Will update in the repo. ... done.

ericholscher commented 8 years ago

Hrm, I've run into another issue, where the conda env update command doesn't accept the --prefix argument. It only accepts a name, which doesn't seem to let you override where that environment might be stored (it defaults to $HOME/miniconda/envs, but doesn't seem to allow overriding this path).

I'm looking into this more, but wonder if anyone has thoughts here. Ways to fix this:

The other option is to create the environment all at once, but that would then override all the packages that we install (Sphinx, etc) to versions that we specify, I believe.

Korijn commented 8 years ago

You could do an additional call to conda install afterwards to make sure the right versions of sphinx etc are installed?

I'll provide a sample repo soon, sorry for the delay. :)

shoyer commented 8 years ago

Suppose you have a conda environment installed at $CONDA_ENV_PATH. In my tests, $CONDA_ENV_PATH/bin/conda env update has some weird behavior: it both updates the conda environment the command is run from AND installs a new environment at the name provided in environment.yml. This must be a bug...

faph commented 8 years ago

You should be able to set envs_dirs in conda config. If you set that to your preferred path, you should be able to create and update envs as per the original plan. Just use --name instead of --prefix.

See http://conda-test.pydata.org/docs/config.html

ericholscher commented 8 years ago

Sounds like CONDA_ENVS_PATH is what I was looking for, for overriding where it looks for the named environment. This is a hacky solution, and I'd prefer to just use the --prefix, like in the conda env creation command, but that will at least give a path forward for now.
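The CONDA_ENVS_PATH workaround can be sketched as running `conda env update --name ...` in a child process whose environment points CONDA_ENVS_PATH at the per-project directory. This assumes the conda version in use honours that variable, as discussed above; the helper name is hypothetical.

```python
import os
from typing import Dict, List, Tuple


def env_update_invocation(envs_dir: str, env_name: str,
                          env_file: str) -> Tuple[List[str], Dict[str, str]]:
    """Return (command, child environment) for a name-based `conda env update`."""
    # Copy so the build process's own environment is left untouched.
    child_env = dict(os.environ)
    # Redirect where conda resolves named environments, since --prefix isn't accepted.
    child_env["CONDA_ENVS_PATH"] = envs_dir
    cmd = ["conda", "env", "update", "--name", env_name, "--file", env_file]
    return cmd, child_env
```

It would then be run with something like `subprocess.run(cmd, env=child_env, check=True)`.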

@shoyer Hmm, that definitely sounds like a bug.

I will hopefully have this at least to the point where I can post a Work in Progress PR in the near future.

faph commented 8 years ago

Before creating the environment, you can just do:

```
conda config --add envs_dirs path/to/envs
```

Then you can just conda env create/update --name rtd_env --file path/to/environment.yml.

jakirkham commented 8 years ago

I will hopefully have this at least to the point where I can post a Work in Progress PR in the near future.

Sounds exciting, @ericholscher.

So, there are many ways to install stuff with conda, and I'm trying to get a grasp on how this is going to work. I see the environment.yml file has been discussed. There is also a bdist_conda subcommand for python setup.py to build a conda package. Finally, some projects provide a *.recipe (or otherwise named) directory with a recipe and scripts to build the package with conda-build. In the last two cases, the package must then be explicitly installed after building, but it will pull in all runtime dependencies with it. If there are other cases I have missed, feel free to add them.

How do you think these cases should be handled? Is there some way for us to specify how we would prefer it to be installed? In the simplest case it could be a shell script, but if you have better ideas I would be interested to hear them.

faph commented 8 years ago

The most common approach seems to be to create first the environment with requirements (either using a full environment.yml file or requirements file) and then to install the package into that with python setup.py install. See for example http://conda.pydata.org/docs/travis.html

The alternative is first to build the package using the conda recipe (the folder that often has the word recipe in it). Then install that into a new environment using the --use-local option. I think this option is mostly used for people who want to test building the conda package (and optionally deploy it to say anaconda.org). I think this option is overkill for RTD. Also, the recipe sometimes lives in a different place/repo than the package's source code.

Imho RTD could just stick with the environment.yml file. It would be nice to be able to let the user specify the actual name of the file, for example if the requirements for building the docs are different than say testing, production.

ericholscher commented 8 years ago

This has now been deployed. We are supporting the environment.yml file for now.

You can see more in the docs here: http://docs.readthedocs.org/en/latest/conda.html -- It would be great to have folks test this out. I ran into some issues during development with python version mismatches and some of the other interesting parts of conda. Let me know if it isn't working for folks so we can make it work better.

faph commented 8 years ago

Excellent, many thanks!

I tried running it on one of my repos (develop branch) but I'm afraid it failed, see https://readthedocs.org/projects/floodestimation/builds/3601598/ .

Not sure if I need to do anything to trigger a conda build. I just supplied the readthedocs.yml (https://github.com/OpenHydrology/floodestimation/blob/develop/readthedocs.yml). I've still ticked the option to install the package itself with setup.py.

I'm assuming I should see some conda build steps in the output log if it's recognising the conda.file key?
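One way a builder might decide whether to take the conda path is to look up the `conda.file` key in the parsed readthedocs.yml. A minimal sketch, assuming the YAML has already been loaded into a dict and that a missing key means a plain virtualenv build; `conda_file` is a hypothetical helper, not Read the Docs' actual config code.

```python
from typing import Any, Mapping, Optional


def conda_file(config: Mapping[str, Any]) -> Optional[str]:
    """Return conda.file from a parsed readthedocs.yml, or None if it isn't set."""
    conda = config.get("conda")
    if not isinstance(conda, Mapping):
        return None
    file_name = conda.get("file")
    return file_name if isinstance(file_name, str) and file_name else None


# Equivalent of a readthedocs.yml containing:  conda: {file: environment.yml}
print(conda_file({"conda": {"file": "environment.yml"}}))
```

If a build log shows no conda steps at all, checking that this lookup returns a value is a reasonable first step.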

ivoflipse commented 8 years ago

@faph the problem is that you only specified the openhydrology channel (apart from the default channel). None of these channels seem to contain the sqlalchemy you requested.

I just ran conda search sqlalchemy and it shows that 0.9.* only supports Python versions up to 3.4, so if the python3 running on readthedocs is Python 3.5, it won't find the default sqlalchemy package.

This means you'll either have to pin your environment to python ==3.4, build your own conda package for whichever version of sqlalchemy you intend to use and upload it to your openhydrology channel, or, lastly, add to your channels list the channel URL of someone who has the package you're after (look here to find an appropriate one).

ivoflipse commented 8 years ago

Scratch that, I misread where the error occurred. Given where it seems to call your code: "/home/docs/checkouts/readthedocs.org/user_builds/floodestimation/envs/develop/lib/python3.4/site-packages/floodestimation-0.7.1+3.g05e75eb-py3.4.egg/floodestimation/db.py", I think perhaps your environment is not activated? Your environment file specifies the name env, but here it's running from develop.

Upon closer inspection, none of the steps mention creating your conda environment, so I wouldn't know how your dependencies could be available in the develop environment.

Also, is the pip install task supposed to install these packages in the user's environment? If so, I hope that doesn't cause issues if any of the dependencies happen to clash. But I must admit that I don't fully understand how RTD builds the docs.

faph commented 8 years ago

Ok, I got a bit further this morning. Latest build (still failing) here: https://readthedocs.org/projects/floodestimation/builds/3602292/. At least it's now mentioning the conda install steps. So far it does:

  1. Create conda environment with just python, named after my git branch develop. Note that it installs Python 2!
  2. Conda install sphinx etc.
  3. Pip install mkdoc, readthedocs-sphinx-ext, recommonmark (I always get nervous when doing pip installs in a conda environment, probably should try to build conda packages for this)
  4. Conda env update, using my environment.yml file. (This throws some warnings from the system python in /usr/local/lib/python2.7 although it completes without errors).
  5. Run sphinx-build. Fails on missing sphinx package!
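Because Sphinx was installed in step 2 but missing by step 5, a cheap check between steps is to ask the environment's own interpreter whether a module still imports. A minimal sketch; `can_import` is hypothetical, and the interpreter path passed in would be the environment's `bin/python`.

```python
import subprocess
import sys


def can_import(python_exe: str, module: str) -> bool:
    """Return True if `module` imports under the given interpreter."""
    # Assumes `module` is a trusted, plain module name (it is spliced into -c).
    result = subprocess.run(
        [python_exe, "-c", f"import {module}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # Demo with the current interpreter; in a build you'd pass <env>/bin/python.
    print(can_import(sys.executable, "sphinx"))
```

Running it after each install step would show exactly which step drops a package.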

So it seems that some of the packages installed initially into the environment don't survive the various subsequent steps, possibly caused by a change from Python 2 to Python 3 and/or some pip installs halfway through.

Just a thought: would it be cleaner to let the user specify all dependencies, including sphinx? Since we've got the conda.file key in the RTD yaml config, it's no bother just to create an environment-rtd.yml file with the sphinx dependencies. This makes it explicit what's required, and everything can get installed in one go with a clean conda env create from the environment file. We could create an rtd channel on anaconda.org containing all the sphinx dependencies, including sphinx extensions, to make it easier for people to specify the dependencies as conda packages without having to hunt for or build them.
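If users supply everything in their own environment file, RTD could still fail early with a clear message when a required build dependency is absent. A sketch under the assumption that "sphinx" is the only hard requirement; `missing_build_deps` is hypothetical, and `pip:` sub-lists (which environment.yml allows) are ignored for simplicity.

```python
import re
from typing import Iterable, List

REQUIRED_BUILD_DEPS = ("sphinx",)  # hypothetical minimum for a Sphinx build


def missing_build_deps(dependencies: Iterable[object],
                       required: Iterable[str] = REQUIRED_BUILD_DEPS) -> List[str]:
    """Return the required package names not listed in an environment's dependencies."""
    names = set()
    for dep in dependencies:
        if isinstance(dep, str):
            # "sphinx>=1.3" -> "sphinx": keep the part before any constraint
            names.add(re.split(r"[=<>!\s]", dep.strip(), maxsplit=1)[0].lower())
        # Mappings such as {"pip": [...]} are skipped in this sketch.
    return [name for name in required if name not in names]


print(missing_build_deps(["python", "numpy=1.9*"]))
```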

faph commented 8 years ago

@ivoflipse RTD is taking care of environment naming and activating; that bit seems to work. The conda environments seem to be at /home/docs/checkouts/readthedocs.org/user_builds/{repo}/conda/{branch}, with {branch} being the environment name that subsequently gets used in conda's --name argument.
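The build logs in this thread show virtualenvs under `user_builds/<project>/envs/<version>` and conda environments under `user_builds/<project>/conda/<version>`. A sketch of that layout, inferred from those paths only; `environment_root` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Base directory as it appears in the build logs quoted in this thread.
USER_BUILDS = PurePosixPath("/home/docs/checkouts/readthedocs.org/user_builds")
_SUBDIRS = {"virtualenv": "envs", "conda": "conda"}


def environment_root(project: str, version: str, backend: str) -> PurePosixPath:
    """Keep virtualenv and conda environments in separate per-project directories."""
    if backend not in _SUBDIRS:
        raise ValueError(f"unknown backend: {backend!r}")
    # The version (branch) name doubles as the conda environment name.
    return USER_BUILDS / project / _SUBDIRS[backend] / version


print(environment_root("floodestimation", "develop", "conda"))
```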

I don't know where the conda root environment lives, but that should not affect the build process. Conda gets called ok.