readthedocs / readthedocs.org

The source code that powers readthedocs.org
https://readthedocs.org/
MIT License

Support conda for builds #857

Closed mrocklin closed 8 years ago

mrocklin commented 9 years ago

Read the Docs Conda Support

This will add the ability to generate documentation with conda environments on Read the Docs. This is mainly useful for libraries with large C dependencies, including many packages in the Scientific Python ecosystem.

Task List

You will be able to specify a conda environment.yml file, and Read the Docs will install these dependencies in your build environment.

Considerations

Read the Docs will keep separate virtualenv & conda directories:

Users will be able to define a way to install packages for a project:

Read the Docs will need to change its build code so that we don't hard-code virtualenv paths. We'll need to vary our environment creation, as well as bin paths for executables, based on the backend environment.

The other main thing is that we'll also need to install Sphinx & other build dependencies into the conda environment. We will continue to use pip for this, and it should be transparent, other than using the pip executable in the conda environment instead of the virtualenv.
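A minimal sketch of the path handling described above. The helper names (`env_bin_path`, `pip_install_command`) and the `BACKEND_DIRS` layout are hypothetical and chosen for illustration; they are not taken from the Read the Docs code base.

```python
import os

# Hypothetical mapping from backend name to the directory that holds its
# environments; the real layout used by Read the Docs may differ.
BACKEND_DIRS = {"virtualenv": "envs", "conda": "conda"}


def env_bin_path(project_root, backend, version):
    """Return the bin directory of the build environment for one version."""
    if backend not in BACKEND_DIRS:
        raise ValueError("unknown backend: %r" % (backend,))
    return os.path.join(project_root, BACKEND_DIRS[backend], version, "bin")


def pip_install_command(project_root, backend, version, packages):
    """Build a pip command that uses the pip inside the chosen environment."""
    pip = os.path.join(env_bin_path(project_root, backend, version), "pip")
    return [pip, "install"] + list(packages)
```

With this shape, the code that installs Sphinx would only need to know the backend name, not where the executables live.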

It should also be noted that miniconda has a different install process for Python 2 and 3 -- also they recommend installing it from their bash scripts instead of pip. I hope that we will be able to use pip, as it will simplify our installation, and won't require an update to a bash script on version upgrades. We will have to see if we hit issues in testing.

Cleanup

Read the Docs will manage conda environment deletion on the removal of a project or version.

Documentation

We will need to add information about conda support to our documentation. We might want to add a topic guide around installing requirements, along with adding a specific reference for how to use & enable conda support.

Sponsorship

This work is being funded by Clinical Graphics -- many thanks for their support of Open Source.

ivoflipse commented 8 years ago

I think it would be a good idea to have an RTD conda organization, which hosts all the packages for all supported platforms. That way you know it should always work for everyone with minimal effort.

To debug the configuration, try calling conda info -a, which shows how everything is configured. Running conda list develop will show you what is currently present in your environment. This would make it easier to debug whether everything is present or not.

But I think the culprit is that you forgot to call source activate develop before calling sphinx-build. I reckon that since it's in your environment's bin folder, you should be able to simply call:

source activate develop
sphinx-build -T -E -b readthedocs -d _build/doctrees-readthedocs -D language=en . _build/html

and have it build the docs.

faph commented 8 years ago

Not sure who you mean with "you". I don't need to/cannot activate an environment, RTD should take care of that. You're probably right though that the command:

python /home/docs/checkouts/readthedocs.org/user_builds/floodestimation/conda/develop/bin/sphinx-build ...

calls the wrong Python. By the way, should they not drop python from the command altogether because sphinx-build is the executable itself? I'm not too familiar with Python entry points on Linux... If you call the entry point with the full path you probably don't need to activate the environment at all.

ericholscher commented 8 years ago

@faph Ah -- I hit this issue in dev. Since you're specifying the Python version in your environment.yml but not in the Read the Docs YAML config, we're installing "python=2" with conda -- then installing Sphinx with pip, and then re-installing python 3.4 from your environment.yml -- this means that the resulting Python 3.4 environment doesn't have the dependencies that we installed in it.

I'm not sure the best way to work around this, other than putting this in your YAML config:

conda:
   file: environment.yml
python:
   version: 3.4

Which will install the proper version of Python the first time, I believe. I also hit the issue with the project not being installed properly. This file got the build working:

conda:
   file: environment.yml
python:
   version: 3.4
   setup_py_install: true

faph commented 8 years ago

Perfect! Many thanks - this works. Very pleased to be able to drop my numpy and scipy mocking!

I like the RTD yaml config file by the way.

I'm not so sure about the pre-installed packages though, whether that's the cleanest. Would it be worthwhile adding an option to the yaml config file to allow a "bare" environment, purely from the specified environment.yml file?

ericholscher commented 8 years ago

Awesome -- We're excited about the YAML file as well :)

I think it makes sense to be able to run with a clean environment and just support the environment defined in the file. From what I can tell, you can't create a conda environment from an environment file with conda create. Perhaps we could do conda env create -n <version> -f environment.yml if you enable the bare option, instead of doing a conda create.

To be honest, I find all the CLI interfaces to conda a bit confusing :)

faph commented 8 years ago

The only annoying thing with conda is the difference between plain conda and the conda env sub-command. Apart from that, it's pretty robust.

conda env create -n <version> -f <config.conda.file> would be awesome.
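The two strategies discussed here can be sketched as command lists. This is only an illustration: the `conda_commands` function and its `bare` flag are hypothetical names for the option proposed in this thread, and the two-step branch is an assumption about the current behaviour based on the comments above.

```python
def conda_commands(env_name, env_file, python_version, bare=False):
    """Return the conda invocations needed to build one environment."""
    if bare:
        # One step: create the environment purely from the user's file.
        return [
            ["conda", "env", "create", "--name", env_name, "--file", env_file],
        ]
    # Two steps: pre-create the environment with a pinned Python, then
    # update it from the user's environment file.
    return [
        ["conda", "create", "--yes", "--name", env_name,
         "python=%s" % python_version],
        ["conda", "env", "update", "--name", env_name, "--file", env_file],
    ]
```

Each inner list could be handed to a process runner as-is, which keeps the choice between the two strategies in one place.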

jakirkham commented 8 years ago

Sorry if I'm not following (getting over a cold), can one not specify the python version in environment.yml?

Korijn commented 8 years ago

Sure you can...

name: myenv
dependencies:
  - python ==3.5

faph commented 8 years ago

@jakirkham Yes, but currently RTD pre-creates a conda environment with just python and sphinx etc. To make sure this pre-created environment has the correct version you MUST specify the python version (also) in readthedocs.yml.
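A project could guard against this mismatch with a small check. The sketch below assumes the dependency list has already been loaded from environment.yml into a Python list; `pinned_python` and `versions_agree` are hypothetical helpers, and only simple `python=X.Y` or `python ==X.Y` pins are recognised.

```python
import re


def pinned_python(dependencies):
    """Return the version pinned for python in a dependency list, or None."""
    for dep in dependencies:
        if not isinstance(dep, str):
            continue  # skip nested sections such as {"pip": [...]}
        match = re.match(r"^python\s*=+\s*(\d+(?:\.\d+)*)", dep.strip())
        if match:
            return match.group(1)
    return None


def versions_agree(dependencies, configured_version):
    """Check that the pin agrees with the version in the RTD YAML config."""
    pinned = pinned_python(dependencies)
    if pinned is None:
        return True  # nothing pinned, so nothing can conflict
    a = pinned.split(".")
    b = str(configured_version).split(".")
    n = min(len(a), len(b))
    # Compare only the components that both versions specify.
    return a[:n] == b[:n]
```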

jakirkham commented 8 years ago

@faph, wouldn't specifying the Python version in environment.yml make sure it is properly upgraded or is there some other magic going on behind the scenes?

faph commented 8 years ago

@jakirkham that's what I thought. But for some reason that's not working. Not sure if that's because of RTD first installing some sphinx dependencies with pip into the environment or whether the upgrade from Python 2 to 3 is not safe. Not sure. My wish list includes having the option to fully create the exact environment in one go from your own environment.yml without anything pre-installed by RTD.

shoyer commented 8 years ago

We just hooked this up for xray, and it worked pretty much without a hitch!

On my first attempt, it still did the build with pip instead of conda. But after wiping the environment on that branch and resetting my advanced settings, the next time I did a build it used conda. I'm not quite sure what was going on there...

faph commented 8 years ago

I had that a couple of times too. I did the same with wiping the RTD build. Does RTD cache the config or something?

mrocklin commented 8 years ago

I'm having trouble with a project that needs to know where a conda installed .so file lives. I have two conda packages, a compiled C++ library, libhdfs3.so, and a Python wrapper library, hdfs3. The Python wrapper library tries to find the location of the .so file by finding the main conda directory and then adding /lib. It finds the main conda directory by shelling out to conda info and then finding the text after default environment:

(py35)mrocklin@notebook:~$ conda info
...
  default environment : /home/mrocklin/Software/anaconda/envs/py35
...

path = os.path.join(conda_dir, 'lib', 'libhdfs3.so')

Oddly, when I do this in the RTD environment, the resulting path turns out to be /usr/lib/libhdfs3.so, which makes me very confused. Where is the conda directory here? What is the result of calling conda info on an RTD machine? Are we using the system Python or the conda Python during builds?
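The lookup described above can be sketched roughly as follows. This is not the actual hdfs3 code: the function names are hypothetical, and the parser assumes the `default environment : <path>` line format shown in the output above, which may differ between conda versions.

```python
import os


def default_env_from_info(info_text):
    """Extract the 'default environment' path from `conda info` output."""
    for line in info_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "default environment":
            return value.strip()
    return None


def shared_library_path(info_text, name="libhdfs3.so"):
    """Guess where a conda-installed shared library lives."""
    conda_dir = default_env_from_info(info_text)
    if conda_dir is None:
        return None
    return os.path.join(conda_dir, "lib", name)
```

The result depends entirely on which environment conda reports as the default, so it changes with whatever state the calling process inherited.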

Links

astrojuanlu commented 8 years ago

Hi @mrocklin, I know very little about the RTD build process so I might be saying nonsense but I don't see the activate step in the failed build you linked. All the packages are getting installed in a conda environment called latest but, as far as I can see, it's never activated. This might be the cause of the failure, because in that case conda info | grep 'default environment' will return the root environment. Perhaps calling /home/docs/checkouts/readthedocs.org/user_builds/hdfs3/conda/latest/bin/python would be safer in this case? My two cents.

astrojuanlu commented 8 years ago

By the way, instead of reading conda info perhaps you could use the sys.prefix variable, which should point to the right locations when activating the environment. ctypes.find_library unfortunately won't work, see https://github.com/ioos/conda-recipes/issues/184#issuecomment-96245725 and https://github.com/ocefpaf/conda-recipes/blob/8f8c28e79a79a06ebfb98b4a3c099e92965cd595/rtree/find_libray.patch#L51
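A minimal sketch of the sys.prefix idea, assuming the usual Linux layout where conda puts shared libraries under `<prefix>/lib`. The `library_candidate` helper is a hypothetical name.

```python
import os
import sys


def library_candidate(name="libhdfs3.so", prefix=None):
    """Return the path where a conda-installed shared library would live."""
    # sys.prefix is the root of the installation the running interpreter
    # belongs to, whether or not an environment was activated first.
    prefix = sys.prefix if prefix is None else prefix
    return os.path.join(prefix, "lib", name)
```

Because it relies only on the running interpreter, this avoids shelling out to conda at all.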

Korijn commented 8 years ago

I think the conda environment's python.exe is called directly, which is indeed slightly incorrect since any other steps involved in the activate script are skipped (which can vary per environment).

faph commented 8 years ago

For reference, the conda environment is named after the branch or document version and could be anything like /home/docs/checkouts/readthedocs.org/user_builds/{reponame}/conda/{latest, develop, stable, ...}

jakirkham commented 8 years ago

Yeah, if an environment isn't activated, it is probably using the root python. Though, and I could be mistaken, there is no reason not to just install everything into the root environment here.

Korijn commented 8 years ago

I think conda env create is incapable of that, actually.

jakirkham commented 8 years ago

I believe you are correct @Korijn. However, I think you could use conda env update, which shouldn't have an issue.

faph commented 8 years ago

Though, and I could be mistaken, there is no reason not to just install everything into the root environment here. (@jakirkham)

One reason is for example Python 2 versus Python 3, i.e. you can have a standard/hard-coded root environment (managed by RTD, e.g. Python 2 + just conda) and let the user install exactly the documentation build packages in the environment you like (e.g. Python 3, sphinx, ...). So the root environment never gets touched.

I believe the approach Continuum wants to take is to isolate conda itself as much as possible into an environment that ordinary users do not touch, i.e. consider it a standalone application. This way you can guarantee that conda works without interfering with, or being interfered with by, user packages.

mrocklin commented 8 years ago

I believe the approach Continuum wants to take is to isolate conda itself as much as possible into an environment that ordinary users do not touch, i.e. consider it a standalone application.

That was the approach I was assuming was happening here. It tends to be the assumed default among conda users. The way that I would expect this to work is that the user supplies a conda environment for their build, this gets activated, and then RTD pushes the packages it needs on top of this environment.

faph commented 8 years ago

I think that is roughly what is happening at RTD, but the other way around. RTD first creates a separate environment and installs sphinx into it. Then it updates that environment with the user's supplied environment.yml.

Although there is no trace of conda's activate command in the build log, I assume the correct Python is on the PATH because docs builds do work (RTD calls python /full/path/to/env/bin/sphinx-build). Presumably conda's activate also puts the bin dir on the PATH; if that were to be used RTD could call sphinx-build directly?

ericholscher commented 8 years ago

Seems like there's a lot of back and forth here, but no real consensus on what a proper solution is. Can anyone outline what the best course of action is for properly supporting these use cases? As I read it, what we have now works in the normal case, but there are edge cases where it doesn't. Is that right?

jakirkham commented 8 years ago

Here is my real question. Is there a reason for installing into the root environment or is this just happening? If the former, what is the reason? If the latter, it is best to change the current behavior to create a clean new non-default environment as @mrocklin has said.

shoyer commented 8 years ago

I'm pretty sure RTD is not installing into the root conda environment. However, it doesn't activate a new environment either. Instead, it simply uses the binaries in the new build specific environment that it created directly. In my mind, this is a pretty reasonable solution -- activating an environment really only makes sense in an interactive session or shell script.

mrocklin commented 8 years ago

Hrm, I think of conda environments as being particularly useful in these cases, where you want a reliable software environment for build purposes. Most build services within Continuum happen within a conda environment for predictability's sake.

jakirkham commented 8 years ago

@ericholscher, is there code somewhere we can look at for this? I think it would answer a lot of questions. Sorry if I missed the link somewhere.

shoyer commented 8 years ago

As I said, it is being done in a fresh conda environment, the environment just isn't being activated -- the binaries in the environment are being called directly.

mrocklin commented 8 years ago

Are there particular issues you foresee occurring if you activate the environment?

In this case the issue is that my library actively depends on the conda state in order to locate a shared object file. Arguably this is a less-than-ideal way to find a shared library, but it's not the ugliest thing that people are going to try with rtd+conda.

mrocklin commented 8 years ago

FWIW, I've also worked around this within my own library (I now allow the library to import if it can't find the .so file) so my immediate use case is gone. Still though, I think that fewer future corner cases will occur among conda users if the environment actually gets activated. I think that this is the common case (though I have less experience here than many).

mrocklin commented 8 years ago

Edit: fewer future corner cases will occur among conda users if the environment actually gets activated.

jakirkham commented 8 years ago

Is this being run in a docker image? I can certainly propose strategies that would ensure the environment is activated. If there is code I can PR against, I can even provide a solution, but it is hard to do in the dark.

shoyer commented 8 years ago

@jakirkham here is the PR that added conda support to RTD: https://github.com/rtfd/readthedocs.org/pull/1849

jakirkham commented 8 years ago

Cool, thanks @shoyer.

Carreau commented 8 years ago

As I said, it is being done in a fresh conda environment, the environment just isn't being activated -- the binaries in the environment are being called directly.

We have the same issue in Jupyter: if you don't activate the environment, code that relies on command-line utilities being on the PATH will not work properly, leading to weird behavior. So calling /full/path/to/python directly is problematic. And "shelling out needs to use sys.executable" is not a satisfying answer, as code might require non-Python deps that conda can install, like pandoc.
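The effect can be illustrated with a short sketch, assuming a POSIX system. `find_tool` is a hypothetical helper that mimics how a child process would resolve a tool such as pandoc from its PATH; it is not part of Jupyter or Read the Docs.

```python
import os
import shutil


def find_tool(name, base_path, env_bin=None):
    """Locate an executable the way a process with this PATH would."""
    # Without env_bin only the inherited PATH is searched, so tools that
    # were installed into the environment's bin directory are not found.
    path = base_path if env_bin is None else env_bin + os.pathsep + base_path
    return shutil.which(name, path=path)
```

Calling the environment's python by its full path changes which interpreter runs, but not what this lookup returns for any tool that interpreter later shells out to.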

jakirkham commented 8 years ago

So, I think running the activate script is probably too much to hope for (maybe I'm wrong). However, I have found this normally covers it on Linux (let me know if I am missing something). If I run conda info or pretty much any other conda command this works the same as activate. It can easily be added to whatever language we want. These are all environment variables.

faph commented 8 years ago

@ericholscher I think all that is asked for is that the environment created gets added to PATH, including the standard sub-dirs like bin, lib etc. In conda-land there is an activate script for that. To satisfy the crowds you may want to document the exact install steps including env variables. Just because now you've enabled conda, people are going to install the most exotic packages including non-Python stuff!

jakirkham commented 8 years ago

Honestly, I have never needed to add lib or anything else other than bin.

$CONDA_DEFAULT_ENV and $CONDA_ENV_PATH seem to be important. The first shouldn't be surprising (need the environment name somehow). I am not sure why the second is used (maybe if you have multiple conda installs). The rest of activate seems more about nice things for a user to have like $PS1 and such that really don't seem important here, but I could be wrong.

astrojuanlu commented 8 years ago

I forked the project to experiment, and have several thoughts:

This is the code I was about to try by the way:

https://github.com/Juanlu001/hdfs3/commit/b004f0d53c76ee0aaf0521e9fcf0b9eb5bd71cd1

Answering @mrocklin's question about activate, I've been exclusively using conda for more than a year and I don't foresee any particular issues arising. I think the best option is probably to just use it.

ericholscher commented 8 years ago

We aren't running conda in an interactive session, so I'm not sure how we could run activate. If there are environment variables that should be supported, we could set them, but we're running these commands independently, so the variables set in the activate script wouldn't apply.

The code is here: https://github.com/rtfd/readthedocs.org/blob/master/readthedocs/doc_builder/python_environments.py#L163

jakirkham commented 8 years ago

We aren't running conda in an interactive session, so I'm not sure how we could "call" activate.

Yeah, I wouldn't be surprised if this is a problem. The activate script is a long-ish bash script that needs to be sourced. Doesn't seem like that would mesh well with what you are doing. Fortunately, it doesn't seem to matter.

If there are environment variables that should be supported, we could set them, but we're running these commands independently, so the variables set in the activate script wouldn't apply.

Right, so if we can just tack them onto whatever environment that is used when shelling out, I think this would be fine.

The code is here: https://github.com/rtfd/readthedocs.org/blob/master/readthedocs/doc_builder/python_environments.py#L163

Thanks for the link. I have been looking at the code. I had a question that I put on the merged PR.

pitrou commented 8 years ago

For the record, here is a diff of environment variables when I run activate here:

+PYTHONNOUSERSITE=1
-PATH=/home/antoine/.local/bin:/usr/local/cuda/bin:/home/antoine/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
+PATH=/home/antoine/35/bin:/home/antoine/.local/bin:/usr/local/cuda/bin:/home/antoine/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
+CONDA_ENV_PATH=/home/antoine/35
+CONDA_DEFAULT_ENV=/home/antoine/35

As you can see, there's not much to it. The most important change is probably the prepending of the conda environment's bin directory to the PATH.
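Based on the diff above, the effect of activate could be approximated without sourcing the script. This is a sketch under that assumption: `activated_environ` is a hypothetical helper, the variable names are the ones shown in the diff, and other conda versions may set different ones.

```python
import os


def activated_environ(env_path, base_environ=None):
    """Return a copy of the environment with activate-like changes applied."""
    environ = dict(os.environ if base_environ is None else base_environ)
    bin_dir = os.path.join(env_path, "bin")
    old_path = environ.get("PATH", "")
    # Put the environment's bin directory first so its tools win lookups.
    environ["PATH"] = bin_dir + (os.pathsep + old_path if old_path else "")
    environ["CONDA_ENV_PATH"] = env_path
    environ["CONDA_DEFAULT_ENV"] = env_path
    return environ
```

The function returns a new dictionary and leaves the caller's environment untouched, which suits a build service that runs each command independently.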

jakirkham commented 8 years ago

Interesting, I don't get this PYTHONNOUSERSITE. Seems like a useful add in general. Not sure if it matters here. Which version of conda are you on?

pitrou commented 8 years ago

Oh, forget it. PYTHONNOUSERSITE is from my own activate wrapper :-)

jakirkham commented 8 years ago

Ah, ok. In any event, I think we can safely ignore it here as there should be only this one Python install that we are worried about.

ericholscher commented 8 years ago

Should be easy enough to support the conda PATH, as we already have the bin_path argument to our run call.

jakirkham commented 8 years ago

I could be wrong, but I think we want to be able to pass environment variables as a kwarg to something like this ( https://github.com/rtfd/readthedocs.org/blob/aba714e82d218d60773955aec62a3df74173348d/readthedocs/doc_builder/backends/sphinx.py#L156 ). Does run permit that?

jakirkham commented 8 years ago

So, we have to change the environment of a BuildCommand then, yes? Maybe we can just add these to the environment before the build is called?
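As a rough illustration of passing variables to a shelled-out command, here is a sketch using the standard library's subprocess module. `run_with_env` is a hypothetical stand-in and not the Read the Docs BuildCommand API; whether the real run call accepts an environment argument is exactly the open question here.

```python
import os
import subprocess
import sys


def run_with_env(cmd, extra_env):
    """Run a command with extra variables layered on the current environment."""
    environ = dict(os.environ)
    environ.update(extra_env)
    result = subprocess.run(
        cmd,
        env=environ,  # the child sees only this mapping
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Example: the child process can read the variable we injected.
    child = "import os; print(os.environ['CONDA_ENV_PATH'])"
    out = run_with_env([sys.executable, "-c", child],
                       {"CONDA_ENV_PATH": "/tmp/example-env"})
    print(out.strip())
```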