replicate / cog

Containers for machine learning
https://cog.run
Apache License 2.0
8.04k stars 559 forks source link

Switch from `python_packages` to `python_requirements` #157

Open bfirsh opened 3 years ago

bfirsh commented 3 years ago

The design is outlined in this comment:

Todo

Design process

We should decide between python_requirements and python_packages and make it work properly.

There is some background here that is not written down anywhere that needs writing down. tl;dr python_requirements doesn't work properly, python_packages is not ideal in various ways.

Potential designs

  1. Leave it how it is, remove python_requirements. The main downside of this is Python requirements are either in a non-standard location or duplicated.
  2. Switch to python_requirements, and use a simple parser to determine torch and tensorflow versions. The main downside of this is that it is not clear to the user that Cog is rewriting versions.
  3. Switch to python_requirements, but split out torch and tensorflow as separate top-level options to make it clear they do some special sauce behind the scenes. If torch/tensorflow is detected in python_requirements, then it ignores them, warns the user, or something else sensible.

User data

(Consider this a wiki and please edit! Edited by @bfirsh, ...)

zeke commented 2 years ago

The main downside of this is that it is not clear to the user that Cog is rewriting versions.

Is Cog NOT rewriting versions when you use python_packages?

bfirsh commented 2 years ago

It is. And come to think of it this is a downside of both (1) and (2) -- it's not obvious Cog is rewriting versions in both cases.

Torch/Tensorflow packages need to be compiled for a particular version of CUDA, and in Torch's case, those packages are not on PyPi but on their custom index.

For example, if a user specifies torch==1.10.0 in python_packages, then Cog detects the best CUDA version to use and rewrites the install to be something like torch==1.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html.

zeke commented 2 years ago

users normally already have some way of defining their dependencies (pip, Poetry, Conda, etc) and are unwilling to switch to a different method. [...] read dependencies from wherever the user has already defined them.

☝🏼 I think this makes sense.

zeke commented 2 years ago

Speaking of the myriad ways to define dependencies in Pythonland, I recently took the 2021 Python Developers survey and this was one of the questions:

Screen Shot 2021-11-08 at 6 31 38 PM

I checked out the results from the 2020 survey thinking it might help us know which formats to prioritize, but it looks like they didn't ask that question last year.

bfirsh commented 2 years ago

A bit of interesting data from #325: we have some undocumented options python_extra_index_urls and python_find_links, but looks like nobody is using these for any public models:

https://github.com/search?q=python_find_links+filename%3Acog.yaml&type=code https://github.com/search?q=python_extra_index_urls+filename%3Acog.yaml&type=code

zeke commented 2 years ago

Here are some rough stats from GitHub on the popularity of different dependency formats:

Filename Count
requirements.txt 2M
pipfile.lock 145K
pyproject.toml 136K
environment.yml 92K
poetry.lock 70K
constraints.txt 33K
environment.yaml 29K
zeke commented 2 years ago

Takeaways from an IRL conversation with @bfirsh today:

vccheng2001 commented 2 years ago

Hello. I was having a conversation with @andreasjansson just now about Conda support in Cog. This might be something to add future support for since many ML repos use conda/env.yml as the setup instead of pip, as Conda supports not only python packages but also C libraries, executables, etc.

Currently building C/C++ extensions in Cog (which many graphics/3D-related projects require) requires hacky workarounds (e.g. using subprocess calls, chaining commands under "run", etc), so it'd be cool to have a more robust solution for this.

okaris commented 1 year ago

@bfirsh @zeke Is there any new information about Conda support in Cog?

mattt commented 1 year ago

I agree with the plan of action @zeke articulated in https://github.com/replicate/cog/issues/157#issuecomment-1068606880, and I'm glad to see us making progress towards this with #714.

My top feedback from my first run of Cog has to do with a lack of requirements.txt file. Without it, PyLance and other editor functionality that I rely on didn't work. We should make python_requirements the default behavior in cog init going forward.

We can support other ecosystems like Conda by passing a different --package-manager flag to cog init (that flag would default to pip). For example, cog init --package-manager=conda would generate an environment.yml file.

bfirsh commented 1 year ago

Follow on from https://github.com/replicate/cog/pull/1016#issuecomment-1530003478...

I think the reason we didn't want to document it is because magic happens (waves hands), and it doesn't actually install the packages you expect in requirements.txt, which is unexpected. As outlined in the comments above, we want to tell the user clearly when magic happens so they know that package X is being replaced by package Y.

I don't think we need to remove the documentation for python_requirements though -- if anything this is a good motivation to get this fixed. 😄

mattt commented 1 year ago

I think the reason we didn't want to document it is because magic happens (waves hands), and it doesn't actually install the packages you expect in requirements.txt, which is unexpected. As outlined in the comments above, we want to tell the user clearly when magic happens so they know that package X is being replaced by package Y.

@bfirsh Ah, thanks for providing that context. I understand and appreciate why this was undocumented. Apologies if I unintentionally spoiled the trick. 😅

Just spitballing here, but what do you think about an approach like what pip-compile does? (I recently set this up for our Python client, actually: https://github.com/replicate/replicate-python/pull/84). You specify loose constraints are in pyproject.toml and then pip-compile uses those to generates a requirements.txt with exact requirements, acting as a lockfile. Rinse and repeat for dev requirements.

Maybe Cog could use packages defined in cog.yaml, resolve with magic, and then write out requirements.txt with the exact dependencies being installed? So still magic, but more "nothing up my sleeves" 🪄🎩

bfirsh commented 5 months ago

Looking back at this ticket, this is starting to feel like a clearer, more explicit design, particularly with the base image stuff that @andreasjansson and @tempusfrangit are working on...

  1. Switch to python_requirements, but split out torch and tensorflow as separate top-level options to make it clear they do some special sauce behind the scenes. If torch/tensorflow is detected in python_requirements, then it ignores them, warns the user, or something else sensible.