jazzband / pip-tools

A set of tools to keep your pinned Python dependencies fresh.
https://pip-tools.rtfd.io
BSD 3-Clause "New" or "Revised" License

Dependency handling in requirements when updating packages #10

Closed brutasse closed 10 years ago

brutasse commented 11 years ago

So, here is the promised brain dump, sorry for the length.

Right now naively updating requirements can lead to dependency conflicts. For instance, let's say I want to add raven to my project but pinned to a specific version:

$ pip install raven==1.9.4
…
Successfully installed raven simplejson

So raven needs simplejson. Now I run pip freeze and get this in my requirements.txt:

raven==1.9.4
simplejson==2.4.0

Some time later I run pip-review and get (this is not what you'd get right now):

raven==2.0.2 is available (you have 1.9.4)
simplejson==2.6.2 is available (you have 2.4.0)

Note that the newer simplejson was already available when I initially installed raven, but raven needed simplejson>=2.3.0,<2.5.0. Raven 2.0.2 still has that constraint, yet the output above encourages me to upgrade simplejson when I shouldn't.

The current version of raven dropped the >=2.3.0,<2.5.0 part so now we can get the latest and greatest raven and simplejson safely.

My point is that when updating dependencies, checking for conflicts is very hard to do by hand. This needs to be automated with a tool that yells at the developer when an update leads to a version conflict.
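To make the check concrete: the conflict above boils down to testing a candidate version against the range declared by the package that depends on it. A minimal sketch of that test, using the modern packaging library (which post-dates this discussion) purely for illustration:

from packaging.specifiers import SpecifierSet
from packaging.version import Version

# raven's declared constraint on simplejson, taken from the example above
raven_needs = SpecifierSet(">=2.3.0,<2.5.0")

print(Version("2.4.0") in raven_needs)  # True:  the currently pinned version is fine
print(Version("2.6.2") in raven_needs)  # False: the "available" upgrade would break raven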

Ruby gets this right with Bundler. gem install bundler, create a Gemfile with the following content:

source :rubygems
gem 'compass-less-plugin'

And run bundle install. This installs the required package and its dependencies and creates a Gemfile.lock file:

GEM
  remote: http://rubygems.org/
  specs:
    chunky_png (1.2.6)
    compass (0.12.2)
      chunky_png (~> 1.2)
      fssm (>= 0.2.7)
      sass (~> 3.1)
    compass-less-plugin (1.0)
      compass (>= 0.10)
    fssm (0.2.9)
    sass (3.2.1)

PLATFORMS
  ruby

DEPENDENCIES
  compass-less-plugin

Gemfile.lock is like requirements.txt with pinned versions (not everything is pinned here but should probably be): when creating a new environment and running bundle install, bundler looks at the .lock file to install what's specified.

Then there is a bunch of commands that bundle provides. For instance, to list available updates (running this on a bundle created months ago):

$ bundle outdated
Fetching gem metadata from http://rubygems.org/.....

Outdated gems included in the bundle:
  * chunky_png (1.2.6 > 1.2.5)
  * fssm (0.2.9 > 0.2.8.1)
  * sass (3.2.1 > 3.1.12)
  * compass (0.12.2 > 0.11.7)

Updating compass-less-plugin and its dependencies can be done in one command (bundle update compass-less-plugin) and does so while checking for version conflicts.

Sorry if you're already familiar with all this. Now I'll try to explain how we can improve requirements.txt by using this approach.

First, instead of putting all the requirements in requirements.txt, people would only list first-level deps, pinned. So for raven:

raven==1.9.4

Then some tool provided by pip-tools compiles this into the full requirements list, in another file (like Gemfile and Gemfile.lock but with less noise):

raven==1.9.4
simplejson==2.4.0

The key point is that this tool builds the whole dependency tree for all the top-level requirements and dumps it as a safely-installable-with-no-conflicts requirements file, which pip can just use.

So next time raven is updated and doesn't require an old simplejson, the tool can update the simplejson requirement. When raven drops simplejson to use python's built-in json implementation, the 2nd-level requirement can be dropped as well, automatically.

Another use case: requests, which used to have dependencies on oauthlib, certifi and chardet but doesn't anymore (and oauthlib needed rsa or pyasn1 or whatever). If I just need requests, I'll list it in my top-level requirements and the tool will pin its dependencies, or drop them if they're no longer needed when I upgrade requests itself.

And finally, this tool could prevent me from installing package X and Y which need Z<1.0 and Z>1.1.

That's the theory, and I think pip already does some version conflict checks, but that's not enough to guarantee safe updates. Now in practice, I think the dependency information is not provided by the PyPI API: the whole package has to be fetched to actually extract it (or maybe crate.io provides that info). So that's annoying but doable, and pip-tools seems like a nice place to experiment with such things.

I think buildout does check for dependency conflicts but I never managed to wrap my head around it.

What do you think? I'm happy to start a proof-of-concept that could be integrated in this project.

almet commented 11 years ago

I can see different interesting points here:

First, you're talking about what we do want to pin and what we don't. I think it really depends on what this file will be used for. For instance, when deploying, I want to have all my dependencies pinned to versions that I know are working. This is what pip freeze > requirements.txt does.

The thing you're proposing with top-level dependencies sounds weird to me, because we already have this: that's what's in the setup.py of your project, isn't it?

The second thing is dependency management and conflict detection. If version specifiers are used in the setup.py, then it's perfectly possible to know whether there is a conflict somewhere. Packaging / Distutils2 provides a tool to resolve a dependency tree, see http://docs.python.org/dev/library/packaging.depgraph.html

brutasse commented 11 years ago

@ametaireau the context here is not necessarily a python library: this could be used on, say, a Django site which doesn't have a setup.py.

I fully agree that all deps should be pinned when deploying. What I'm suggesting is a tool that translates top-level deps (= the stuff you actually need, pinned as well) into the full list of requirements, i.e. the result of pip freeze. So you maintain only the top-level deps and the tool resolves the rest for you. Both the top-level requirements and the full list are kept under version control.

The idea is that when you use or update raven, you don't want to worry about which version of simplejson you can use.

brutasse commented 11 years ago

I have started a branch at https://github.com/brutasse/pip-tools/compare/master...features;bundle

For now it just resolves top-level requirements into a full, pinned list. It doesn't install, it just generates a usable requirements.txt.

There is no check for dependency conflicts yet.

The name (pip-bundle) probably isn't a good idea given that pip already has a bundle command. The command also needs to be split into subcommands for the equivalents of bundle install, bundle outdated and bundle update (I'm not sure we want the last one: it's more or less the equivalent of running bundle outdated, updating the top-level requirement and re-running bundle install, and I'm not a big fan of a command that would alter the top-level requirements file).

nvie commented 11 years ago

Hi @brutasse, I think your ideas are great, and I'd love to work with you to get this nicely implemented. Before implementing, however, it would be a good idea to get our minds synced. A nice way would be to describe the API in a cram test, explicitly defining the behaviour of these commands.

I've made a beginning to it. (I've dubbed the command pip-compile, as pip-bundle would be too confusing, indeed. Nevertheless, that's just a working title and we can change it easily.)

This Gist-based interface description can be changed quite easily and we can implement when we reach a stable state. Feel free to fork and modify the example I started to match your idea first—we'll work from there.

brutasse commented 11 years ago

@nvie awesome. At first I got confused by the cram tests since they failed on my machine: python3 is my default python. They still fail at the moment, pyflakes shows some issues with undefined names in some places.

I updated the gist. I like the pip-compile name much more than pip-bundle.

I like the use of a .in extension for top-level requirements, too :)

I don't think the tool should touch .in files; it should rather show what updates are available, and then the user chooses to update his .in requirements and runs pip-compile again to generate the .txt files. pip-compile would sort of replace pip-dump, I guess.

Also, I think it should be encouraged to pin stuff in requirements.in. Maybe even refuse unpinned requirements? I'm not sure about this one: in your first example it's probably not important to pin the nose dep in dev-requirements.in, but since requirements.in controls what goes into your production environment, it should only contain pinned packages.

nvie commented 11 years ago

This is great!

@nvie awesome. At first I got confused by the cram tests since they failed on my machine: python3 is my default python. They still fail at the moment, pyflakes shows some issues with undefined names in some places.

Yeah, I should fix these issues…

I updated the gist. I like the pip-compile name much more than pip-bundle. I like the use of a .in extension for top-level requirements, too :)

Cool, I think we can work with those, then.

I don't think the tool should touch .in files; it should rather show what updates are available, and then the user chooses to update his .in requirements and runs pip-compile again to generate the .txt files. pip-compile would sort of replace pip-dump, I guess.

Yep, I agree with this. I think it's good to have a flag to upgrade automatically (in case there is no conflict). This flag will also help us write the test cases.

Also, I think it should be encouraged to pin stuff in requirements.in. Maybe even refuse unpinned requirements? I'm not sure about this one: in your first example it's probably not important to pin the nose dep in dev-requirements.in, but since requirements.in controls what goes into your production environment, it should only contain pinned packages.

I think the default invocation should refuse them, indeed. But with an optional -f flag, you should be able to use the non-pinned version, too, I think.

Rethinking Things

Let's first introduce some new terminology, to better articulate the problem:

Term            Meaning
Source Spec     All of the *requirements.in files together
Compiled Spec   All of the *requirements.txt files, and .pipignore, together
Recorded State  The Source Spec and Compiled Spec, as kept under version control
Environment     The Python virtual environment, specifically the list of installed packages in there

The point of pip-tools is to keep the Environment and the Recorded State in sync all the time, while supporting checking for updates.

The current tools, pip-review and pip-dump, are designed to work with the reality that the Environment and the *requirements.txt files are leading: they are, in essence, managed manually. pip-review keeps the packages in the environment up to date, and pip-dump records the environment's state in version control.

However, our new approach flips this reality upside down: it requires the tool to be in control all the time, generating both the Environment and the Compiled Spec files. This is a pretty significant difference.

The result is a bit of a mess and responsibilities are a bit unclear, so let's say we ditch the current tools and start over with new, differently named tools, to avoid any confusion.

                              YE OLDE WAY

                       (review)               (dump)
          environment  ------->  environment  ----->  spec.txt

                              THE FUTURE™

                    (compile)            (sync)
           spec.in  -------->  spec.txt  ----->  environment

                          (compile --outdated)
           spec.{in,txt}  ------------------->  spec.{in,txt}

Let's assume the tools are in charge and are responsible for periodically generating both the *.txt spec files and the actual virtual environment. Since these tools bluntly "overwrite", it becomes bad developer practice to manually pip install any packages, or to manually modify requirements.txt.

Effectively, this means that manually adding new lines to requirements.txt, or pip-installing new packages into the Environment, will eventually be undone once someone runs pip-compile or pip-sync, respectively.

We need a way to "sync" the Recorded State to the Environment. I'm thinking of a new command named pip-sync, which not only installs packages but also uninstalls them, so that the environment exactly reflects the specs. The net result would be identical to creating a new environment and running pip install -r <spec> for all Compiled Spec files. Example:

$ pip freeze
abc==1
bar==1
foo==1
$ cat dev-requirements.txt
bar==1
$ cat requirements.txt
foo==2
qux==1
$ pip-sync   # sync the environment (does not only install, does also _uninstall_)
$ pip freeze
bar==1  # unchanged
foo==2  # updated
qux==1  # newly installed
# uninstalled abc==1

This would also make for a great deployment command for services like Heroku (instead of pip install -r requirements.txt), to retire packages that aren't needed anymore.
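For illustration, the core of such a sync step could be a diff between the pinned specs and the installed set. A rough sketch (the function names are hypothetical; editables, extras and error handling are ignored):

import pkg_resources


def parse_spec_files(paths):
    # Collect pinned "name==version" lines from the given compiled spec files.
    wanted = {}
    for path in paths:
        for line in open(path):
            line = line.split("#", 1)[0].strip()
            if not line:
                continue
            name, _, version = line.partition("==")
            wanted[name.lower()] = version
    return wanted


def sync_plan(spec_paths):
    # Compare the Compiled Spec against what is installed (pip freeze-ish).
    # A real tool would also exclude pip/setuptools themselves from removal.
    wanted = parse_spec_files(spec_paths)
    installed = {d.project_name.lower(): d.version for d in pkg_resources.working_set}
    to_install = {n: v for n, v in wanted.items() if installed.get(n) != v}
    to_uninstall = sorted(n for n in installed if n not in wanted)
    return to_install, to_uninstall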

I'll stop now, to keep my thoughts digestible. Are we on the same page?

brutasse commented 11 years ago

Sure, we're on the same page!

One thing that strikes me as odd is the way commands are named, pip-<something>, I think a main command with subcommands feels more natural. It's what many command-line tools do, see pip, git, apt-get… Was there a particular reason to choose this naming scheme?

The advantage of this approach is that adding new subcommands is easier. compile --outdated doesn't do the same thing as compile so they could be split into 2 subcommands.

Example with a generic name (tool):

Generate .txt from .in:

tool compile [-r spec.in]

Show available packages:

tool outdated [-r spec.in]

Apply updates:

tool update|upgrade [-r spec.in] [package[==version specifier]]

Sync the environment:

tool sync [-r spec.txt] [-r other-spec.txt]

But then we need to replace tool with an appropriate name for "something that manages requirements and syncs environments". I like ruby's choice of "bundle", there also are a bunch of synonyms. Maybe this is premature bikeshedding though :)

+1 for uninstalling although it needs to be done carefully: tool sync installs everything from *-requirements.txt but tool sync -r dev-requirements.txt would uninstall stuff from requirements.txt, right? It may be a good idea to promote using tool sync -r requirements.txt [-r prod-requirements.txt] for production and tool sync for development, which grabs all the *-requirements.txt.

nvie commented 11 years ago

The main reason for naming the tools pip-something was a bit of wishful thinking that these commands could eventually fit nicely into pip itself as subcommands, so you could read pip-compile as pip compile. The other reason for not picking a tool with subcommands is simplicity: they're "just two scripts"; no actual Python packages are installed with pip-tools currently.

I'm -0 on naming "tool" anything other than "pip-".

I'm with you on the necessity for an actual "outdated" command, although I don't particularly like it being a non-verb, but that's a personal itch. (I can rant away at Chef's knife kitchen command for hours :))

I concur with being careful when syncing, but worst case you'd have to reinvoke it with the correct params, so not too much harm done there. If this bumps into resistance too much, we could have it warn about/confirm uninstalls by default and allow -f to force it. Also, a --no-uninstall flag would be good thing to have, I think.

Lastly, I personally prefer not having to specify -r in front of every spec file in the command invocations. I know it's in line with pip itself, but I consider that design choice a bit unfortunate, as it breaks the possibility of using shell wildcards in the tool invocation, like:

$ tool sync specs/*.txt

More thoughts?

brutasse commented 11 years ago

Ok, thanks for the explanation about the commands "just being scripts". One issue with that is you can't have common utilities shared between commands (_check_output for instance). Are you opposed to pip-tools adding something to the python path for sharing between scripts?

I'm fine with not having the -r options.

The gist is up-to-date, I kept pip-outdated for now but this could be called pip-review instead.

I'll update my branch to add the cram tests, support .in files, add the pip-sync script. And maybe update pip-review to implement the pip-outdated behavior.

nvie commented 11 years ago

One issue with that is you can't have common utilities shared between commands (_check_output for instance). Are you opposed to pip-tools adding something to the python path for sharing between scripts?

I am not, and I think we will eventually end up there. I've lived with the duplication so far because I wanted to keep the tools lean.

The gist is up-to-date, I kept pip-outdated for now but this could be called pip-review instead.

I was thinking the same.

I'll update my branch to add the cram tests, support .in files, add the pip-sync script. And maybe update pip-review to implement the pip-outdated behavior.

Cool. Please share your branch by opening a pull request so we can work on it together if you're ready.

nvie commented 11 years ago

OK, I've updated the gist once more (to be a bit more descriptive about what's going on and what's important in each step).

@brutasse, could you take a look at the pip-compile-specifics.txt file I've added? What do you think? Am I overcomplicating things here, or is this useful? I was thinking only pip-outdated should reach out to PyPI, or at least that pip-compile should only do so when the currently pinned secondary versions don't match the criteria (in which case it could reach out to PyPI to find versions that do match).

Or do you think pip-compile should always reach out to PyPI, find the latest versions that still match all criteria, and record those? It would definitely simplify things for us (as we don't have to consider requirements.txt in pip-compile at all), but it would lead to (secondary) package updates on every compile, and compiles would always be lengthy.

brutasse commented 11 years ago

On Tue, Oct 2, 2012 at 9:01 PM, Vincent Driessen notifications@github.com wrote:

OK, I've updated the gist once more (to be a bit more descriptive about what's going on and what's important in each step).

Very nice :)

@brutasse, could you take a look at the pip-compile-specifics.txt file I've added? What do you think? Am I overcomplicating things here, or is this useful? I was thinking only pip-outdated should reach out to PyPI, or at least that pip-compile should only do so when the currently pinned secondary versions don't match the criteria (in which case it could reach out to PyPI to find versions that do match).

I'd say people shouldn't touch requirements.txt manually while using pip-tools. For that use case I'd put the raven and simplejson requirements in the .in file, and the compiled file would incidentally have the same content. It'd be much simpler than trying to make pip-compile aware of stuff manually changed in compiled files…

Or do you think pip-compile should always reach out to PyPI, find the latest versions that still match all criteria, and record those? It would definitely simplify things for us (as we don't have to consider requirements.txt in pip-compile at all), but it would lead to (secondary) package updates on every compile, and compiles would always be lengthy.

By default I think compile should rebuild the whole thing. But there are ways to optimize, using a local cache for instance: for raven==1.9.3, pip-compile can look at the cached version of raven, so it'd be fetched only the first time. For secondary requirements it's trickier to be lazy with PyPI. And of course, for packages which are referenced but not hosted on PyPI (e.g. redis) there is no way to guess the path to pip's cached version without asking PyPI.

On the other hand pip-review / pip-compile aren't the kind of tasks you do all the time, and with my current implementation resolving sentry==5.0.13 to a full list of its 25 requirements takes 90 seconds. If sentry had pinned requirements itself, it'd be much faster :)

nvie commented 11 years ago

I'd say people shouldn't touch requirements.txt manually while using pip-tools. For that use case I'd put the raven and simplejson requirements in the .in file, [...]. It'd be much simpler than trying to make pip-compile aware of stuff manually changed in compiled files…

I agree that this would allow for a much simpler implementation of pip-compile. It would also, however, lead to a situation where pip-review is used to search for / upgrade any top-level dependency versions, whereas pip-compile is used to search for / upgrade secondary dependencies. Don't you think that's a bit weird from a UX perspective?

My thinking was: always try to compile specs with as many of the currently used pinned versions as possible, and only (ask to) upgrade them when there's no other way, or when an explicit pip-review takes place.

I'd also really like a fast and deterministic pip-compile. In essence, the behaviour I would like is this: when you run pip-compile immediately after another pip-compile, without changing the Source Specs in between, the second invocation should under no circumstances change anything in the Compiled Specs.

Any idea how Bundler's implementation relates to this?

nvie commented 11 years ago

OK, some bad news for me from the front. I think I'll have to let go of my pipe dream of figuring out the dependency calculation without downloading any packages upfront. To get to the actual dependencies of a package, its setup.py must be executed, and the dependencies are calculated at runtime (depending on the OS or Python version).

This is an ugly fact of life we have to deal with :(

On the other hand, this makes things clear: the only safe remaining way of calculating the actual dependency tree seems to be to "just install" the top-level packages.

I guess this is what @brutasse suggested in the first place, so sorry if I'm a bit late to the party of understanding the hairy parts of how this works under the hood :)

brutasse commented 11 years ago

Indeed, packages need to be downloaded, mostly, I guess, because of the dynamic nature of setup.py and the fact that people change install_requires depending on the Python version or OS. The implementation I added yesterday doesn't create a temporary environment; it just runs setup.py egg_info to get the requirements:

https://github.com/nvie/pip-tools/blob/future/bin/pip-compile#L100-119
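For context, the egg_info route looks roughly like this: run setup.py egg_info on an unpacked sdist and read the requires.txt it generates (install_requires lines first, then optional [extra] sections). A simplified sketch, not the actual code linked above:

import glob
import os
import subprocess
import sys


def get_install_requires(sdist_dir):
    # Let setuptools write <name>.egg-info/requires.txt for us.
    subprocess.check_call([sys.executable, "setup.py", "egg_info"], cwd=sdist_dir)
    requires = []
    for path in glob.glob(os.path.join(sdist_dir, "*.egg-info", "requires.txt")):
        for line in open(path):
            line = line.strip()
            if line.startswith("["):  # an [extra] section starts; stop here
                break
            if line:
                requires.append(line)
    return requires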

An issue with "just installing" the source spec in a temporary environment is that pip isn't aware of dependency conflicts. If I put in a requirements file:

sentry==5.0.14
Django==1.3

There is clearly a conflict here because sentry requires Django>=1.4.1,<=1.5, but pip install runs fine and Django 1.3 is installed, probably because the last requirement wins. This is especially annoying when you have a pinned version of Django at the top of your requirements file and something else in the tree requires 'Django': the pinned version won't be taken into account and you'll just get the latest version the first time you run pip install…

nvie commented 11 years ago

Awesome stuff. I've made some good progress in collecting and normalising specs. Will hopefully be push-ready by the end of the afternoon.

nvie commented 11 years ago

Hey @brutasse, please check out my work on the Spec normalization/conflict detection. I've kept your functions in there, although the main() function does not call all of them anymore. We need them later, but I wanted to get the bare normalization/conflict detection logic in first.

I've tested this with the following inputs:

$ cat requirements.in
raven==1.9.3
# begin whitespace

# end whitespace :)
sentry==5.0.13

$ cat dev-requirements.in
nose>=1.2.0,<1.4.0
# The following line is completely obsolete, because line 1 is much more narrow
nose>1.1.8,<=1.5.0
# The following line does narrow down the spec from line 1
nose>=1.1.8,<1.3.0
# Uncomment the following line if you want to render all previous nose specs obsolete
#nose==1.2.1
# Uncomment the following line too if you want to test a conflict :)
#nose==1.2.2
# The following foo specs will result in foo==1.5.0
foo>=1.5.0
foo<=1.5.0

I've also added some TODO notes for further implementation, so be my guest if you want to take some of them and improve. Hope you like this.

brutasse commented 11 years ago

@nvie awesome :) I'll look at integrating that with the package parsing code later tonight.

nvie commented 11 years ago

Hey @brutasse, I just quickly pushed out a commit that moved all data structures to a module, so we can share this among the other tools, whenever necessary. Feel free to move more supporting code to the module, to keep the scripts lightweight and easily readable.

nvie commented 11 years ago

Btw, if you want to connect in real time, I'm online on irc.freenode.net under #pip-tools.

brutasse commented 11 years ago

@nvie I have something ready for pip-compile here: https://gist.github.com/cfffc3537604cc311767. I haven't committed it because I'm not sure it's the best approach. What I currently do is:

2nd-level specs are also resolved to the latest versions available. Supposing we have package X that requires "simplejson<2.6" and package Y that requires "simplejson", the algorithm will add "simplejson==2.4.0" and "simplejson==2.6.2" to the SpecSet when resolving X and Y respectively, leading to a conflict.

This is sub-optimal but that particular problem shouldn't happen often and I can't think of an easy way to solve this… Let me know what you think.

And finally, once the specset is resolved for all spec files, we need to dump things to the appropriate compiled files. This probably requires fixing SpecSet so it doesn't lose the source information (I saw your note) and adapting it so that dependency information can be read directly from the data structure.

nvie commented 11 years ago

@brutasse, I've plain committed your Gist to the project. I think the trick is to not add the pinned versions to the spec set as we're still building the set, but instead postpone that to the very last moment possible.

The trick is not to get lost in confusion here. There will be lots of hairy cases, even ones we can't come up with now. So I suggest to stop working on the algorithm right now and first come up with a test framework that makes it easy to express and test some assumptions / weird edge cases.

Instead of the piptools.{cache,pypi} modules, we actually need a piptools.package_manager module, which we can use to instantiate the backend that provides access to packages and package info. Its interface should consist only of method calls supported by the PyPI backend, like:

class PackageManager(object):
    def find_best_match(self, spec):
        # Return the best (name, version) available that satisfies the spec.
        ...

    def get_dependencies(self, name, version):
        # Return the dependency specs declared by the pinned package.
        ...

Actual package contents don't matter that much to us, so the interface should hide PyPI / URL / cache path details.

Not only does this make the code more readable, it also allows us to stub out the whole backend with a fake one, bypassing the PyPI downloads/caching and speeding up our test cases.

Imagine the following way of expressing a fake dep tree. In essence, it's a DSL for creating a mini-PyPI on the fly and using it as a test stub:

{
  'foo-0.1': ['bar'],
  'bar-1.2': ['qux', 'simplejson'],
  'qux-0.1': ['simplejson<2.6'],
  'simplejson-2.4.3': [],
  'simplejson-2.6.0': [],
}

I foresee a test case, looking like this:

import unittest


class MyTest(unittest.TestCase):
    def test_lookup(self):
        pkgmgr = StubbedPackageManager(contents_from_dict_above)
        name, version = pkgmgr.find_best_match('bar>=1.0')
        assert name == 'bar'
        assert version == '1.2'

        deps = pkgmgr.get_dependencies(name, version)
        assert 'qux' in deps
        assert 'simplejson' in deps

That's really all there is to it. I'll create this PackageManager structure now, so we can move the existing PyPI / package cache code in there first. Then, we can express our needs in test cases and go on with the actual problem.
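As an illustration of what such a stub could look like (this is not the eventual pip-tools implementation; it leans on the modern packaging library for spec parsing, and assumes the dict format sketched above):

from packaging.requirements import Requirement
from packaging.version import Version


class StubbedPackageManager(object):
    def __init__(self, contents):
        # contents maps "name-version" keys to lists of dependency specs.
        self.packages = {}
        for key, deps in contents.items():
            name, _, version = key.rpartition("-")
            self.packages.setdefault(name, {})[version] = deps

    def find_best_match(self, spec):
        # Return (name, version) for the highest version satisfying the spec.
        req = Requirement(spec)
        candidates = [v for v in self.packages[req.name]
                      if Version(v) in req.specifier]
        return req.name, max(candidates, key=Version)

    def get_dependencies(self, name, version):
        # Return the raw dependency specs recorded for this pinned package.
        return self.packages[name][version]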

nvie commented 11 years ago

Hey @brutasse, I've improved our test cases today, and also tinkered a bit with the idea for dependency resolving. I've come up with the following body of code, which could eventually be moved into a module. (I've kept it in the test case while still tinkering with it.)

Currently, this is a "while true" loop with an ugly break after 4 rounds, but the core of my findings is this.

It should:

It still doesn't solve all of our problems, but I think we can actually pull this off by adding some smart backtracking logic; that'll be the next step.

To illustrate the above with your counterexample:

content = {
    'foo-0.1': ['bar'],
    'bar-1.2': ['qux', 'simplejson'],
    'qux-0.1': ['simplejson<2.6'],

    'simplejson-2.4.0': [],
    'simplejson-2.6.2': [],
}

The top-level dep here is "foo". Applying the logic described above yields the minimal spec set for the whole tree:

['foo', 'qux', 'bar', 'simplejson<2.6']

This can then easily be pinned down by running find_best_match() again on the result, which is exactly what we want! Running the test renders the following output:

After round #1:
  - foo
After round #2:
  - foo
  - bar (from foo==0.1)
After round #3:
  - qux (from bar==1.2)
  - foo
  - bar (from foo==0.1)
  - bar (from foo==0.1)
  - simplejson (from bar==1.2)
After round #4:
  - qux (from bar==1.2)
  - qux (from bar==1.2)
  - foo
  - bar (from foo==0.1)
  - bar (from foo==0.1)
  - bar (from foo==0.1)
  - simplejson (from bar==1.2)
  - simplejson<2.6 (from qux==0.1)
  - simplejson (from bar==1.2)
After round #final:
  - qux (from inferred)
  - foo (from inferred)
  - bar (from inferred)
  - simplejson<2.6 (from inferred)

:sparkles::beer::sparkles:

_PS: The number of calls to find_best_match() will increase heavily by this algorithm, but fortunately this function can be memoized for performance reasons._
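To sketch the rounds in code (illustrative only: this leaves out the SpecSet narrowing of overlapping specs such as simplejson vs. simplejson<2.6, as well as the backtracking mentioned above, and only assumes the find_best_match / get_dependencies interface from earlier):

def expand_specs(top_level_specs, package_manager, max_rounds=10):
    specs = set(top_level_specs)
    for _ in range(max_rounds):
        new_specs = set(specs)
        for spec in specs:
            # Pin each spec to its best available match...
            name, version = package_manager.find_best_match(spec)
            # ...and pull in that pinned package's own dependencies.
            new_specs.update(package_manager.get_dependencies(name, version))
        if new_specs == specs:
            break  # fixed point reached: no new specs inferred this round
        specs = new_specs
    # The real tool would now normalise `specs` into a minimal SpecSet and
    # call find_best_match() once more on each entry to pin the result.
    return specs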

brutasse commented 11 years ago

@nvie nice! It's definitely a better approach than adding 2nd-level deps already pinned.

I couldn't spare any time tonight, and probably won't tomorrow either. Feel free to move stuff around; it shouldn't break anything on my side before Sunday :)

nvie commented 11 years ago

I've done some work to detect duplicate Specs in a SpecSet: all Spec instances now store their qualifier list as a frozenset, making them immutable, hashable and comparable. I've also reduced the whole SpecSource hierarchy to simple "just-strings" sources. Together, this makes adding duplicate specs to a SpecSet impossible, leading to more unified output:

After round #1:
  - foo
After round #2:
  - bar (from foo==0.1)
  - foo
After round #3:
  - bar (from foo==0.1)
  - foo
  - qux (from bar==1.2)
  - simplejson (from bar==1.2)
After round #4:
  - bar (from foo==0.1)
  - foo
  - qux (from bar==1.2)
  - simplejson (from bar==1.2)
  - simplejson<2.6 (from qux==0.1)
After round #final:
  - bar (from <inferred>)
  - foo (from <inferred>)
  - qux (from <inferred>)
  - simplejson<2.6 (from <inferred>)

Also, the SpecSet iterator interface now always returns the specs in sorted order, making it easier to write test cases.
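A tiny, hypothetical illustration of why the frozenset change makes duplicates collapse (the field names here are not the actual pip-tools ones):

import collections

Spec = collections.namedtuple("Spec", ["name", "qualifiers", "source"])

a = Spec("nose", frozenset([(">=", "1.2.0"), ("<", "1.4.0")]), "dev-requirements.in")
b = Spec("nose", frozenset([("<", "1.4.0"), (">=", "1.2.0")]), "dev-requirements.in")

# Equal contents mean equal (and identically hashed) Specs, regardless of the
# order the qualifiers were parsed in, so a set keeps only one of them.
assert a == b and hash(a) == hash(b)
assert len({a, b}) == 1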

nvie commented 11 years ago

Hey @brutasse, I've added an example of a package dependency structure that our current approach can't handle, plus a few thoughts I had for resolving this programmatically. I'm interested in your thoughts, too.

brutasse commented 11 years ago

@nvie I just pushed a PackageManager that works with PyPI. I left a couple of TODOs in the code, there's a decision to make about how aggressive we want to be with package caching. Let me know what you think :)

brutasse commented 11 years ago

Hey @nvie, how's parenting? :)

I just completed the implementation of pip-compile and pushed a bunch of changes in the code and tests:

https://github.com/nvie/pip-tools/compare/a9f3b910e6...bb38a98148

I believe pip-compile works pretty great now. What's missing:

Do you have time to review my changes as I push them? I plan to keep working on the missing features in the next couple of days to finally reach a releasable state.

nvie commented 11 years ago

Hey @brutasse, parenting is great, but time-consuming :)

Finding time to work/review stuff for my open source projects is challenging to say the least currently, as we're figuring out a stable rhythm with our kids, work and sleep. Nevertheless, I'll try to review your patch. Feel free to work on more features—I could certainly use the help currently!

Thanks a lot, Vincent

nvie commented 11 years ago

Thanks so much for this work, @brutasse, and sorry for not responding to this sooner! I love the work you've put into this. I like the logger and the raising of ConflictError over assertions. I've fixed a few of the remaining inconsistencies, and a few broken unit tests.

Running the cram test suite seems to have become pretty slow. Might this be due to a change you've made? Or just network stuff? It's still running here (~10 min now).

Thanks for your work!

brutasse commented 11 years ago

Um, not sure what's going on with the cram tests. They were running fine last time I checked. pip-compile seems slower than usual; maybe there are issues with PyPI…

brutasse commented 11 years ago

Hey @nvie, funny that you mention the project, as I was playing with it again last night. I just pushed a small fix, and I think this is looking pretty good already. It's fast again (it was probably a PyPI issue). Here's what happened when I resolved sentry in a requirements.in:

amqp==1.0.11
anyjson==0.3.3
BeautifulSoup==3.2.1
billiard==2.7.3.27
celery==3.0.18
cssutils==0.9.10
Django==1.4.5
django==1.5.1
django-celery==3.0.17
django-crispy-forms==1.2.3
django-indexer==0.3.0
django-paging==0.2.4
django-picklefield==0.3.0
django-social-auth==0.7.22
django-social-auth-trello==1.0.3
django-static-compiler==0.3.1
django-templatetag-sugar==0.1
gunicorn==0.17.2
httpagentparser==1.2.2
httplib2==0.8
kombu==2.5.10
logan==0.5.5
nydus==0.10.5
oauth2==1.5.211
Pygments==1.6
pynliner==0.4.0
python-dateutil==1.5
python-openid==2.2.5
pytz==2013b
raven==3.3.3
redis==2.7.2
sentry==5.4.5
setproctitle==1.1.7
simplejson==3.1.3
six==1.3.0
South==0.7.6

Everything is correct, except the duplicate Django / django requirement. Did we decide anything about PyPI's case insensitivity? Private indices are often case-sensitive, so IMO packages themselves should take care of using the correct letter case.

I think what's left is compiling multiple .in files into multiple .txt files.

brutasse commented 11 years ago

Oh actually the handling of multiple .in files is done :)

nvie commented 11 years ago

Thanks for the patches! I'm all for searching case-insensitively. In an ideal world, packages would take care of using the correct case, but that's not how packages work in reality. The sentry example above would fail, and there would be no way of using pip-tools in these real-world situations.

I don't think we should change the casing in the compiled output where we can avoid it. In case of a conflict, just pick one (the first occurrence, maybe?); it's arbitrary anyway.

Alternatively, we could add a -i flag to have this behaviour, but I'm generally not a big fan of adding flags like this.

Do you agree?

brutasse commented 11 years ago

I think in case of conflict we really need to pick the correct one: it's not completely arbitrary, there is a canonical casing. PyPI and its mirrors are case-insensitive but people who run private mirrors don't necessarily do it with the same software.

With PyPI it's transparent: https://pypi.python.org/simple/webob/ actually redirects to https://pypi.python.org/simple/WebOb/. With other index software that might not be the case. I've run into issues in the past where a package simply could not be installed from a custom index because simple/WebOb/ returned a 200 but simple/webob/ a 404…
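Based on the redirect behaviour described above, one hypothetical way to discover the canonical casing without downloading the package would be to follow the /simple/ redirect and read the final URL. A sketch (the index URL is the one from this thread; today's pypi.org normalises names to lowercase instead):

import urllib.parse
import urllib.request


def canonical_name(name, index_url="https://pypi.python.org/simple/"):
    # urlopen follows redirects, so the resolved URL carries the canonical casing.
    response = urllib.request.urlopen(index_url + name.lower() + "/")
    path = urllib.parse.urlparse(response.geturl()).path
    return path.rstrip("/").rsplit("/", 1)[-1]

# canonical_name("webob") -> "WebOb", given the redirect described above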

I submitted a pull request to django-social-auth to fix the sentry issue.

nvie commented 11 years ago

That's even better. Is there a way we can already detect the canonical casing, or do we need to add another, explicit HTTP request for this?

brutasse commented 11 years ago

Probably! Since we download the packages we should be able to infer it from the filename directly and even fix incorrect names.

nvie commented 11 years ago

Fixing would be even better—I like :)