Closed brutasse closed 10 years ago
I can see different interesting points here:
First, you're talking about what we do want to pin and what we don't. I think it really depends on what this file will be used for. For instance, when deploying, I want to have all my dependencies pinned to a version that I know is working. This is what `pip freeze > requirements.txt` does.
The thing you're proposing with top-level dependencies sounds weird to me, because we already have this: that's what's in the `setup.py` of your project, isn't it?
The second thing is dependency management and conflict detection. If the version specifiers are used in the setup.py, then it's perfectly possible to know if there is a conflict somewhere or not. Packaging / Distutils2 provide a tool to solve a dependency tree, see http://docs.python.org/dev/library/packaging.depgraph.html
@ametaireau the context here is not necessarily a python library: this could be used on, say, a Django site which doesn't have a setup.py.
I fully agree that all deps should be pinned when deploying. What I'm suggesting is a tool that translates top-level deps (i.e. stuff you actually need, pinned as well) into the full list of requirements, the result of `pip freeze`. So you maintain only the top-level deps and the tool resolves the rest for you. Both the top-level requirements and the full list are maintained under version control.
The idea is that when you use or update raven, you don't want to worry about which version of simplejson you can use.
I have started a branch at https://github.com/brutasse/pip-tools/compare/master...features;bundle
For now it just resolves top-level requirements into a full, pinned list. It doesn't install, it just generates a usable `requirements.txt`.
There is no check for dependency conflicts yet.
The name (`pip-bundle`) probably isn't a good idea given that pip already has a bundle command. The command also needs to be split into subcommands for the equivalent of `bundle install`, `bundle outdated` and `bundle update` (not sure we want this last one, it's more or less the equivalent of `bundle outdated`, updating the top-level requirement and re-running `bundle install`. I'm not a big fan of a command that would alter the top requirements file).
Hi @brutasse, I think your ideas are great, and I'd love to work with you to get this nicely implemented. Before implementing, however, would it be a good idea to get our minds synced? A nice way would be to describe the API in a cram test, explicitly defining behaviour for these commands.
I've made a start on it. (I've dubbed the command `pip-compile`, as `pip-bundle` would be too confusing, indeed. Nevertheless, that's just a working title and we can change it easily.)
This Gist-based interface description can be changed quite easily and we can implement when we reach a stable state. Feel free to fork and modify the example I started to match your idea first—we'll work from there.
@nvie awesome. At first I got confused by the cram tests since they failed on my machine: python3 is my default python. They still fail at the moment, pyflakes shows some issues with undefined names in some places.
I updated the gist. I like the `pip-compile` name much more than `pip-bundle`.
I like the use of a `.in` extension for top-level requirements, too :)
I don't think the tool should touch `.in` files, but rather show what updates are available; the user then chooses to update their `.in` requirements and runs `pip-compile` again to generate the `.txt` files. `pip-compile` would sort of replace `pip-dump`, I guess.
Also, I think it should be encouraged to pin stuff in `requirements.in`. Maybe even not support unpinned requirements? Not sure about this one; in your first example it's probably not important to pin the `nose` dep in `dev-requirements.in`, but since `requirements.in` controls what goes into your production environment, it should only contain pinned packages.
This is great!
> @nvie awesome. At first I got confused by the cram tests since they failed on my machine: python3 is my default python. They still fail at the moment, pyflakes shows some issues with undefined names in some places.
Yeah, I should fix these issues…
> I updated the gist. I like the `pip-compile` name much more than `pip-bundle`. I like the use of a `.in` extension for top-level requirements, too :)
Cool, I think we can work with those, then.
> I don't think the tool should touch `.in` files but rather show what updates are available and then the user chooses to update their `.in` requirements and run `pip-compile` again to generate the `.txt` files. `pip-compile` would sort of replace `pip-dump`, I guess.
Yep, I agree with this. I think it's good to have a flag to upgrade automatically (in case there is no conflict). This flag will also help us write the test cases.
> Also I think it should be encouraged to pin stuff in `requirements.in`. Maybe even not support not-pinned requirements? Not sure about this one, in your first example it's probably not important to pin the `nose` dep in `dev-requirements.in` but since `requirements.in` controls what goes into your production environment it should only contain pinned packages.
I think the default invocation should refuse them, indeed. But with an optional -f flag, you should be able to use the non-pinned version, too, I think.
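A pinned-only check like this could be quite small. Here is a minimal sketch, assuming a hypothetical `check_pinned` helper and treating only `name==version` lines as pinned (comments and blank lines are ignored; this is not the actual pip-tools code):

```python
import re

# Hypothetical sketch: refuse unpinned lines in requirements.in unless
# force=True (the proposed -f flag).
PINNED_RE = re.compile(r'^[A-Za-z0-9._-]+==[A-Za-z0-9._-]+$')

def check_pinned(lines, force=False):
    """Return the list of offending (unpinned) lines, or raise if not forced."""
    offenders = [ln.strip() for ln in lines
                 if ln.strip() and not ln.strip().startswith('#')
                 and not PINNED_RE.match(ln.strip())]
    if offenders and not force:
        raise ValueError('unpinned requirements: %s' % ', '.join(offenders))
    return offenders

# With -f, unpinned lines are tolerated but reported:
print(check_pinned(['raven==1.9.3', 'nose>=1.2.0'], force=True))  # ['nose>=1.2.0']
```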
Let's first introduce some new terminology, to better articulate the problem:
| Term | Meaning |
|---|---|
| Source Spec | All of the `*requirements.in` files together |
| Compiled Spec | All of the `*requirements.txt`, and `.pipignore`, together |
| Recorded State | Source Spec and Compiled Spec, as kept under version control |
| Environment | The Python virtual environment, specifically the list of installed packages in there |
The point of pip-tools is to keep the Environment and the Recorded State in sync all the time, while supporting checking for updates.
The current toolset, `pip-review` and `pip-dump`, is designed to work with the reality that the Environment and `*requirements.txt` files are leading—they are in essence managed manually. `pip-review` keeps the packages in the environment up-to-date, and `pip-dump` records the env state in version control.
However, our new approach flips this reality upside-down and requires the tool to be in control all the time, generating both the Environment and the Compiled Spec files. This is a pretty significant difference.
The result is a bit of a mess and responsibilities are a bit unclear, so let's say we ditch the current tools and start over with new, differently named tools, to avoid any confusion.
YE OLDE WAY
(review) (dump)
environment -------> environment -----> spec.txt
THE FUTURE™
(compile) (sync)
spec.in --------> spec.txt -----> environment
(compile --outdated)
spec.{in,txt} -------------------> spec.{in,txt}
Let's say we assume the tools to be in charge and responsible for periodically generating both the `*.txt` spec files and the actual virtual environment. As these tools bluntly "overwrite", this makes it a bad developer practice to manually `pip install` any packages, or to manually modify `requirements.txt`.
Effectively, it means that manually adding new lines to `requirements.txt`, or pip-installing new packages into the Environment, will eventually result in their loss, once someone runs `pip-compile` or `pip-sync` respectively.
We need a way to "sync" the Recorded State to the Environment. I'm thinking of a new command named `pip-sync`, which not only installs packages but also uninstalls them, in order to make the environment exactly reflect the specs. The net result would be identical to creating a new environment and running `pip install -r <spec>` for all Compiled Spec files. Example:
$ pip freeze
abc==1
bar==1
foo==1
$ cat dev-requirements.txt
bar==1
$ cat requirements.txt
foo==2
qux==1
$ pip-sync # sync the environment (does not only install, does also _uninstall_)
$ pip freeze
bar==1 # unchanged
foo==2 # updated
qux==1 # newly installed
# uninstalled abc==1
This would also make for a great deployment command for services like Heroku (instead of `pip install -r requirements.txt`), to retire packages that aren't needed anymore.
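The install/uninstall behaviour above boils down to a set diff between the compiled spec and the installed packages. A rough sketch (the `sync_plan` helper and its dict inputs are illustrative, not pip's API):

```python
# Diff the Compiled Spec against the installed set; anything missing or
# pinned to a different version gets (re)installed, anything absent from
# the spec gets uninstalled.

def sync_plan(installed, spec):
    """installed, spec: {name: version} dicts. Return (to_install, to_uninstall)."""
    to_install = {name: ver for name, ver in spec.items()
                  if installed.get(name) != ver}         # new, or pinned differently
    to_uninstall = [name for name in installed if name not in spec]
    return to_install, to_uninstall

# The example from above: abc disappears, foo is upgraded, qux is added.
installed = {'abc': '1', 'bar': '1', 'foo': '1'}
spec = {'bar': '1', 'foo': '2', 'qux': '1'}
install, uninstall = sync_plan(installed, spec)
print(install)    # {'foo': '2', 'qux': '1'}
print(uninstall)  # ['abc']
```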
I'll stop now, to keep my thoughts digestible. Are we on the same page?
Sure, we're on the same page!
One thing that strikes me as odd is the way the commands are named, `pip-<something>`; I think a main command with subcommands feels more natural. It's what many command-line tools do, see pip, git, apt-get… Was there a particular reason to choose this naming scheme?
The advantage of this approach is that adding new subcommands is easier. `compile --outdated` doesn't do the same thing as `compile`, so they could be split into two subcommands.
Example with a generic name (`tool`):

- Generate `.txt` from `.in`: `tool compile [-r spec.in]`
- Show available packages: `tool outdated [-r spec.in]`
- Apply updates: `tool update|upgrade [-r spec.in] [package[==version specifier]]`
- Sync the environment: `tool sync [-r spec.txt] [-r other-spec.txt]`
But then we need to replace `tool` with an appropriate name for "something that manages requirements and syncs environments". I like Ruby's choice of "bundle", and there are also a bunch of synonyms. Maybe this is premature bikeshedding, though :)
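That subcommand layout could be sketched with argparse along these lines ("tool" being the generic placeholder name from this thread; only the argument surface is modeled, no behaviour):

```python
import argparse

# Sketch of the proposed subcommand layout. sync takes positional spec
# files (no -r), so shell wildcards like specs/*.txt would also work.
parser = argparse.ArgumentParser(prog='tool')
sub = parser.add_subparsers(dest='command')

for name in ('compile', 'outdated', 'update'):
    p = sub.add_parser(name)
    p.add_argument('-r', dest='specs', action='append')  # repeatable: -r spec.in

p = sub.add_parser('sync')
p.add_argument('specs', nargs='*', default=['requirements.txt'])

args = parser.parse_args(['sync', 'requirements.txt', 'dev-requirements.txt'])
print(args.command, args.specs)
```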
+1 for uninstalling, although it needs to be done carefully: `tool sync` installs everything from `*-requirements.txt`, but `tool sync -r dev-requirements.txt` would uninstall stuff from `requirements.txt`, right? It may be a good idea to promote using `tool sync -r requirements.txt [-r prod-requirements.txt]` for production and `tool sync` for development, which grabs all the `*-requirements.txt` files.
The main reason for naming the tools `pip-something` was a bit of wishful thinking that these commands could potentially fit nicely into `pip` itself as subcommands, so you could read `pip-compile` as `pip compile`. The other reason for not picking a tool with subcommands is simplicity: they're "just two scripts"; no actual Python packages are installed with pip-tools currently.
I'm -0 on naming "tool" anything other than "pip-".
I'm with you on the necessity for an actual "outdated" command, although I don't particularly like it being a non-verb, but that's a personal itch. (I can rant away at Chef's `knife kitchen` command for hours :))
I concur with being careful when syncing, but worst case you'd have to reinvoke it with the correct params, so not too much harm done there. If this bumps into too much resistance, we could have it warn about/confirm uninstalls by default and allow `-f` to force it. Also, a `--no-uninstall` flag would be a good thing to have, I think.
Lastly, I personally prefer not having to specify `-r` in front of every spec file in the command invocations. I know it's in line with `pip` itself, but I consider that design choice a bit unfortunate, as it rules out using shell wildcards in the tool invocation, like:
$ tool sync specs/*.txt
More thoughts?
Ok, thanks for the explanation about the commands "just being scripts". One issue with that is that you can't have common utilities shared between commands (`_check_output`, for instance). Are you opposed to pip-tools adding something to the Python path for sharing code between scripts?
I'm fine with not having the `-r` options.
The gist is up-to-date; I kept `pip-outdated` for now, but this could be called `pip-review` instead.
I'll update my branch to add the cram tests, support `.in` files, and add the `pip-sync` script. And maybe update `pip-review` to implement the `pip-outdated` behavior.
> One issue with that is you can't have common utilities shared between commands (`_check_output` for instance). Are you opposed to pip-tools adding something to the python path for sharing between scripts?
I am not, and I think we will eventually end up there. I've lived with the duplication so far because I wanted to keep the tools lean.
> The gist is up-to-date, I kept `pip-outdated` for now but this could be called `pip-review` instead.
I was thinking the same.
> I'll update my branch to add the cram tests, support `.in` files, add the `pip-sync` script. And maybe update `pip-review` to implement the `pip-outdated` behavior.
Cool. Please share your branch by opening a pull request so we can work on it together if you're ready.
OK, I've updated the gist once more (to be a bit more descriptive about what's going on and what's important in each step).
@brutasse, could you take a look at the pip-compile-specifics.txt file I've added? What do you think? Am I overcomplicating things here, or is this useful? I was thinking only pip-outdated should reach out to PyPI, or pip-compile at least should only do that if the currently pinned secondary versions don't match criteria (so it could reach out to PyPI to find versions that do match).
Or do you think pip-compile should always reach out to PyPI, find the latest versions that still match all criteria, and record those? It would definitely simplify things for us (as we don't have to consider requirements.txt in pip-compile at all), but it would lead to (secondary) package updates on every compile, and compiles would always be lengthy.
On Tue, Oct 2, 2012 at 9:01 PM, Vincent Driessen notifications@github.com wrote:
> OK, I've updated the gist once more (to be a bit more descriptive about what's going on and what's important in each step).
Very nice :)
> @brutasse, could you take a look at the pip-compile-specifics.txt file I've added? What do you think? Am I overcomplicating things here, or is this useful? I was thinking only pip-outdated should reach out to PyPI, or pip-compile at least should only do that if the currently pinned secondary versions don't match criteria (so it could reach out to PyPI to find versions that do match).
I'd say people shouldn't touch requirements.txt manually while using pip-tools. For that use case I'd put the raven and simplejson requirements in the `.in` file, and the compiled file would incidentally have the same content. It'd be much simpler than trying to make pip-compile aware of stuff manually changed in compiled files…
> Or do you think pip-compile should always reach out to PyPI, find the latest versions that still match all criteria, and record those? It would definitely simplify things for us (as we don't have to consider requirements.txt in pip-compile at all), but it would lead to (secondary) package updates on every compile, and compiles would always be lengthy.
By default I think compile should rebuild the whole thing. But there are ways to optimize, using a local cache for instance. For `raven==1.9.3`, pip-compile can look at the cached version of raven, so it'd be fetched only the first time. For secondary requirements it's trickier to be lazy with PyPI. And of course for packages which are referenced but not hosted on PyPI (e.g. redis) there is no way to guess what the path to pip's cached version is without asking PyPI.
On the other hand, pip-review / pip-compile aren't the kind of tasks you do all the time, and with my current implementation resolving `sentry==5.0.13` to a full list of its 25 requirements takes 90 seconds. If sentry had pinned requirements itself, it'd be much faster :)
> I'd say people shouldn't touch requirements.txt manually while using pip-tools. For that use case I'd put the raven and simplejson requirements in the `.in` file, [...]. It'd be much simpler than trying to make pip-compile aware of stuff manually changed in compiled files…
I agree that this would allow for a much simpler implementation of `pip-compile`. It would also, however, lead to a situation where `pip-review` is used to search for / upgrade any top-level dependency versions, whereas `pip-compile` is used to search for / upgrade secondary dependencies. Don't you think that's a bit weird from a UX perspective?
My thinking was: always try to compile specs with as many of the currently used pinned versions as possible, and only (ask to) upgrade them when there's no other way. Or when an explicit `pip-review` takes place.
I'd also really like a fast and deterministic `pip-compile`. In essence, the behaviour I would like is that when you run `pip-compile` immediately after another `pip-compile`, without changing the Source Specs in between, the second invocation should under no circumstances change anything in the Compiled Specs.
Any idea how Bundler's implementation relates to this?
OK, some bad news for me from the front. I think I'll have to let go of my pipe dream where we could figure out the dependency calculation without downloading any packages upfront. To get to the actual dependencies a package has, setup.py must be executed, and dependencies are calculated at runtime (depending on OS or Python version).
This is an ugly fact of life we have to deal with :(
On the other hand, this makes things clear: the only safe remaining way of calculating the actual dependency tree seems to be to "just install" the top-level packages:
`pip freeze -l`: that's our (flattened) resolved dependency tree.

I guess this is what @brutasse suggested in the first place, so sorry if I'm a bit late to the party of understanding the hairy parts of how this works under the hood :)
Indeed, packages need to be downloaded. Mostly, I guess, because of the dynamic nature of setup.py and the fact that people change `install_requires` depending on the Python version or OS. The implementation I added yesterday doesn't create a temporary environment; it just runs `setup.py egg_info` to get the requirements:
https://github.com/nvie/pip-tools/blob/future/bin/pip-compile#L100-119
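The `requires.txt` that `setup.py egg_info` writes lists the base dependencies first, with optional extras under `[section]` headers. A toy parser for that layout (illustrative only; this is not the code linked above):

```python
# Extract only the unconditional dependencies from an egg-info
# requires.txt: everything before the first [extras] section header.

def parse_requires_txt(text):
    deps = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('['):   # an extras section begins; base deps end here
            break
        deps.append(line)
    return deps

sample = """\
Django>=1.4.1,<=1.5
simplejson

[tests]
nose
"""
print(parse_requires_txt(sample))  # ['Django>=1.4.1,<=1.5', 'simplejson']
```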
An issue with "just installing" the source spec in a temporary environment is that pip isn't aware of dependency conflicts. If I put in a requirements file:
sentry==5.0.14
Django==1.3
There is clearly a conflict here because sentry requires `Django>=1.4.1,<=1.5`. But `pip install` runs fine and Django 1.3 is installed, probably because the last requirement wins. This is especially annoying when you have a pinned version of Django at the top of your requirements file and something else in the tree requires Django: the pinned version won't be taken into account and you'll just get the latest version the first time you run `pip install`…
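To make that conflict concrete, here is a toy checker (hand-rolled operators over dotted numeric versions; nothing like a real resolver) showing that the pinned Django 1.3 can never satisfy sentry's range:

```python
# Deliberately simplistic version handling: numeric dotted versions only.

def vtuple(v):
    return tuple(int(p) for p in v.split('.'))

OPS = {
    '==': lambda a, b: a == b,
    '>=': lambda a, b: a >= b,
    '<=': lambda a, b: a <= b,
    '>':  lambda a, b: a > b,
    '<':  lambda a, b: a < b,
}

def satisfies(version, qualifiers):
    """qualifiers: list of (op, version) pairs, e.g. [('>=', '1.4.1')]."""
    return all(OPS[op](vtuple(version), vtuple(ref)) for op, ref in qualifiers)

pinned = '1.3'
sentry_needs = [('>=', '1.4.1'), ('<=', '1.5')]
print(satisfies(pinned, sentry_needs))  # False: the conflict pip silently ignores
```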
Awesome stuff. I've made some good progress in collecting and normalising specs. Will hopefully be push-ready by the end of the afternoon.
Hey @brutasse, please check out my work on the Spec normalization/conflict detection. I've kept your functions in there, although the `main()` function does not call all of them anymore. We need them later, but I wanted to get the bare normalization/conflict detection logic in first.
I've tested this with the following inputs:
$ cat requirements.in
raven==1.9.3
# begin whitespace
# end whitespace :)
sentry==5.0.13
$ cat dev-requirements.in
nose>=1.2.0,<1.4.0
# The following line is completely obsolete, because line 1 is much more narrow
nose>1.1.8,<=1.5.0
# The following line does narrow down the spec from line 1
nose>=1.1.8,<1.3.0
# Uncomment the following line if you want to render all previous nose specs obsolete
#nose==1.2.1
# Uncomment the following line too if you want to test a conflict :)
#nose==1.2.2
# The following foo specs will result in foo==1.5.0
foo>=1.5.0
foo<=1.5.0
I've also added some TODO notes for further implementation, so be my guest if you want to take some of them and improve. Hope you like this.
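For the `nose` lines above, narrowing boils down to keeping the highest lower bound and the lowest upper bound. A toy sketch of that idea (inclusive numeric bounds only; a real normalizer must also handle `==`, `!=` and exclusive bounds):

```python
# Narrow a set of version ranges to the tightest combined range, raising
# when the ranges are mutually exclusive (a conflict).

def vtuple(v):
    return tuple(int(p) for p in v.split('.'))

def narrow(ranges):
    """ranges: list of (low, high) version strings. Return the narrowest pair."""
    low = max((r[0] for r in ranges), key=vtuple)
    high = min((r[1] for r in ranges), key=vtuple)
    if vtuple(low) > vtuple(high):
        raise ValueError('conflicting specs: no version satisfies all ranges')
    return low, high

# The three nose lines from dev-requirements.in above:
nose_ranges = [('1.2.0', '1.4.0'), ('1.1.8', '1.5.0'), ('1.1.8', '1.3.0')]
print(narrow(nose_ranges))  # ('1.2.0', '1.3.0')
```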
@nvie awesome :) I'll look at integrating that with the package parsing code later tonight.
Hey @brutasse, I just quickly pushed out a commit that moved all data structures to a module, so we can share this among the other tools, whenever necessary. Feel free to move more supporting code to the module, to keep the scripts lightweight and easily readable.
Btw, if you want to connect in real time, I'm on irc.freenode.net in #pip-tools.
@nvie I have something ready for pip-compile here: https://gist.github.com/cfffc3537604cc311767. I haven't committed it because I'm not sure it's the best approach. What I currently do is:
2nd-level specs are also resolved to the latest versions available. Supposing we have package X that requires "simplejson<2.6" and package Y that requires "simplejson", the algorithm will add "simplejson==2.4.0" and "simplejson==2.6.2" to the SpecSet when resolving X and Y respectively, leading to a conflict.
This is sub-optimal but that particular problem shouldn't happen often and I can't think of an easy way to solve this… Let me know what you think.
And finally once the specset is resolved for all spec files, we need to dump things to the appropriate compiled files. This probably requires fixing SpecSet not to lose the source information (I saw your note) and adapt it to make it possible to get dependency information from the data structure directly.
@brutasse, I've plain committed your Gist to the project. I think the trick is to not add the pinned versions to the spec set as we're still building the set, but instead postpone that to the very last moment possible.
The trick is not to get lost in confusion here. There will be lots of hairy cases, even ones we can't come up with now. So I suggest to stop working on the algorithm right now and first come up with a test framework that makes it easy to express and test some assumptions / weird edge cases.
Instead of the `piptools.{cache,pypi}` modules, we actually need `piptools.package_manager`, which we can use to instantiate the backend that provides access to packages and package info. Its interface should consist only of method calls supported by the PyPI backend, like:
class PackageManager(object):
def find_best_match(self, spec):
...
def get_dependencies(self, name, version):
...
Actual package contents don't matter that much to us, so the interface should hide PyPI / URL / cache path details.
Not only does this make the code more readable, it also allows us to stub out the whole backend with a fake one, for speeding up our test cases. We can use this to bypass the PyPI downloads/caching, speeding up our tests.
Imagine the following to express a fake dep tree. In essence, it's a DSL for creating a mini-PyPI on the fly and use that as a test stub:
{
'foo-0.1': ['bar'],
'bar-1.2': ['qux', 'simplejson'],
'qux-0.1': ['simplejson<2.6'],
'simplejson-2.4.3': [],
'simplejson-2.6.0': [],
}
I foresee a test case, looking like this:
class MyTest(unittest.TestCase):
def test_lookup(self):
pkgmgr = StubbedPackageManager(contents_from_dict_above)
name, version = pkgmgr.find_best_match('bar>=1.0')
assert name == 'bar'
assert version == '1.2'
deps = pkgmgr.get_dependencies(name, version)
assert 'qux' in deps
assert 'simplejson' in deps
That's really all there is to it. I'll create this `PackageManager` structure now, so we can move the existing PyPI / package cache code in there first. Then we can express our needs in test cases and go on with the actual problem.
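One possible shape for the stubbed backend, driven by the `{'name-version': [deps]}` dict DSL above. This is illustrative only: the matching logic is deliberately minimal (bare names and `>=` only) and multi-dash package names would need more care.

```python
class StubbedPackageManager:
    def __init__(self, contents):
        self.index = {}            # name -> {version: deps}
        for key, deps in contents.items():
            name, _, version = key.rpartition('-')
            self.index.setdefault(name, {})[version] = deps

    def find_best_match(self, spec):
        # Toy matching: only handles 'name' and 'name>=X' specs.
        if '>=' in spec:
            name, minv = spec.split('>=')
        else:
            name, minv = spec, '0'
        vt = lambda v: tuple(int(p) for p in v.split('.'))
        candidates = [v for v in self.index[name] if vt(v) >= vt(minv)]
        return name, max(candidates, key=vt)

    def get_dependencies(self, name, version):
        return self.index[name][version]

pkgmgr = StubbedPackageManager({
    'foo-0.1': ['bar'],
    'bar-1.2': ['qux', 'simplejson'],
    'qux-0.1': ['simplejson<2.6'],
    'simplejson-2.4.3': [],
    'simplejson-2.6.0': [],
})
print(pkgmgr.find_best_match('bar>=1.0'))       # ('bar', '1.2')
print(pkgmgr.get_dependencies('bar', '1.2'))    # ['qux', 'simplejson']
```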
Hey @brutasse, I've improved our test cases today, and also tinkered a bit with the idea for dependency resolving. Eventually, I've come up with the following body of code that could eventually be moved into a module. (I've kept it in the test case while still tinkering with it.)
Currently, this is a "while true" loop with an ugly break after 4 rounds, but the core of my findings is this.
It should:
It still does not solve all of our problems yet, but I think we can actually pull this off by adding some smart backtracking logic; that'll be the next step.
To illustrate the above with your counterexample:
content = {
'foo-0.1': ['bar'],
'bar-1.2': ['qux', 'simplejson'],
'qux-0.1': ['simplejson<2.6'],
'simplejson-2.4.0': [],
'simplejson-2.6.2': [],
}
The top-level dep here is "foo". Applying the logic described above, this yields the minimalized spec set for the total:
['foo', 'qux', 'bar', 'simplejson<2.6']
This can easily be pinned down by resolving `find_best_match()` again on the result, which is exactly what we want! Running the test renders the following output:
After round #1:
- foo
After round #2:
- foo
- bar (from foo==0.1)
After round #3:
- qux (from bar==1.2)
- foo
- bar (from foo==0.1)
- bar (from foo==0.1)
- simplejson (from bar==1.2)
After round #4:
- qux (from bar==1.2)
- qux (from bar==1.2)
- foo
- bar (from foo==0.1)
- bar (from foo==0.1)
- bar (from foo==0.1)
- simplejson (from bar==1.2)
- simplejson<2.6 (from qux==0.1)
- simplejson (from bar==1.2)
After round #final:
- qux (from inferred)
- foo (from inferred)
- bar (from inferred)
- simplejson<2.6 (from inferred)
:sparkles::beer::sparkles:
_PS: The number of calls to `find_best_match()` will increase heavily with this algorithm, but fortunately this function can be memoized for performance reasons._
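The round-based expansion can be sketched against a static dependency table instead of a real package manager (names and versions taken from the counterexample above; `DEPS` and `BEST` are stand-ins for `get_dependencies()` and `find_best_match()`). Each round adds the dependencies of the current best matches until the spec set stops growing; a final narrowing pass then drops unqualified names subsumed by narrower specs, and pinning happens afterwards.

```python
DEPS = {
    ('foo', '0.1'): ['bar'],
    ('bar', '1.2'): ['qux', 'simplejson'],
    ('qux', '0.1'): ['simplejson<2.6'],
}
BEST = {'foo': '0.1', 'bar': '1.2', 'qux': '0.1',
        'simplejson': '2.6.2', 'simplejson<2.6': '2.4.0'}

def name_of(spec):
    return spec.split('<')[0].split('>')[0].split('=')[0]

def resolve(top_level):
    specs = set(top_level)
    while True:
        grown = set(specs)
        for spec in specs:
            grown.update(DEPS.get((name_of(spec), BEST[spec]), []))
        if grown == specs:       # fixed point: no new deps discovered
            break
        specs = grown
    # final narrowing: an unqualified name is subsumed by any narrower spec
    narrowed = {name_of(s) for s in specs if s != name_of(s)}
    return {s for s in specs if s not in narrowed}

print(sorted(resolve({'foo'})))  # ['bar', 'foo', 'qux', 'simplejson<2.6']
```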
@nvie nice! It's definitely a better approach than adding 2nd-level deps already pinned.
I couldn't spare any time tonight, and probably won't tomorrow either. Feel free to move stuff around, it shouldn't break anything on my side before Sunday :)
I've done some work to detect duplicate Specs in a SpecSet: all Spec instances now have their qualifier list stored as a frozenset instead, making them immutable, hashable and comparable. I've also reduced the whole SpecSource hierarchy into simple "just-strings" sources. Together, this makes adding duplicate specs to SpecSet impossible, leading to more unified output:
After round #1:
- foo
After round #2:
- bar (from foo==0.1)
- foo
After round #3:
- bar (from foo==0.1)
- foo
- qux (from bar==1.2)
- simplejson (from bar==1.2)
After round #4:
- bar (from foo==0.1)
- foo
- qux (from bar==1.2)
- simplejson (from bar==1.2)
- simplejson<2.6 (from qux==0.1)
After round #final:
- bar (from <inferred>)
- foo (from <inferred>)
- qux (from <inferred>)
- simplejson<2.6 (from <inferred>)
Also, the SpecSet iterator interface now always returns the specs in a sorted fashion, making it easier to write test cases.
Hey @brutasse, I've added an example of a package dependency structure that our current approach can't handle, plus a few thoughts I had for resolving this programmatically. I'm interested in your thoughts, too.
@nvie I just pushed a PackageManager that works with PyPI. I left a couple of TODOs in the code, there's a decision to make about how aggressive we want to be with package caching. Let me know what you think :)
Hey @nvie, how's parenting? :)
I just completed the implementation of `pip-compile` and pushed a bunch of changes in the code and tests:
https://github.com/nvie/pip-tools/compare/a9f3b910e6...bb38a98148
I believe pip-compile works pretty great now. What's missing:
Do you have time to review my changes as I push them? I plan to keep working on the missing features in the next couple of days to finally reach a releasable state.
Hey @brutasse, parenting is great, but time-consuming :)
Finding time to work/review stuff for my open source projects is challenging to say the least currently, as we're figuring out a stable rhythm with our kids, work and sleep. Nevertheless, I'll try to review your patch. Feel free to work on more features—I could certainly use the help currently!
Thanks a lot, Vincent
Thanks so much for this work, @brutasse, and sorry for not responding to this sooner! I love the work you've put into this. I like the logger and the raising of `ConflictError` over assertions. I've fixed a few of the remaining inconsistencies, and a few broken unit tests.
Running the cram test suite seems to have become pretty slow. Might this be due to a change you've made? Or just network stuff? It's still running here (~10 min now).
Thanks for your work!
Um, not sure what's going on with the cram tests. They were running fine last time I checked. Pip-compile seems slower than usual, maybe there are issues with pypi…
Hey @nvie, funny that you mention the project, as I was playing with it again yesterday night. I just pushed a small fix; I think this is looking pretty good already. It's fast again (it was probably a PyPI issue). Here's what happened when I resolved `sentry` in a `requirements.in`:
amqp==1.0.11
anyjson==0.3.3
BeautifulSoup==3.2.1
billiard==2.7.3.27
celery==3.0.18
cssutils==0.9.10
Django==1.4.5
django==1.5.1
django-celery==3.0.17
django-crispy-forms==1.2.3
django-indexer==0.3.0
django-paging==0.2.4
django-picklefield==0.3.0
django-social-auth==0.7.22
django-social-auth-trello==1.0.3
django-static-compiler==0.3.1
django-templatetag-sugar==0.1
gunicorn==0.17.2
httpagentparser==1.2.2
httplib2==0.8
kombu==2.5.10
logan==0.5.5
nydus==0.10.5
oauth2==1.5.211
Pygments==1.6
pynliner==0.4.0
python-dateutil==1.5
python-openid==2.2.5
pytz==2013b
raven==3.3.3
redis==2.7.2
sentry==5.4.5
setproctitle==1.1.7
simplejson==3.1.3
six==1.3.0
South==0.7.6
Everything is correct, except the duplicate `Django` / `django` requirement. Did we decide anything about PyPI's case insensitivity? Often private indexes are case-sensitive, so IMO packages themselves should take care of using the correct letter case.
I think what's left is compiling multiple `.in` files into multiple `.txt` files.
Oh, actually the handling of multiple `.in` files is done :)
Thanks for the patches! I'm all for searching case-insensitively. In an ideal world, packages take care of using the correct case, but that's not how packages work in reality. The sentry example above would fail, and there would be no way of using pip-tools in real-world situations like these.
I don't think we should change the casing in the compiled output, wherever possible. In case of conflict, just pick one—the first occurrence, maybe? It's arbitrary anyway.
Alternatively, we could add a `-i` flag to get this behaviour, but I'm generally not a big fan of adding flags like this.
Do you agree?
I think in case of conflict we really need to pick the correct one: it's not completely arbitrary, there is a canonical casing. PyPI and its mirrors are case-insensitive but people who run private mirrors don't necessarily do it with the same software.
With PyPI it's transparent: https://pypi.python.org/simple/webob/ actually redirects to https://pypi.python.org/simple/WebOb/. With other software that might not be the case. I ran into issues in the past when a package simply could not be installed from a custom index where `simple/WebOb/` returns a 200 but `simple/webob/` a 404…
I submitted a pull request to django-social-auth to fix the sentry issue.
That's even better. Is there a way we can detect the canonical casing already, or do we need to add another, explicit HTTP request for this?
Probably! Since we download the packages we should be able to infer it from the filename directly and even fix incorrect names.
Fixing would be even better—I like :)
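Inferring the canonical casing from a downloaded sdist filename could be as simple as a case-insensitive match. This is a sketch with a hypothetical `canonical_name` helper; real filenames (e.g. names containing version-like segments) would need more care:

```python
import re

# Recover the canonically-cased package name from a downloaded filename,
# e.g. a lowercase 'webob' query paired with 'WebOb-1.2.3.tar.gz'.

def canonical_name(filename, requested):
    m = re.match(r'(?i)(%s)-' % re.escape(requested), filename)
    return m.group(1) if m else requested

print(canonical_name('WebOb-1.2.3.tar.gz', 'webob'))  # WebOb
```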
So, here is the promised brain dump, sorry for the length.
Right now naively updating requirements can lead to dependency conflicts. For instance, let's say I want to add `raven` to my project but pinned to a specific version:

So `raven` needs `simplejson`. Now I run `pip freeze` and get in my `requirements.txt`:

Some time later I run `pip-review` and get (this is not what you'd get right now):

Note that the newer simplejson was already available when I initially installed raven, but raven needed `simplejson>=2.3.0,<2.5.0`. Raven 2.0.2 does as well, but this still encourages me to upgrade simplejson when I shouldn't.

The current version of raven dropped the `>=2.3.0,<2.5.0` part, so now we can get the latest and greatest raven and simplejson safely.

My point is that when updating dependencies, checking for conflicts is very hard to do by hand. This needs to be automated with a tool that yells at the developer when an update leads to a version conflict.
Ruby gets this right with Bundler: `gem install bundle`, create a `Gemfile` with the following content:

And run `bundle install`. This installs the required package and its dependencies and creates a `Gemfile.lock` file:

`Gemfile.lock` is like `requirements.txt` with pinned versions (not everything is pinned here, but probably should be): when creating a new environment and running `bundle install`, bundler looks at the `.lock` file to install what's specified.

Then there is a bunch of commands that bundle provides. For instance, to list available updates (running this on a bundle created months ago):
Updating `compass-less-plugin` and its dependencies can be done in one command (`bundle update compass-less-plugin`), which checks for version conflicts while doing so.

Sorry if you're already familiar with all this. Now I'll try to explain how we can improve `requirements.txt` using this approach.

First, instead of putting all the requirements in `requirements.txt`, people would only list first-level deps, pinned. So for raven:

Then some tool provided by pip-tools compiles this into the full requirements list, into another file (like Gemfile and Gemfile.lock, but with less noise):

The key point is that this tool builds the whole dependency tree for all the top-level requirements and dumps it as a safely-installable-with-no-conflicts requirements file, which pip can just use.

So next time raven is updated and doesn't require an old simplejson, the tool can update the simplejson requirement. When raven drops simplejson to use python's built-in json implementation, the 2nd-level requirement can be dropped as well, automatically.
Other use case: `requests`, which used to have dependencies on `oauthlib`, `certifi`, `chardet` and doesn't anymore (and `oauthlib` needed rsa or `pyasn1` or whatever). If I just need requests, I'll list it in my top-level requirements and the tool will pin or drop the dependencies if they're not needed when I upgrade requests itself.

And finally, this tool could prevent me from installing packages X and Y which need `Z<1.0` and `Z>1.1`.

That's the theory, and I think pip already does some version conflict checks, but that's not enough to guarantee safe updates. Now in practice, I think the dependency information is not provided by the PyPI API and requires the whole package to be fetched to actually extract it (or maybe create.io provides that info). So that's annoying but doable, and pip-tools seems like a nice place to experiment with such things.
I think buildout does check for dependency conflicts but I never managed to wrap my head around it.
What do you think? I'm happy to start a proof-of-concept that could be integrated in this project.