Closed (jaraco closed this issue 1 year ago)
The current implementation achieves the goals by soliciting special variables from the user. This feature was inspired by the setuptools technique from 15 years ago, in which requirements were embedded into the console_scripts generated by setuptools and then "required" at runtime by pkg_resources.
In this approach, the user specifies their requirements in the script using Python syntax and special variable names exposing static values. Example:
#!/usr/bin/env python
# Run this with pip run
__requires__ = ['pymongo', 'requests>2']
__index_url__ = 'http://my.corp/index/'
import pymongo
import requests
...
The tool parses the Python and extracts the requirement specifications from the declared variables and incorporates those during the install phase.
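A minimal sketch of that extraction, assuming the values are plain literals assigned at module level. The helper names `read_declared` and `to_install_args` are hypothetical and are not taken from the tool's code; the script is parsed with `ast` and never imported or executed.

```python
import ast


def read_declared(source, names=("__requires__", "__index_url__")):
    """Statically collect literal values assigned to the given module-level names."""
    found = {}
    for node in ast.parse(source).body:
        if not isinstance(node, ast.Assign):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and target.id in names:
                # literal_eval accepts only literals; a computed value raises ValueError
                found[target.id] = ast.literal_eval(node.value)
    return found


def to_install_args(declared):
    """Translate the declared values into arguments for ``pip install``."""
    args = list(declared.get("__requires__", []))
    if "__index_url__" in declared:
        args += ["--index-url", declared["__index_url__"]]
    return args


script = """\
#!/usr/bin/env python
__requires__ = ['pymongo', 'requests>2']
__index_url__ = 'http://my.corp/index/'
import pymongo
import requests
"""

declared = read_declared(script)
install_args = to_install_args(declared)
```

Because the values are read statically, a requirement list built at runtime (for example by concatenating lists) would be rejected by this sketch rather than evaluated.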
__main__.__requires__
).globals()
/__main__
.,
, []
) than a simple list.
To limit polluting the namespace, the tool could solicit all possible values through a single variable, something like:
__pip_run__ = dict(
requirements=[...],
index_url = '...',
)
This approach might prove easier to extend. It also encapsulates the values into a namespace, avoiding proliferation of dunder-names. It is probably more tedious to use, though.
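As a rough illustration of how a single `__pip_run__` variable could still be read statically, the sketch below accepts either a `dict(...)` call with keyword arguments or a plain dict literal. The function name `read_pip_run` is hypothetical, and the restriction to literal values is an assumption of the sketch.

```python
import ast


def read_pip_run(source, name="__pip_run__"):
    """Return the mapping assigned to ``name`` without executing the script."""
    for node in ast.parse(source).body:
        if not (isinstance(node, ast.Assign) and len(node.targets) == 1):
            continue
        target = node.targets[0]
        if not (isinstance(target, ast.Name) and target.id == name):
            continue
        value = node.value
        is_dict_call = (
            isinstance(value, ast.Call)
            and isinstance(value.func, ast.Name)
            and value.func.id == "dict"
            and not value.args
        )
        if is_dict_call:
            # dict(key=literal, ...): evaluate each keyword value as a literal
            # (**kwargs entries have no name and are ignored in this sketch)
            return {
                kw.arg: ast.literal_eval(kw.value)
                for kw in value.keywords
                if kw.arg is not None
            }
        # Otherwise expect a dict literal such as {"requirements": [...]}
        return ast.literal_eval(value)
    return {}


example = """\
__pip_run__ = dict(
    requirements=['pymongo', 'requests>2'],
    index_url='http://my.corp/index/',
)
"""
settings = read_pip_run(example)
```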
This approach re-uses the requirements.txt syntax but as a multi-line comment in the script. Example:
#!/usr/bin/env python
# Run this with pip run
# requirements:
# -i http://my.corp/index
# pymongo
# requests >= 2
import pymongo
import requests
...
Some features of requirements.txt files might be excluded. The tool will parse out the values and apply them as if they had been passed as their equivalent options on the command line.
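One way the comment block could be pulled out of a script is sketched below. It assumes the block starts at a line reading `# requirements:` and ends at the first line that is not a comment; the function name `read_requirements_comment` is hypothetical. Interpreting each entry (for example `-i`) is left out.

```python
def read_requirements_comment(source, header="# requirements:"):
    """Collect the comment lines that follow the header, stripped of '# '."""
    lines = iter(source.splitlines())
    for line in lines:
        if line.strip().lower() == header:
            break
    else:
        return []  # no header found
    collected = []
    for line in lines:
        if not line.startswith("#"):
            break  # first non-comment line ends the block
        entry = line[1:].strip()
        if entry:
            collected.append(entry)
    return collected


script = """\
#!/usr/bin/env python
# Run this with pip run
# requirements:
# -i http://my.corp/index
# pymongo
# requests >= 2
import pymongo
import requests
"""
entries = read_requirements_comment(script)
```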
requirements.txt
support.requirements.txt
syntax.requirements.txt
.#
and space) to every line.-r
Instead of parsing the requirements.txt, the tool could save the contents of this comment to a temporary file and install that using -r. This approach would limit the work needed to parse and interpret the file's contents, but would have the unfortunate consequence of not honoring already installed packages.
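A sketch of the temporary-file variant, assuming the comment lines have already been extracted into a list of strings. Both helper names are hypothetical, and the command is only constructed here, not run.

```python
import os
import sys
import tempfile


def write_requirements_file(entries):
    """Write the extracted lines to a temporary requirements file; return its path."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as stream:
        stream.write("\n".join(entries) + "\n")
    return stream.name


def install_command(path):
    """Build (but do not run) the pip invocation that installs from the file."""
    return [sys.executable, "-m", "pip", "install", "-r", path]


path = write_requirements_file(["-i http://my.corp/index", "pymongo", "requests >= 2"])
command = install_command(path)
```

The caller would be responsible for removing the file (for example with `os.remove(path)`) once the install has finished.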
The tool could solicit arbitrary arguments to the pip install command, either in a shebang header or with some specialized syntax. For example:
#!/usr/bin/env python
# Run this with pip run
# pip run install args:
# pymongo "requests>=2" --index-url=http://my.corp/index
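A sketch of how such a comment could be turned into an argument list, assuming shell-style quoting handled by the standard library's `shlex.split`. The header text is taken from the example above; the function name `read_install_args` is hypothetical.

```python
import shlex


def read_install_args(source, header="# pip run install args:"):
    """Split the comment lines after the header into pip install arguments."""
    lines = iter(source.splitlines())
    for line in lines:
        if line.strip() == header:
            break
    else:
        return []  # no header found
    args = []
    for line in lines:
        if not line.startswith("#"):
            break  # first non-comment line ends the block
        # Drop the leading '#' and split the rest with shell quoting rules
        args.extend(shlex.split(line[1:]))
    return args


script = """\
#!/usr/bin/env python
# Run this with pip run
# pip run install args:
# pymongo "requests>=2" --index-url=http://my.corp/index
import pymongo
"""
install_args = read_install_args(script)
```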
shlex would be used.
Instead of including the command-line options in the Python portion of the file, the tool could accept them in the Unix shebang line:
#!/usr/bin/env python -m pip run pymongo requests>=2 --index-url=http://my.corp/index --
> characters and definitely doesn't support spaces (i.e. requests >= 2).
It may be possible/desirable to constrain the allowed arguments to select ones, but that of course requires maintaining that constraint.
Much like in Option 1, the command-line options could be solicited through a single variable:
__pip_run_args__ = ['pymongo', 'requests>=2', '--index-url=http://my.corp/index']
This approach would simplify the parsing of arguments, but it has many of the same cons as Option 1 without providing a structured syntax for soliciting the values. It compromises both the intuition of the user (who's thinking "command line parameters") and the tool execution.
I'm not a fan of the magic variable syntax, so I guess I'll toss my own proposal in.
This approach is like Option 2, but only allows PEP 440 specifiers. In addition, a single-line syntax is provided for brevity.
The Requires: line MUST appear at the top of the file, before any import statements.
Index: URL?)
Single line example:
#! /usr/bin/env python3
# Generate visualization
# Requires: numpy pandas~=1.0 seaborn~=0.10
Multiline example:
#! /usr/bin/env python3
# Generate visualization
# Requires:
# numpy~=1.18
# pandas~=1.0
# seaborn~=0.10
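Both forms could be handled by one small reader, sketched below. It assumes that single-line entries are separated by whitespace (so a specifier may not contain spaces) and that the multi-line block ends at the first blank comment or non-comment line; neither detail is stated in the proposal, and the function name `read_requires` is hypothetical.

```python
def read_requires(source, marker="# Requires:"):
    """Read a '# Requires:' declaration placed before any code."""
    lines = iter(source.splitlines())
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(marker):
            inline = stripped[len(marker):].split()
            if inline:
                return inline  # single-line form
            break  # multi-line form: entries follow on later lines
        if stripped and not stripped.startswith("#"):
            return []  # reached code before any Requires: line
    else:
        return []  # no Requires: line at all
    requirements = []
    for line in lines:
        entry = line.strip()
        if not entry.startswith("#"):
            break
        entry = entry[1:].strip()
        if not entry:
            break  # an empty comment line ends the block
        requirements.append(entry)
    return requirements


single = """\
#! /usr/bin/env python3
# Generate visualization
# Requires: numpy pandas~=1.0 seaborn~=0.10
"""
multi = """\
#! /usr/bin/env python3
# Generate visualization
# Requires:
# numpy~=1.18
# pandas~=1.0
# seaborn~=0.10
import numpy
"""
single_result = read_requires(single)
multi_result = read_requires(multi)
```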
Overall ratings:
Option 1: -0.5 / mildly against
Option 2: +0.5 / lukewarm, but positive
Option 3: -1.0 / strongly against
Option 4: +1.0 / strongly for
Anyway, these are my 2¢ for Proposals 1-3:
-i https://corp.invalid/index?
And another comment for thought: what about the Notebook use case? This would be very useful for notebook distribution.
My preference is for option 4. I don't see any value in supporting the "full" requirements syntax (which is arcane, and not standardised). Being able to specify a list of projects the script depends on, possibly with a version restriction, is IMO sufficient.
I'm not too concerned about the precise syntax. Multi-line, single-line or both doesn't bother me, and allowing the requirements to go after the imports isn't the end of the world either (I can see arguments for putting them at the end of the file). But anything works. I'm more concerned with semantics - that the data looks like a comment to Python, and is treated as a list of (project plus optional PEP 440 version specifier) values by pip-run.
I'm not clear why being able to specify an alternative index is important. In my view, "what index to use" is a property of the environment, not of the script, so it doesn't actually belong in the script metadata. There are also complex questions about whether to allow --extra-index-url as well as --index-url, which I think add confusion but little practical value. I'd prefer to omit this, and leave it as a command line option.
The same argument (it's a property of the environment, not the script) also applies to the various suggestions to let the script specify other pip-run or pip command line arguments.
To summarise my objections to the other options:
Option 1: I don't like "magic variables", and as you say users will expect to be able to build the value at runtime. Apart from this, it's the best of the options I rejected.
Option 2: Full requirements file syntax is a mess, and limiting it will be confusing for users. Also a maintenance problem, as you'll need to track any changes pip makes to the format.
Option 3: Trying to parse arguments into an argv list is highly fragile, and there are too many pip arguments that we would not want to allow here.
I've not commented on any of the variants. None of them have any significant impact on my overall opinion of the options.
I'm looking at adding a similar feature to pipx - see https://github.com/pypa/pipx/issues/913.
My proposal there is:
A block of comment lines in the source code. The first line must be # Requirements: and subsequent lines must be a hash, whitespace, and a requirement specifier, as defined by PEP 508. The requirements block is terminated by a blank line (or probably any line that doesn't start with a hash, I see little point in rejecting something "obviously" valid just because it misses out a blank line).
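Read literally, that description could be sketched as follows. The function name `read_requirements_block` is hypothetical, entries are returned as raw strings without PEP 508 validation, and treating a bare `#` line as the end of the block is an assumption of this sketch.

```python
import re

# A hash, at least one whitespace character, then the requirement text
ENTRY = re.compile(r"#\s+(\S.*?)\s*$")


def read_requirements_block(source):
    """Return the requirement strings listed under a '# Requirements:' line."""
    lines = iter(source.splitlines())
    for line in lines:
        if line.rstrip() == "# Requirements:":
            break
    else:
        return []  # no block present
    found = []
    for line in lines:
        match = ENTRY.match(line)
        if match is None:
            break  # blank line, code, or bare '#' ends the block
        found.append(match.group(1))
    return found


script = """\
#!/usr/bin/env python
# Requirements:
# pymongo
# requests >= 2
import pymongo
"""
block = read_requirements_block(script)
```

Each returned string could then be handed to a PEP 508 parser for validation before installation.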
This is essentially option 4, with the following differences:
pipx doesn't need an "Index:" section or anything similar, as it has command line arguments to handle that.
Using a common format would obviously be beneficial. But that can be fixed after the fact, so I'm only mentioning this for context (although if pip-run is able to use the same syntax as I'm adding to pipx, that's probably less work all round 🙂)
- What use cases are there for introspection?
The original case is that pkg_resources inspects __main__.__requires__ to activate any inactive (non-default) packages (mainly in support of environments where multiple versions were allowed to be installed, a behavior that's currently discouraged).
But I can imagine other scenarios where introspection could be useful:
--verbose invocation or debugging information
tests for the script (including doctests) may make assertions about the requirements. For example, a script might add this doctest:
"""
This script must have no requirements.
>>> __requires__
[]
This script must not depend on lxml.
>>> any('lxml' in req for req in __requires__)
False
"""
- Is it necessary to allow -i https://corp.invalid/index?
I'm not clear why being able to specify an alternative index is important. In my view, "what index to use" is a property of the environment, not of the script, so it doesn't actually belong in the script metadata.
The use-case that motivated this behavior is one where the script itself has dependencies it knows cannot be satisfied from PyPI and wishes to communicate that in its requirements. It's for the same reason that pip supports -i in a requirements.txt file. It's not always the case that the index is a property of the environment. In some cases (like a corporate environment), it may be the case that the index is used universally for all pip operations and so can be configured in the environment. And there are other cases where the user should select the index at invocation time. But there are uncontrived cases where the script itself is the most appropriate factor in the consideration of the index.
Consider for example a repro that demonstrates a fix with privately-published packages:
__requires__ = ['cherrypy', 'cheroot==6.5.6.dev28+g9413ed9c']
__index_url__ = 'https://m.devpi.net/jaraco/dev'
# ... demonstrate some behavior only present in cheroot 6.5.6.dev28+g9413ed9c
It violates a key principle of pip-run, that invocations should be one-liners or they should be executed as a script. pip-run is trying to get away from the multi-step instructions required for invocation.
- Is the command line syntax really that familiar?
Yes. The syntax comes directly from pip (and is passed directly to pip), likely the most used command-line in Python.
what about the Notebook use case?
The notebook use-case is implemented (#45, example).
Of course, the notebook use-case could parse # requirements comments instead of using Python syntax, although Notebook authors might not want comments but would prefer a text block to declare requirements if they're not Python.
Since there's some uncertainty about the future of pip-run, here's what I plan to do - I'll first implement option 1+4, so either format is supported. That way there's compatibility with the proposed pipx format as well as backward compatibility for the existing format. This dual-mode will give the formats a chance to compete for mind share and after some time perhaps one can be deprecated and retired.
Background
In pypa/pip#3971, key members of the PyPA have leveled critiques of the embedded requirements feature, where a user can supply requirements and other common installer directives to signal to the tool how (best) to run the script. This feature emerged from the use-case where a script author would like to distribute a single file and have that file be runnable by any number of end users with minimal guidance, allowing the file to be published to a gist or alongside other scripts in a directory without needing additional context for executability.
Goals
--extra-index-url or --quiet.