jazzband / pip-tools

A set of tools to keep your pinned Python dependencies fresh.
https://pip-tools.rtfd.io
BSD 3-Clause "New" or "Revised" License

Ability to Cross Compile #585

Closed cancan101 closed 4 years ago

cancan101 commented 6 years ago

Support the ability to run pip-compile specifying the OS / architecture that should be used for resolving dependencies. Currently it uses the OS where pip-compile is run. This causes issues such as https://github.com/jazzband/pip-tools/issues/333. It also means that if a package does not exist for the current OS (e.g. tensorflow-gpu on MacOS), the compile fails.

Environment Versions

1. OS Type: MacOS

2. Python version: $ python -V → Python 3.5.3

3. pip version: $ pip --version → pip 9.0.1

4. pip-tools version: $ pip-compile --version → pip-compile, version 1.9.0

Steps to replicate

1. Add tensorflow-gpu>=1.2 to requirements.in

2. Run pip-compile

Expected result

requirements.txt file with pinned deps (assuming that --arch manylinux1_x86_64 was set).
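
Something like this hypothetical invocation (--arch being the flag proposed here, not an existing pip-compile option):

$ pip-compile --arch manylinux1_x86_64 requirements.in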

Actual result
Could not find a version that matches tensorflow-gpu>=1.2
Tried: 0.12.0rc1, 0.12.0, 0.12.1, 1.0.0, 1.0.1, 1.1.0rc1, 1.1.0rc2, 1.1.0
vphilippon commented 6 years ago

It also means that if a package does not exist for the current OS (e.g. tensorflow-gpu on MacOS), the compile fails.

For the record, it's the responsibility of the package requiring tensorflow-gpu>=1.2 to specify that it's only a Linux/Windows dependency if it doesn't exist on MacOS (assuming the package itself supports MacOS), and pip-compile would respect that (except in 1.10.0 and 1.10.1, where it's broken. It's fixed on master and should be part of 1.10.2, once a release is possible).

About having the ability to compile for a specific environment: it's interesting, but really hard to do well. That likely means having to trick pip into believing it's running in a given environment. And then we have the case of sdist packages (.zip, .tar.gz, etc.) that need to be built, and could definitely be unbuildable on the current OS (as in, running the setup.py could be impossible on the current OS).

In other words, I wouldn't expect this to be done soon. Contributions are always welcome, but I would point toward supporting the upcoming pip 10 first :smile:.

taion commented 6 years ago

Yeah, TensorFlow's packaging is a little weird. What this ends up looking like is that we logically want to specify something in requirements.in like:

tensorflow-gpu==1.3.0; 'linux' in sys_platform
tensorflow==1.3.0; 'linux' not in sys_platform

But pip-compile then fails on OS X, because there's no tensorflow-gpu==1.3.0 there.

vphilippon commented 6 years ago

You should be able to do that currently (or something alike: environment markers are allowed in requirements.in). I'm not familiar with OS X: what's its sys_platform value?
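
(For reference, an OS X interpreter reports 'darwin':

>>> import sys
>>> sys.platform
'darwin'

which is why the markers above test "'linux' in sys_platform" rather than equality; that spelling also matches Python 2's 'linux2' on Linux.)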

taion commented 6 years ago

There is no tensorflow-gpu==1.3.0 at all for OS X on PyPI, so something weird happens. The second line works, though (at least with Pipenv).

From this and other issues on Pipenv, this isn't really addressable without some even more invasive hackery, so this is probably a CANTFIX.

I might poke around at this a bit on my own but it's not immediately obvious that there's a solution here that isn't ridiculously gnarly.

vphilippon commented 6 years ago

But, if pip-compile respects the environment marker here (first line), then it shouldn't try to install that tensorflow-gpu==1.3.0 package on OS X.

pip-tools is supposed to respect the environment markers explicitly given in the requirements.in, so this really strikes me as odd.

Would you give me the pip-compile --rebuild --verbose output of that?

(Am I "fighting" to keep an issue open? I think I need to consult a professional....)

taion commented 6 years ago

Hah, not a problem. Here's what happens:

$ cat requirements.in
tensorflow-gpu==1.3.0; 'linux' in sys_platform
$ pip-compile --version
pip-compile, version 1.10.2
$ pip-compile --rebuild --verbose
Using indexes:
  https://pypi.python.org/simple

                          ROUND 1
Current constraints:
  tensorflow-gpu==1.3.0

Finding the best candidates:
  found candidate tensorflow-gpu==1.3.0 (constraint was ==1.3.0)

Finding secondary dependencies:
  tensorflow-gpu==1.3.0 not in cache, need to check index
Could not find a version that satisfies the requirement tensorflow-gpu==1.3.0 (from versions: 0.12.1, 1.0.0, 1.0.1, 1.1.0rc0, 1.1.0rc1, 1.1.0rc2, 1.1.0)
Traceback (most recent call last):
  File "/usr/local/bin/pip-compile", line 11, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/piptools/scripts/compile.py", line 184, in cli
    results = resolver.resolve(max_rounds=max_rounds)
  File "/usr/local/lib/python3.6/site-packages/piptools/resolver.py", line 102, in resolve
    has_changed, best_matches = self._resolve_one_round()
  File "/usr/local/lib/python3.6/site-packages/piptools/resolver.py", line 199, in _resolve_one_round
    for dep in self._iter_dependencies(best_match):
  File "/usr/local/lib/python3.6/site-packages/piptools/resolver.py", line 285, in _iter_dependencies
    dependencies = self.repository.get_dependencies(ireq)
  File "/usr/local/lib/python3.6/site-packages/piptools/repositories/pypi.py", line 152, in get_dependencies
    self._dependencies_cache[ireq] = reqset._prepare_file(self.finder, ireq)
  File "/usr/local/lib/python3.6/site-packages/pip/req/req_set.py", line 554, in _prepare_file
    require_hashes
  File "/usr/local/lib/python3.6/site-packages/pip/req/req_install.py", line 278, in populate_link
    self.link = finder.find_requirement(self, upgrade)
  File "/usr/local/lib/python3.6/site-packages/pip/index.py", line 514, in find_requirement
    'No matching distribution found for %s' % req
pip.exceptions.DistributionNotFound: No matching distribution found for tensorflow-gpu==1.3.0

pip-tools can deal with the package itself just fine, but it fails when it tries to grab the package to resolve dependencies.

taion commented 6 years ago

It's the same sort of problem as https://github.com/kennethreitz/pipenv/issues/857, though the same problems there don't come up given that pip-tools itself runs in the virtualenv rather than outside of it.

One mitigation in this case could be that, for packages that do upload their dependencies to PyPI (are these the packages that use twine?), we just use the stated dependencies from PyPI rather than download the package to resolve it.

This wouldn't solve the problem in full generality, but it would fix things for e.g. tensorflow-gpu. This would also fix @mpolden's specific problem in https://github.com/kennethreitz/pipenv/issues/857, actually, since APScheduler does in fact publish its install requirements to PyPI, though again it wouldn't fix the general case.

Though frankly Pipenv is a bit of a no-go for us anyway due to https://github.com/kennethreitz/pipenv/issues/966.

vphilippon commented 6 years ago

My bad: the environment markers are simply copied to the resulting requirements.txt. It looks like it will still do the lookup and fail here. I have a hunch of how this could be fixed; maybe it wouldn't be so hard (famous last words) in our case. Although, don't hold your breath.

I would need to check if PyPI actually provides an API to get those dependencies, but I doubt it.

taion commented 6 years ago

It's on the JSON payload. See info.requires_dist.
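
A minimal sketch of reading it (using today's pypi.org JSON endpoint, with requests assumed available):

import requests

# info.requires_dist lists the dependencies declared for the release;
# note it can also be null when no metadata was uploaded.
resp = requests.get("https://pypi.org/pypi/tensorflow-gpu/1.3.0/json")
resp.raise_for_status()
print(resp.json()["info"]["requires_dist"])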

taion commented 6 years ago

I'm not sure if this API really lets you distinguish between "no dependencies" and "dependencies not published to PyPI", though. Maybe not that important in practice.

vphilippon commented 6 years ago

Note to self: stop making "guesses" past 1:00 AM. Thank you for the info, it's good to know; maybe we can make something out of this.

taion commented 6 years ago

Ah, I see it's not so straightforward in the code given how you hook into RequirementSet from Pip to do the lookup.

flaub commented 6 years ago

Actually, there's an evaluate() method on the markers attribute of an InstallRequirement. I don't know the best place for this call to be made, but my best guess is that in scripts/compile.py you could add a line like:

constraints = [x for x in constraints if not x.markers or x.markers.evaluate()]

This line could go just after collecting all the constraints from parsing requirements, and just before the call to Resolver.check_constraints(constraints).

Here? https://github.com/jazzband/pip-tools/blob/b6a9f1fb3423dd189f050fac31ac9e47b05178e8/piptools/scripts/compile.py#L180

Additionally, the evaluate() method takes an environment argument, which presumably means that a command-line argument to pip-compile could be used to specify the target environment (I don't know exactly what form the environment argument takes at this time).
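
For what it's worth, in packaging.markers (which pip vendors), the environment argument is a dict of marker variables that overrides the defaults; a minimal sketch:

from packaging.markers import Marker

marker = Marker("'linux' in sys_platform")
print(marker.evaluate())                            # evaluates against the running interpreter
print(marker.evaluate({"sys_platform": "darwin"}))  # False: evaluates against a target environment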

taion commented 6 years ago

I think there's still the problem that we literally can't evaluate the transitive dependencies for a package that we can't install/download, though.

The bottleneck here isn't really the evaluation – it's that unless we try to read the deps from the PyPI API (instead of using pip's approach), we don't have a way to get transitive deps at all for non-installable packages.

flaub commented 6 years ago

No, check this out. Say I have a requirements.txt like this:

cairocffi
editdistance
h5py>=2.7.0
keras==2.0.8
pillow; platform_machine == 'armv7l'
pillow-simd; platform_machine != 'armv7l'
requests==2.18.4
scikit-learn[alldeps]
sklearn
tensorflow-gpu==1.3.0; platform_machine != 'armv7l' and platform_system != 'Darwin'
theano

Now let's say I use the following code snippet (which is pieced together from a REPL session and basically emulates what pip-compile is doing):

import optparse

import pip
from pip.req import parse_requirements

from piptools.repositories.pypi import PyPIRepository
from piptools.resolver import Resolver

class PipCommand(pip.basecommand.Command):
    name = 'PipCommand'

def main():
    pip_command = get_pip_command()
    pip_args = []
    pip_options, _ = pip_command.parse_args(pip_args)

    session = pip_command._build_session(pip_options)
    repository = PyPIRepository(pip_options, session)

    constraints = list(
        parse_requirements(
            'requirements.txt',
            finder=repository.finder,
            session=repository.session,
            options=pip_options))

    Resolver.check_constraints(constraints)
    resolver = Resolver(constraints, repository)
    results = resolver.resolve()

    import pprint
    pprint.pprint(results)

def get_pip_command():
    # Use pip's parser for pip.conf management and defaults.
    # General options (find_links, index_url, extra_index_url, trusted_host,
    # and pre) are deferred to pip.
    pip_command = PipCommand()
    index_opts = pip.cmdoptions.make_option_group(
        pip.cmdoptions.index_group,
        pip_command.parser,
    )
    pip_command.parser.insert_option_group(0, index_opts)
    pip_command.parser.add_option(optparse.Option('--pre', action='store_true', default=False))

    return pip_command

if __name__ == '__main__':
    main()

If you run this, you get the exact same error as reported. However, if we now filter out the constraints whose markers don't match the running platform, the resolver is happy (and so is the repository). This is accomplished with the following, placed just before the call to Resolver.check_constraints(constraints):

constraints = [x for x in constraints if not x.markers or x.markers.evaluate()]

We are telling pip-compile to honor the markers specified in the top-level requirements passed in. This doesn't solve any markers on transitive dependencies that might not match the platform, but that doesn't matter if the top-level ones are properly specified.

taion commented 6 years ago

Oh, hey, scikit-learn[alldeps]! Adding that was probably among the least favorite PRs I've ever made 🤣

So, that does work, but it's not exactly what I want. Ideally, I'd like for this package (and its exclusive dependencies) to show up in my generated requirements.txt, with the appropriate markers.

Imagine I started with:

tensorflow-gpu; 'linux' in sys_platform
tensorflow; 'linux' not in sys_platform

I'd want something like:

bleach==1.5.0             # via bleach, tensorflow-tensorboard
enum34==1.1.6             # via enum34, tensorflow
html5lib==0.9999999       # via bleach, html5lib, tensorflow-tensorboard
markdown==2.6.9           # via markdown, tensorflow-tensorboard
numpy==1.13.3             # via numpy, tensorflow, tensorflow-tensorboard
protobuf==3.5.0.post1     # via protobuf, tensorflow, tensorflow-tensorboard
six==1.11.0               # via bleach, html5lib, protobuf, six, tensorflow, tensorflow-tensorboard
tensorflow-gpu==1.4.0; 'linux' in sys_platform
tensorflow-tensorboard==0.4.0rc3  # via tensorflow, tensorflow-tensorboard
tensorflow==1.4.0; 'linux' not in sys_platform
werkzeug==0.12.2          # via tensorflow-tensorboard, werkzeug
wheel==0.30.0             # via tensorflow, tensorflow-tensorboard, wheel

For carrying through dependencies transitively, suppose I had:

six
tensorflow; 'linux' not in sys_platform

Then I would want something like:

bleach==1.5.0; 'linux' not in sys_platform
enum34==1.1.6; 'linux' not in sys_platform
html5lib==0.9999999; 'linux' not in sys_platform
markdown==2.6.9; 'linux' not in sys_platform
numpy==1.13.3; 'linux' not in sys_platform
protobuf==3.5.0.post1; 'linux' not in sys_platform
six==1.11.0
tensorflow-tensorboard==0.4.0rc3; 'linux' not in sys_platform
tensorflow==1.4.0; 'linux' not in sys_platform
werkzeug==0.12.2; 'linux' not in sys_platform
wheel==0.30.0; 'linux' not in sys_platform

flaub commented 6 years ago

I see, sorry for the rat hole, carry on :)

vphilippon commented 6 years ago

Proper environment marker handling from the requirements.in was added in 2.0.0, and I'm currently documenting the "official" stance of pip-tools regarding cross-environment usage.

In short, pip-compile must be executed on each environment. We have the same issues described in this article about PyPI regarding the execution of setup.py. We cannot safely and consistently know the dependencies required for a Linux installation while on Windows, as an example.

So in the current state of things, it's a dead end. If someday there's a deterministic way to know the dependencies of any package without ever having to execute possibly environment-dependent code, then it'll be doable.

altendky commented 5 years ago

I decided to solve this for myself by dumping the locking into Azure Pipelines and keeping a per-platform requirements.txt output. I also happen to have multiple groupings (base, testing, dev, for example). boots takes care of pip-syncing from the proper platform/group file, and delegates the remote locking to romp, which basically allows arbitrary execution in Azure without building, committing, and pushing a custom CI config.

Obviously it would be nice for packages to be processable on all platforms but I decided not to wait for that to happen.

https://github.com/altendky/boots https://github.com/altendky/romp

karypid commented 4 years ago

Hi all.

I came across this issue while looking for info on how to use pip-tools across mac/win32/linux. I started following the approach of running pip-compile on each platform and maintaining separate .txt files, for example:

pip-compile --allow-unsafe --upgrade --build-isolation --generate-hashes --output-file .\requirements\win32-py3.7-main.txt .\requirements\main.in

and

pip-compile --allow-unsafe --upgrade --build-isolation --generate-hashes --output-file ./requirements/linux-py3.7-main.txt ./requirements/main.in

What is the suggestion on compiling an add-on dev.in and constraining it to the set of main.in requirements for the applicable platform? I am currently forced to use multiple files, as in:

# This is: linux-py3.7-dev.in:
-c linux-py3.7-main.txt
pylint

# This is: win32-py3.7-dev.in:
-c win32-py3.7-main.txt
pylint

I have resorted to having a single dev.in file WITHOUT the -c {platform}-{python_ver}-main.txt line, and using a script that detects the running platform and creates a temporary file (linux-py3.7-dev.in / win32-py3.7-dev.in / ...) containing the appropriate line referencing the proper main.txt file, along the lines of the sketch below.
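
A minimal sketch of such a wrapper (hypothetical; the actual script isn't shown here, and file names follow the convention above):

import subprocess
import sys

# Platform tag matching the naming above, e.g. "win32-py3.7" or "linux-py3.7"
# (sys.platform is "win32" on Windows and "linux" on Linux).
tag = "{}-py{}.{}".format(sys.platform, sys.version_info.major, sys.version_info.minor)

# Write a temporary dev .in that constrains against the matching main .txt.
dev_in = "requirements/{}-dev.in".format(tag)
with open(dev_in, "w") as f:
    f.write("-c {}-main.txt\n".format(tag))
    with open("requirements/dev.in") as src:
        f.write(src.read())

subprocess.run(["pip-compile", "--allow-unsafe", "--generate-hashes",
                "--output-file", "requirements/{}-dev.txt".format(tag), dev_in], check=True)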

Any suggestions/plans on how to best approach this?

AndydeCleyre commented 4 years ago

I'm inviting anyone still interested in multi-environment compilation and sync workflows to pick up discussion @ #826.

cheind commented 2 years ago

@karypid I'm using the following approach to get cross-platform compatible requirements files with constraint support:

Assuming the following requirements files

# requirements.in
django

# dev-requirements.in
-c {python}-{platform}-{machine}-requirements.txt
django-debug-toolbar

(Note the special constraint syntax.)

I then invoke my platform_generate.py script on each platform as follows

python platform_generate.py requirements.in dev-requirements.in

to get platform-specific .in files, e.g.

py3.9-linux-x86_64-requirements.in
py3.9-linux-x86_64-dev-requirements.in

Inspecting py3.9-linux-x86_64-dev-requirements.in reveals

-c py3.9-linux-x86_64-requirements.txt
django-debug-toolbar

From here on, use pip-compile/pip-sync as usual. Alternatively, platform_generate.py also supports a --compile switch to automatically call pip-compile once the platform-specific .in files are generated.
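
(The core of such a generator can be small; a rough sketch under the same placeholder convention, not the actual platform_generate.py:)

import platform
import sys
from pathlib import Path

# Expand {python}-{platform}-{machine} to e.g. "py3.9-linux-x86_64".
subs = {
    "{python}": "py{}.{}".format(sys.version_info.major, sys.version_info.minor),
    "{platform}": sys.platform,
    "{machine}": platform.machine(),
}
tag = "-".join(subs[k] for k in ("{python}", "{platform}", "{machine}"))

for name in sys.argv[1:]:
    src = Path(name)
    text = src.read_text()
    for placeholder, value in subs.items():
        text = text.replace(placeholder, value)
    # e.g. requirements.in -> py3.9-linux-x86_64-requirements.in
    src.with_name("{}-{}".format(tag, src.name)).write_text(text)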

asparagusbeef commented 1 month ago

We are developing on Windows but deploying in a Linux Docker environment. My solution was to create an image that imitates our production environment, mount the project into it, and generate the requirements.txt there. Basically:

Dockerfile:

FROM python:3.11-slim-bullseye

RUN pip install pip-tools

WORKDIR /app

COPY requirements.in /app/

CMD ["pip-compile", "--output-file=requirements.txt", "--strip-extras", "requirements.in"]

Makefile:

.PHONY: dev compile

dev:
    pip-compile --output-file=requirements-dev.txt --strip-extras requirements-dev.in requirements.in && \
    pip-sync requirements-dev.txt && \
    black . && \
    isort . --profile black

compile:
    docker build -t pip-compile-env -f ../../setup/Dockerfile.compile .
    powershell -command "docker run --rm -v \"$$(Get-Location):/app\" pip-compile-env"
    docker rmi pip-compile-env
    @echo "requirements.txt has been generated in the current directory."

To prevent the image from building every time:

compile:
    @powershell -Command "if (-Not (docker images -q pip-compile-env)) { \
        Write-Output 'Image pip-compile-env not found. Building...'; \
        docker build -t pip-compile-env -f ../../setup/Dockerfile.compile .; \
    } else { \
        Write-Output 'Image pip-compile-env already exists. Skipping build...'; \
    }"
    @powershell -command "docker run --rm -v \"$$(Get-Location):/app\" pip-compile-env"
    @echo "requirements.txt has been generated in the current directory."