thought-machine / please

High-performance extensible build system for reproducible multi-language builds.
https://please.build
Apache License 2.0
2.45k stars 203 forks source link

Remove package prefix in `pex` file #1891

Closed bmustiata closed 2 years ago

bmustiata commented 3 years ago

I'm trying to create a python build that needs a library build in the same project.

The library build in the same project resides in //tools/termcolor-util:termcolor_util This library in turn uses termcolor. As per recommended best-practices termcolor lives in //thirdparty/python:termcolor as a pip_library.

The BUILD definition is:

python_library(
    name="termcolor_util",
    srcs=glob(["**/termcolor_util/__init__.py"]),
    deps=["//thirdparty/python:termcolor"],
)

python_binary(
    name="termcolor_util_main",
    deps=[":termcolor_util"],
    main="termcolor_util/__main__.py",
)

However, when I try to run it, I get an error, namely that the library is not found:

$ plz run //tools/termcolor-util:termcolor_util_main
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "plz-out/bin/tools/termcolor-util/termcolor_util_main.pex/__main__.py", line 385, in <module>
  File "plz-out/bin/tools/termcolor-util/termcolor_util_main.pex/__main__.py", line 377, in run
  File "plz-out/bin/tools/termcolor-util/termcolor_util_main.pex/__main__.py", line 351, in interact
  File "plz-out/bin/tools/termcolor-util/termcolor_util_main.pex/__main__.py", line 360, in main
  File "/usr/lib/python3.8/runpy.py", line 210, in run_module
    return _run_code(code, {}, init_globals, run_name, mod_spec)
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/raptor/projects/germanium-monorepo/plz-out/bin/tools/termcolor-util/termcolor_util_main.pex/tools/termcolor-util/termcolor_util/__main__.py", line 1, in <module>
ModuleNotFoundError: No module named 'termcolor_util'

Looking at the pez file I can see why:

unzip -l /home/raptor/projects/germanium-monorepo/plz-out/bin/tools/termcolor-util/termcolor_util_main.pex
Archive:  /home/raptor/projects/germanium-monorepo/plz-out/bin/tools/termcolor-util/termcolor_util_main.pex
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2001-01-01 00:00   thirdparty/
        0  2001-01-01 00:00   thirdparty/python/
     3997  2001-01-01 00:00   thirdparty/python/termcolor-1.1.0.egg-info/PKG-INFO
      164  2001-01-01 00:00   thirdparty/python/termcolor-1.1.0.egg-info/SOURCES.txt
        1  2001-01-01 00:00   thirdparty/python/termcolor-1.1.0.egg-info/dependency_links.txt
       72  2001-01-01 00:00   thirdparty/python/termcolor-1.1.0.egg-info/installed-files.txt
       10  2001-01-01 00:00   thirdparty/python/termcolor-1.1.0.egg-info/top_level.txt
     5044  2001-01-01 00:00   thirdparty/python/termcolor.py
        0  2001-01-01 00:00   tools/
        0  2001-01-01 00:00   tools/termcolor-util/
        0  2001-01-01 00:00   tools/termcolor-util/termcolor_util/
      456  2001-01-01 00:00   tools/termcolor-util/termcolor_util/__main__.py
      501  2001-01-01 00:00   tools/termcolor-util/termcolor_util/__main__.pyc
     1900  2001-01-01 00:00   tools/termcolor-util/termcolor_util/__init__.py
     1904  2001-01-01 00:00   tools/termcolor-util/termcolor_util/__init__.pyc
    14007  2001-01-01 00:00   __main__.py
        0  2001-01-01 00:00   tools/termcolor-util/termcolor_util/
        0  2001-01-01 00:00   thirdparty/__init__.py
        0  2001-01-01 00:00   thirdparty/python/__init__.py
        0  2001-01-01 00:00   thirdparty/python/termcolor-1.1.0.egg-info/__init__.py
        0  2001-01-01 00:00   tools/__init__.py
        0  2001-01-01 00:00   tools/termcolor-util/__init__.py
---------                     -------
    28056                     22 files

In the .plzconfig I have:

[python]
moduledir = thirdparty.python

So here are the questions:

  1. How can I convince the python_library to drop the prefixes? Since the pex files are inputs only across my targets, but packages using the same source code will need to be published on pypi, I don't understand why the prefixing would be necessary with the same hierarchy inside in the first place. Furthermore, since the output is the pex file, the outcome of the actions in the graph, shouldn't that be enough, or am I missing something obvious?
  2. Is it possible to have the moduledir from inside the .plzconfig as a list? I tried , and : as separators, but to no avail. Also it's weird since tools/termcolor-util is not a valid python package name, but the . notation in the moduledir seems to indicate it's not a regular PYTHON_PATH variable, but a python module, which would make the path bundled in the pex completely inaccessible. To add to the confusion, it isn't even looking like a python module, since the init.py files in the parent folders are missing.
bmustiata commented 3 years ago

Ok, so after investigating, it seems both options are not possible. Would it be ok if I'd implement a .plzconfig boolean flag that would enable the stripping of the package name when writing the data into pex?

Tatskaari commented 3 years ago

Sorry for the slow response on this one. Seems it slipped past getting triaged correctly in our standups.

The module import path is for stripping the import path for third party libraries. If we look in the Please repo, we have a test that imports numpy:

# //test/python_rules:numpy_test
python_test(
    name = "numpy_test",
    srcs = ["numpy_test.py"],
    labels = [
        "py3",
        "pip",
    ],
    deps = ["//third_party/python:numpy"],
)

When looking inside the .pex you will see we have the third party path still in place:

        0  2001-01-01 00:00   third_party/
        0  2001-01-01 00:00   third_party/python/
    45692  2001-01-01 00:00   third_party/python/numpy/LICENSE.txt
     1646  2001-01-01 00:00   third_party/python/numpy/__config__.py
     8858  2001-01-01 00:00   third_party/python/numpy/__init__.py
      331  2001-01-01 00:00   third_party/python/numpy/_distributor_init.py

However because we have set the python import path in the .plzconfig:

[python]
moduledir = third_party.python

You can import it as your normally would:

class NumpyTest(unittest.TestCase):

    def test_numpy_is_importable(self):
        import numpy # Note this isn't third_party.python.numpy
        self.assertIsNotNone(numpy.nan)

The issue you have seems to be with importing your own code. You seem to be failing to import termcolor_util which lives outside your third party folder? Do you just need to import it as tools.termcolor-util.termcolor_util or similar?

Or are you asking to be able to import it as simply termcolor_util? Perhaps it's reasonable to want to strip a common prefix from all python imports e.g. if you have //path/to/my/service/common/python/termcolorutils, you might want to just import that as common.python.termcolorutils when working under //path/to/my/service/...

Tatskaari commented 3 years ago

Ah I see, you're trying to produce a module that you can publish to pypi so it's not acceptable to have the import path be the entire path to the sources. The simplest solution would be to move your sources to the root of your project but I think I can see the motivation here.

bmustiata commented 3 years ago

Yeah, that can't work, since the whole point is to have multiple packages that are being published independently.

Initially, I was thinking into extending the moduledir so it would support a list of paths, not a single path like it's implemented now, however, I quickly realized that has the side effect of polluting each other project since the python binaries launcher get rendered with the moduledir path baked in.

So the only reasonable solution to me seems to be an option to strip the package names.

However, due to some other reasons as well, I decided to use something else, so I'm closing this issue.

bmustiata commented 3 years ago

Turns out the "something else" - Bazel - isn't working either, so let's try again :).

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had any recent activity in the past 90 days. It will be closed if no further activity occurs. If you require additional support, please reply to this message. Thank you for your contributions.

bmustiata commented 2 years ago

The PR is passing all the tests. Could someone please review it, and/or merge it?

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had any recent activity in the past 90 days. It will be closed if no further activity occurs. If you require additional support, please reply to this message. Thank you for your contributions.