pfmoore / editables

MIT License
40 stars 11 forks source link

Advice about import hook limitation #21

Open abravalheri opened 2 years ago

abravalheri commented 2 years ago

Hi Paul, thank you very much for the project.

It was really important for me as a reference when implementing PEP 660 support in setuptools. I still plan to contribute to editables in terms of namespace handling once the details are flashed out (this may depend on the instructions we get in https://github.com/python/cpython/issues/92054).

I am opening this issue to seek for advice/have a general discussion about using import hooks in editable installs.

Recently in the setuptools repository we got an issue reporting that the editable installation of a regular package (ie. non-namespace) don't behave in the same way as a normal installation. I understand that editables would also have the same problem. So I wanted to share with you this particular use case and check what are your thoughts.

Problem description

Let's imagine a monorepo that contains multiple directories, each one of them corresponding to a Python project, like the following:

monorepo
├── pkgA
│   ├── pkgA
│   │   └── __init__.py
│   └── pyproject.toml
└── pkgB
    ├── pkgB
    │   └── __init__.py
    └── pyproject.toml

When the users perform a regular installation of one sub-directory, they are able to run scripts importing the installed package; even when the "current working directory (CWD)" is the root of the monorepo. This is demonstrated in the snippet bellow:

# --- Simulate monorepo project ---
rm -rf /tmp/monorepo
mkdir -p /tmp/monorepo/pkgA/pkgA
mkdir -p /tmp/monorepo/pkgB/pkgB
cd /tmp/monorepo

touch pkgA/pyproject.toml  pkgB/pyproject.toml  
echo "x = 42" > pkgA/pkgA/__init__.py
echo "y = 42" > pkgB/pkgB/__init__.py

# --- Simulate regular installation ---
python3.8 -m venv .venv
cp -r pkgA/pkgA .venv/lib/python3.8/site-packages/
tree .venv/lib/python3.8/site-packages/pkgA
# .venv/lib/python3.8/site-packages/pkgA
# └── __init__.py

# --- Experiment with regular installation ---
.venv/bin/python -c 'from pkgA import x; print(x)'   # <------ this works fine
# 42

.venv/bin/python -c 'import pkgA; print(pkgA)'   # <------ this works fine
# <module 'pkgA' from '/tmp/monorepo/.venv/lib/python3.8/site-packages/pkgA/__init__.py'>

In normal conditions, "stuff" in the CWD should take precedence over "stuff" in site-packages. However, this rule does not seem to apply for namespace packages in CWD.

If I had to make a wild guess, I would say that the following is happening:

  1. the import machinery starts by creating a ModuleSpec for the namespace in CWD
  2. half-way through, the import machinery scans site-packages and "changes opinion"
  3. the import machinery replaces the initially found namespace ModuleSpec with a new (regular) one pointing to the directory with __init__.py.

When using a import hook/meta path finder, the behaviour differs, and the users are surprised with an error:

# --- Simulate editable installation ---
.venv/bin/python -m pip install -U 'editables==0.3'

rm -rf .venv/lib/python3.8/site-packages/pkgA
echo 'import _editable_impl_pkga' > .venv/lib/python3.8/site-packages/pkga.pth
cat <<EOF > .venv/lib/python3.8/site-packages/_editable_impl_pkga.py
from editables.redirector import RedirectingFinder as F
F.install()
F.map_module('pkgA', '/tmp/monorepo/pkgA/pkgA/__init__.py')
EOF

# --- Experiment with editable installation ---
.venv/bin/python -c 'from pkgA import x; print(x)'   # <------ exception
# Traceback (most recent call last):
#   File "<string>", line 1, in <module>
# ImportError: cannot import name 'x' from 'pkgA' (unknown location)

.venv/bin/python -c 'import pkgA; print(pkgA)'   # <------ exception
# <module 'pkgA' (namespace)>

In this scenario the "accidental" namespace package created by having "" on sys.path takes precedence over the import hook. Since the objective of an editable install is to operate as close as possible to a regular install, this use case could be used as a counter argument for import hooks.

pfmoore commented 2 years ago

Hi @abravalheri

I actually think this is possibly simpler than you suggest - it looks basically like a simple case of having two different pkgA modules on sys.path - one the "real" module, the other the "accidental" namespace module created by using the package name as the name of the project directory. Which takes precendence is down to which one gets "found" first, which is essentially arbitrary.

I'd be tempted to say the simple solution here is "don't name your project directory the same as the package name in a monorepo". Does that fix the issue? If it doesn't, I'd be very curious as to why.

As regards this being a counter-argument for import hooks, I think it is (in the sense that it's a case where import hooks appear to have unexpected edge cases). I don't personally think that's a disaster, as I consider the .pth approach as the "canonical" way of implementing editable installs, with other approaches still being experimental, and potentially having bugs. I know that the .pth file approach also has limitations, but my view is that at least those limitations are well known, and known from experience to be tolerable.

I'd be happy to improve the hook-based approach, but I expect it will be a long time before I'd consider making it the recommended approach. I know I'm holding it to a higher standard than .pth files, but I'm personally OK with that.

abravalheri commented 2 years ago

Thank you very much @pfmoore for the insights. I understand your view and the conclusions seem very reasonable.


At some point I want to experiment a bit more with importlib.abc.PathEntryFinder rather than importlib.abc.MetaPathFinder. I have the impression that PathEntryFinders are less subject to the namespace-related limitations than MetaPathFinders. Maybe I can use them to solve this other limitation too. I will keep you in the loop :)

pfmoore commented 2 years ago

Reopening as a reminder for myself. I do want to investigate this scenario more closely, and I think it's useful to have a discussion about import hooks and their limitations.

I think the key points for me at the moment are:

  1. PEP 420 implicit namespace packages are still an issue for hooks. That's being tracked on the cpython repo, but for current versions of Python any solution requires a level of digging into internals that not everyone may be comfortable with.
  2. Static type checker support. This has been mentioned on typing-sig, but static tools can't easily (if at all) detect modules exposed via import hooks. IMO the most plausible solution for this is to invent static alternatives for common patterns that currently use import hooks (as opposed to requiring hook authors to expose static information as well), but that's just my opinion, and I'm not actively involved in that conversation.
  3. The issue exposed here, that import hooks typically appear in a different place in the import resolution order than the standard sys.path locations, meaning that it's not easy to control shadowing in the case where the same name appears in more than one place in the import path. (This is exacerbated by the fact that implicit namespace packages mean that every directory is potentially a package).

To be perfectly honest, I think that some of these issues are inherent in the design of the existing path hook mechanism[^1], and we're only hitting them now because outside of zipimport (which uses a path hook, not a meta hook) there's very little use of import hooks, so the edge cases haven't got exposed until now.

I wonder whether there is merit in looking at a mechanism similar to the existing .pth file approach (but ideally implemented in the import machinery, not in site.py) that allows declaratively exposing a much richer set of "overlays" on sys.path. Something like the "virtual wheel" file idea from PEP 662, but implemented directly by the import machinery[^2], so that packaging tools just install the description file (and static analysis tools could use that file directly).

[^1]: And I say this as one of the authors of PEP 302 that introduced the model. [^2]: I say this with some reservations, as a lot of the reasons I rejected PEP 662 could still apply in that situation.

abravalheri commented 2 years ago

I wonder whether there is merit in looking at a mechanism similar to the existing .pth file approach (but ideally implemented in the import machinery, not in site.py) that allows declaratively exposing a much richer set of "overlays" on sys.path. Something like the "virtual wheel" file idea from PEP 662, but implemented directly by the import machinery2, so that packaging tools just install the description file (and static analysis tools could use that file directly).

The investigation that I had in mind was in this direction.

I was thinking about implementing a path hook that mimics the zipimport mechanism. I can imagine a __editable__.<package-name>.json file being added to site-packages containing metadata to describe the contents of the package. This metadata could be used at the same time by an import hook and by static code analysis tools.

My hopes are that by using a PathEntryFinder instead of a MetaPathFinder we could surpass limitations 1 and 3. I was also hopping that the existence of a file containing a static description of the contents of the package would satisfy static analysis tools that cannot afford the dynamic nature of import hooks.

pfmoore commented 2 years ago

You may still have issues regarding (3), because the json file would need to be added to sys.path somewhere. To accurately match the "historical" behaviour here, maybe add an __editable__.<package-name>.pth file containing the name of the .json file alongside the json file itself. Then the .pth file will be processed "as normal" (i.e., "in the same way as the old setup.py develop command worked") and add the .json file to sys.path.

But yes, I quite like this idea. And the fact that both you and I came up with something similar suggests that it has potential 🙂

I don't have much time at the moment to work on this, but yes, please, let me know how you get on (and if you need help, feel free to ask).

pfmoore commented 1 year ago

I've been looking at this issue again. I'm trying to understand what the actual problem is with namespace packages. The root of my confusion is that basically, a top level non-namespace package is (in effect) "owned" by a single project. So you can't have two projects "foo" and "bar" that both own the top-level name "foobar". Or rather, you can, but the packaging machinery doesn't protect you from any mess that you get in (you can install them both, but what you get is a mess with the two projects overwriting each other).

With that in mind, I'm embarrassed to admit that it's possible the "namespace issue" simply isn't an issue in practice, and I've been spreading FUD 🙁

I'm pretty sure it's possible to create a top-level namespace package using a path hook (I have a proof of concept, but it is very manual, so I won't post it here). That "solves" the problem of how to install, in editable mode, a namespace package. But that's not the way I see people typically discussing namespace packages - they seem to be looking more at something along the lines of installing a project mypkg, where mypkg.plugins is a namespace package that code in mypkg will use. But that makes no sense to me, as it's not possible to create a project that installs something in mypkg.plugins without "owning" the name mypkg (assuming that mypkg is a real package, not a namespace). So how do you distribute a plugin?

So in reality, I don't understand the actual use case for namespace packages in wheels.

@abravalheri I assume you have more experience with people actually wanting to use namespace packages in "real life". Can you explain to me what I've missed here?