pfmoore / editables

MIT License
38 stars 12 forks source link

Add support for namespace packages #34

Open jaraco opened 1 month ago

jaraco commented 1 month ago

In coherent-oss/coherent.build#2, I'm working on adding editable support for Coherent System projects that follow the essential layout. And while I recognize this project primarily aims to support the src layout, it would be nice to re-use the redirector functionality for projects using the essential layout.

At first blush, it doesn't appear to me as if the editables project can support (PEP 420) namespace packages in any layout. That is, if a project like jaraco.functools or coherent.build doesn't have any __init__.py in the root namespace, it is incompatible with the redirector (because the target of a redirection needs a file location and there's no file location for a namespace package).

Moreover, any redirector should probably support redirecting the inner module/package and not the namespace package so that the editable-installed package can exist alongside other packages sharing the same namespace (e.g. jaraco.functools editable-installed alongside jaraco.context installed on the file system).

I believe it should be possible for the redirector to support redirecting namespace submodules. In particular, I'm imagining something like:

F.install()
F.map_module("jaraco.functools", "path/to/jaraco.functools/__init__.py")

The redirector would then expose the parent jaraco module (if it doesn't already exist) and the jaraco.functools module (from the indicated path).

Maybe we'll want to add something to #24, but I wanted to start with a dedicated issue to discuss the possibilities and concerns.

Some questions:

Is this use-case something that was considered and rejected?

Do you have any specific goals that would prevent adding support to the Redirector (with or without support in EditableProject)?

Is there another construct/approach that's meant to address this need?

jaraco commented 1 month ago

In #35 and #36, I've drafted a couple of possible approaches.

pfmoore commented 1 month ago

I don't particularly understand the logic behind the "essential layout" - there's no pyproject.toml shown in the layouts described, so I don't see the full picture here. But I think that's mostly irrelevant except as a motivating example.

The biggest issue with supporting (implicit) namespace packages is that they are documented as responding dynamically to changes to the filesystem content, in a way that import hooks cannot implement without replacing the whole filesystem loader from importlib (there's undocumented support code in importlib that provides essential functionality).

While it's possible to argue that omitting that dynamic behaviour is "good enough", I'm wary of getting sucked into such arguments - for many years, the .pth file mechanism was "good enough", but now people say it's not, while not being able to precisely define what their new boundaries for "good enough" behaviour are[^1].

Answering your specific questions:

Is this use-case something that was considered and rejected?

Yes, because of the problem implementing the dynamic behaviour of implicit namespaces, as discussed above.

Do you have any specific goals that would prevent adding support to the Redirector (with or without support in EditableProject)?

The main one is "keep things simple". This project is not intended to be the last word in editable install support, it's intended as a simple, fairly straightforward implementation that can be easily used by build backends with a basic need for editable support.

A secondary answer is that I view the import hook approach as a "last resort", and strongly recommend using add_to_path and add_to_subpackage wherever possible. The map method is very much "use at your own risk".

Is there another construct/approach that's meant to address this need?

The documentation has a whole section on supported use cases. I believe the section on "Project directory installed under an explicit package name" covers what you're talking about. If not, then I'm afraid I don't understand the specifics of your use case well enough.

I'm not flat-out rejecting this idea, but I'll need to understand why you need this specific solution before I'm willing to go down this route. In particular, I consider import hooks to be very much an "advanced" implementation technique, which should almost never be needed in practice - the stdlib approaches for manipulating sys.path are (in my view) more than adequate for most realistic needs (assuming that people are looking for a practical solution, rather than some form of theoretical purity - which sadly doesn't always seem to be the case).

[^1]: Yes, I am bitter about that 🙁

jaraco commented 4 weeks ago

The documentation has a whole section on supported use cases. I believe the section on "Project directory installed under an explicit package name" covers what you're talking about. If not, then I'm afraid I don't understand the specifics of your use case well enough.

I hadn't read the docs, but I read the implementation, and thus had already determined that the technique is non-viable for implicit namespace packages because of how it creates a __init__.py (thereby breaking implicit namespace packages). If that namespace exists elsewhere on sys.path, adding an __init__.py to one location will break discovery of the other location. To illustrate, consider the ns.pkgA and ns.pkgB packages in the implicit namespace ns:

 draft @ mkdir -p path1/ns/pkgA
 draft @ mkdir -p path2/ns/pkgB
 draft @ touch path1/ns/pkgA/__init__.py
 draft @ touch path2/ns/pkgB/__init__.py
 draft @ env PYTHONPATH=path1:path2 py -c "import ns.pkgA; import ns.pkgB" && echo 'works!'
works!

They're both present on different sys.path entries (as that's a key part of how these namespace packages work).

But now you want to editable-install pkgA using the add_to_subpackage technique:

 draft @ mkdir -p project/pkgA
 draft @ touch project/pkgA/__init__.py
 draft @ rm -r path1/ns/pkgA
 draft @ cat >> path1/ns/__init__.py
__path__.append('project')
 draft @ env PYTHONPATH=path1:path2 py -c "import ns.pkgA; import ns.pkgB" && echo 'works!'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'ns.pkgB'

Now, although ns.pkgA still imports fine, ns.pkgB is missing. When the PathFinder found path1/ns/__init__.py, it created a simple package and stopped searching the path for more packages in the namespace. Editable-installing pkgA caused pkgB to go missing.

jaraco commented 4 weeks ago

The aforementioned example shows a problem with the namespace in two different Python paths, but another problem arises even with everything installed on the same Python path. If ns.pkgA and ns.pkgB are both editable-installed to the same site-packages, only one of their ns/__init__.py files will win and the losing package will be missing.

pfmoore commented 4 weeks ago

No, the __init__.py is added to ns/pkgA, not to ns.

>>> from editables import EditableProject
>>> my_project = EditableProject("foo", "/path/to/foo")
>>> my_project.add_to_subpackage("ns.pkgA", "src/pkgA")
>>> list(my_project.files())
[('foo.pth', ''), ('ns/pkgA/__init__.py', "__path__ = ['C:\\\\path\\\\to\\\\foo\\\\src\\\\pkgA']")]

Your test is testing how you think the method works, rather than how it actually does work...

jaraco commented 4 weeks ago

Your test is testing how you think the method works, rather than how it actually does work...

Wow! Okay. I had considered that possibility, but I was 95% sure that would not have the intended effect, that you could only cause additional submodules and subpackages to be added to a package, but you couldn't replace the package being imported. My suspicion was reinforced by reading the docs that say

For this use case, the project.add_to_subpackage method is available. This method creates the my.namespace package (by installing an __init__.py file for it into site-packages) and gives that package a __path__ attribute pointing to the source directory to be installed under that package name.

The fact that __init__.py is installed for my.namespace made me think that the namespace was getting the __init__.py, not a package (or module?) in that namespace.

Attempting to confirm this approach, it doesn't seem to have the intended effect. Any behavior in src/pkgA/__init__.py is missing.

 draft @ mkdir -p ns/pkgA
 draft @ cat >> ns/pkgA/__init__.py
__path__ = ['/Users/jaraco/draft/src/pkgA']
 draft @ mkdir -p src/pkgA
 draft @ cat > src/pkgA/__init__.py
val = 42
 draft @ py -c "import ns.pkgA; print(ns.pkgA.val)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: module 'ns.pkgA' has no attribute 'val'

I confirmed that I am able to import other submodules of pkgA, just not its root module __init__.py:

 draft [1] @ touch src/pkgA/foo.py
 draft @ py -c "import ns.pkgA.foo" && echo 'works!'
works!

This is so tantalizingly close to what I need. I wonder if the add_to_subpackage shouldn't add another line to the generated __init__.py as (essentially) execfile(__path__[0] + '/__init__.py').

Is this ns.pkgA example something you'd like EditableProject to support, even if that means having to explicitly execute the root module?

pfmoore commented 4 weeks ago

Hmm, that's a good point. And the approach of reading and executing __init__.py in the generated one is a plausible fix.

But what I really need here are some good, small, worked examples of why you want to do this. Not particularly in the context of your larger "essential layout", but just as a straightforward Python project. From my perspective, namespace packages over the years have always felt like they have suffered from a "solution looking for a problem" mindset - the best example of this is the dynamic path recalculation behaviour in the original PEP - the example in the PEP shows a situation where dynamic recalculation is needed and would work, but it doesn't give any real-world example of why you'd want to do this sort of runtime path manipulation. And in my experience, most packaging users never do need this sort of thing - sys.path gets set up by the runtime, and never altered during program execution. So we have a complex, difficult-to-support mechanism, which in practice is never needed.

And in this case, the complexity is non-trivial - execfile doesn't exist any more, so we need to read the file content (handling things like encoding declarations) and then exec it (handling __future__ imports somehow), etc, etc. All of that is a source of potential bugs, and while it's true that we'll only get bug reports if the functionality is useful to people, I'd like to be sure, before we add the code, that there isn't an alternative of saying "why don't you do the following instead, which gives the same result?"

That means understanding why people want to do this in the first place. After all, the PyPA recommended project layout has the project files located in src, and putting src onto sys.path by using a .pth file "just works". So as far as I'm concerned, every use of editable installs that doesn't just use add_to_path is at some level a hack to work around people not following the standard project layout recommendations. So it's important to understand why they aren't following the recommendation, and how far we can go with requiring them to change their project structure to better conform to the existing recommendations.

(Somewhat off-topic rant) I find this pretty frustrating, because the general feeling in the packaging community is that we should be more willing to enforce an opinionated standard, with one "blessed" packaging tool that manages your project for you, like cargo in the Rust community. And yet, that approach implies a standard project structure - which we already have in the src layout - but only supporting that layout feels like a non-starter. I'm almost inclined to start a conversation on the Packaging Discourse, suggesting that if we, as a community, want to go down the route of standardising tools (as opposed to just interoperability protocols) then we should start by declaring the src layout as the official PyPA project layout, and making a statement that tools are only required to support that layout. But it's borderline trolling, and I don't think the community needs another epic "strategy" debate right now...