pfmoore / editables

MIT License
39 stars 11 forks source link

Allow mapping to sub-packages #20

Closed ofek closed 1 year ago

ofek commented 2 years ago

This might be impossible.

Essentially, a user wants to ship src/pkg as namespace/plugins/pkg which is fine except for editable installations as doing editable_project.map('namespace.plugins.pkg', 'src/pkg') produces:

...
  File "C:\Users\ofek\AppData\Local\Programs\Python\Python38\lib\site-packages\editables\__init__.py", line 48, in map
    raise EditableException(
editables.EditableException: Cannot map namespace.plugins.pkg as it is not a top-level package
pfmoore commented 2 years ago

As the error says, the map method is only intended to add top-level modules. Even if that restriction were lifted, the import hook doesn't support namespace packages (see the docs, but this is basically a limitation of how namespace packages are implemented by Python).

I'd be willing to accept a PR that added support for this use case, if someone can find a way to make it work, but it's not one I plan on working on myself.

LecrisUT commented 1 year ago

IS this an issue on this project or in general? There's a similar issue in scikit-build-core that we want to address, and I am wondering if there's any pointers for this.

pfmoore commented 1 year ago

The first thing needed is for someone to work out how to implement an import hook that provides the requested functionality. That's a general issue - there aren't many people working on doing clever things with import hooks, so there's not a lot of expertise in this area to draw on. If you manage to address your issue in scikit-build-core, your solution might help here, and vice versa.

Once it's demonstrated that it is possible to implement the functionality, there is a follow-up question for this project as to whether I want to add that capability. That will depend on the complexity and maintainability of the support code - my goal for this project is to provide lightweight support for key use cases, not to provide every possible variation on "editable installation" that people can think of. Other projects may make different choices.

There is also a third question, which is more of a "community consensus" question than something any individual project can answer, which is what actually is an editable install. The original form of the concept was literally nothing more than a .pth file which added the project source directory to sys.path, and that served everyone fine for years. Features like this one were never even possible under that model. In this project, add_to_path method exposes this mechanism. Anything beyond that is essentially a "new feature request" and (IMO) needs to be justified with use cases and a clear understanding of the costs and benefits. But again, that's only my view as it relates to this project. Other projects and developers may have a different position.

LecrisUT commented 1 year ago

I believe this is quite valuable to have in order to modernize projects into using importlib.resources e.g. I have a configuration for hatchling:

[tool.hatch]
build.include = [
    "python/my_pkg",
    "data",
]

[tool.hatch.build.sources]
"python/my_pkg" = "my_pkg"
"data" = "my_pkg/data"

Where in the code I have:

importlib.resources
root = importlib.resources.files("my_pkg.data")

The project is not python centric so I cannot impose a file restructuring for that. I imagine other people who would want to add python bindings to a project would be having a similar issue.

If there is any progress on scikit-build-core on this point I will ping you about it. @henryiii has the general gist of how to implement this and I will be looking at this project for some inspiration.

RegisGraptin commented 1 year ago

Hello, It seems that I have a similar need for this Issue. I am currently working on creating a python project template, and I would like to use my current developing package for my testing. To describe it, in my project, I have a src folder containing my source code, and a tests folder containing my test.

I am using the rewriting system of hatch from [tool.hatch.build.sources] to rewrite the src to match my package name, with multiple folder as company.package.tools.

Now comes my issue, I am currently thinking about a system to manage my package for the testing part. Indeed, I want to test my package by importing it as it will be named when deployed. So instead of importing my package as src, I thought about importing it as company.package.tools. I also want to edit my current package as it will be in development phases. So, I try to import my package as editable and execute:

pip install -e .

But, here come this issue. I do not know if this use case could be a motivation to develop a possible solution. Also, I do not know if this issue can be handled by pytest, I need to search more information on that. But, if you have an alternative solution for that, I am interested :)

pfmoore commented 1 year ago

Thanks for the explanations. These seem very much related to the hatch feature of remapping directories (not surprising as the request was raised by the hatch author). Is that feature unique to hatch? If not, how do other build backends handle this issue? And equally, how did you handle it before general editable install support was available, when you could only get editable installs using setuptools (and the .pth file approach)?

My reason for asking is because I want to make sure that any features I add to this library are generally useful, and correspond to features common across backends, rather than having to maintain functionality that's only of use to a single backend.

LecrisUT commented 1 year ago

I think it would help with having less code duplication. I have just tested the equivalent setuptools approach and it works, (maybe except for dynamically generated files)

[tool.setuptools.package-dir]
"my_pkg" = "python/my_pkg"
"my_pkg.data" = "data"

And on the implementation side it points the .pth to a python script which contains all of the generated aliases of the files. For reference:

__editable____my_pkg_1_2_3_finder.py ``` import sys from importlib.machinery import ModuleSpec from importlib.machinery import all_suffixes as module_suffixes from importlib.util import spec_from_file_location from itertools import chain from pathlib import Path MAPPING = {'beakerlib': '/home/lecris/PycharmProjects/beakerlib/python/beakerlib', 'beakerlib.src': '/home/lecris/PycharmProjects/beakerlib/src'} NAMESPACES = {'beakerlib.src.extradocs': ['/home/lecris/PycharmProjects/beakerlib/src/extradocs'], 'beakerlib.src.perl': ['/home/lecris/PycharmProjects/beakerlib/src/perl'], 'beakerlib.src.test': ['/home/lecris/PycharmProjects/beakerlib/src/test'], 'beakerlib.src.vim': ['/home/lecris/PycharmProjects/beakerlib/src/vim'], 'beakerlib.src.xslt-templates': ['/home/lecris/PycharmProjects/beakerlib/src/xslt-templates'], 'beakerlib.src.yash': ['/home/lecris/PycharmProjects/beakerlib/src/yash'], 'beakerlib.src.docs': ['/home/lecris/PycharmProjects/beakerlib/src/docs'], 'beakerlib.src.extradocs.examples': ['/home/lecris/PycharmProjects/beakerlib/src/extradocs/examples'], 'beakerlib.src.extradocs.examples.apache': ['/home/lecris/PycharmProjects/beakerlib/src/extradocs/examples/apache'], 'beakerlib.src.extradocs.examples.phases': ['/home/lecris/PycharmProjects/beakerlib/src/extradocs/examples/phases'], 'beakerlib.src.extradocs.examples.simple': ['/home/lecris/PycharmProjects/beakerlib/src/extradocs/examples/simple'], 'beakerlib.src.vim.ftdetect': ['/home/lecris/PycharmProjects/beakerlib/src/vim/ftdetect'], 'beakerlib.src.vim.syntax': ['/home/lecris/PycharmProjects/beakerlib/src/vim/syntax']} PATH_PLACEHOLDER = '__editable__.beakerlib-0.0.0.finder' + ".__path_hook__" class _EditableFinder: # MetaPathFinder @classmethod def find_spec(cls, fullname, path=None, target=None): for pkg, pkg_path in reversed(list(MAPPING.items())): if fullname == pkg or fullname.startswith(f"{pkg}."): rest = fullname.replace(pkg, "", 1).strip(".").split(".") return cls._find_spec(fullname, Path(pkg_path, *rest)) return None @classmethod def _find_spec(cls, fullname, candidate_path): init = candidate_path / "__init__.py" candidates = (candidate_path.with_suffix(x) for x in module_suffixes()) for candidate in chain([init], candidates): if candidate.exists(): return spec_from_file_location(fullname, candidate) class _EditableNamespaceFinder: # PathEntryFinder @classmethod def _path_hook(cls, path): if path == PATH_PLACEHOLDER: return cls raise ImportError @classmethod def _paths(cls, fullname): # Ensure __path__ is not empty for the spec to be considered a namespace. return NAMESPACES[fullname] or MAPPING.get(fullname) or [PATH_PLACEHOLDER] @classmethod def find_spec(cls, fullname, target=None): if fullname in NAMESPACES: spec = ModuleSpec(fullname, None, is_package=True) spec.submodule_search_locations = cls._paths(fullname) return spec return None @classmethod def find_module(cls, fullname): return None def install(): if not any(finder == _EditableFinder for finder in sys.meta_path): sys.meta_path.append(_EditableFinder) if not NAMESPACES: return if not any(hook == _EditableNamespaceFinder._path_hook for hook in sys.path_hooks): # PathEntryFinder is needed to create NamespaceSpec without private APIS sys.path_hooks.append(_EditableNamespaceFinder._path_hook) if PATH_PLACEHOLDER not in sys.path: sys.path.append(PATH_PLACEHOLDER) # Used just to trigger the path hook ```

Of course that is specific to setuptools and we would need to either re-implement this feature on each build system or have a common interface for it, which this project would make sense to live in.

As for being used by multiple backends, scikit-build-core also needs this feature, not just hatchling.

pfmoore commented 1 year ago

OK, thanks. So setuptools implemented this independently. I agree it would be useful to have it centralised, but setuptools chose not to use this library, so we'll end up with independent implementations. I'm not entirely comfortable having to monitor the setuptools implementation and replicate any fixes they make as an ongoing exercise. But we can worry about that later, for now let's just work on seeing if this can be added here. I will note, though, that I'm not interested in this library competing with setuptools, so "working like setuptools" is explicitly not a requirement here[^1].

If someone wants to make a PR implementing this, I will consider it. But some points I should make up front:

  1. It should be opt-in, like the existing mechanisms[^2]. This project offers capabilities for backends to use, I don't want to set any sort of policy on how a backend implements editable installs (hence the existence of the .pth approach and the existing import hook).
  2. It must be cross-platform and work for all Python versions this library supports (I imagine that's obvious).
  3. It should include documentation, similar to the existing docs, that explains the mechanism, why you might want to use it (from the POV of a backend author and an end user) and what the trade-offs are[^3]. It doesn't have to be great art - I'm bad at documentation, as you can see if you read the current docs, and "no worse than I did" is the bar I'm setting here 😉

I'd also like it to have tests. The existing tests aren't great, but the more complexity gets added, the more important good test coverage will be.

[^1]: It's arguable that's what the add_to_path method did, as that implemented the old setuptools approach. But setuptools changed their approach when they implemented PEP 660, which is fine, but I don't plan on trying to track their changes. [^2]: That is, for backends. It's up to the backend whether they want to let the user choose what mechanism to use. [^3]: In particular, what doesn't work.

pfmoore commented 1 year ago

Thinking some more about this, I think the important point here is to define what people expect to get from an editable install in the first place. Otherwise the scope of this project is at best ill-defined, and I will get continually asked to decide whether features are in scope. I'll open up a new issue for discussion on this question, but for now the basic principles I'm working from are as follows:

  1. The basic "editable install" functionality was defined by the old setuptools develop mechanism, which simply put the project directory on sys.path using a .pth file. So that should be seen as the baseline functionality here.
  2. Not all project layouts and workflows need to be supported. There's no reasonable argument that says that it's necessary to be able to do an "editable install" of a package that has code files split across 3 different project directories, with code files for other packages that should not be installed mixed in with the files we want.

Also, this project will only ever provide a mechanism. Backends can (and may need to) supplement that mechanism with implementation details of their own (for example, building a link farm if they want to use that approach). There's certainly an argument for a common library that implements full editable installs, but this isn't that library - I'm only interested in the import hook side of the problem.

LecrisUT commented 1 year ago

Indeed I understand the motivation to keep it simple and self-contained, it's just that it would be nice to have a common project so we don't have to re-invent the wheel. Recently I have tried to implement the approach from mesonpy and I find that there are quite a lot of nuances that one has to tackle in order to make an editable install be functional (beyond piggybacking onto spec_from_file_location), and it would be nice to have these nuances figured out in a common place.

Not all project layouts and workflows need to be supported.

Indeed, for purely python based projects these are unnecessary. But at the same time, wouldn't the built-in implementation also be sufficient? I am a bit confused as to what this project does on top of what's already available using .pth redirection if it's not a more fine-grained control of the redirections, which subpackages seem to be part of that.

But indeed this feature is quire complex because you have to deal with the differences of module, package, extension and namespace, which is non-trivial and sparsely documented, and even the implementations out there seem to fail in edge-cases. I have something half-functional in https://github.com/scikit-build/scikit-build-core/pull/399, which should be equivalent to mesonpy's approach, but I can't figure out the edge-cases of it.

pfmoore commented 1 year ago

it's just that it would be nice to have a common project so we don't have to re-invent the wheel

Agreed. As long as it's a wheel that is needed 🙂

But at the same time, wouldn't the built-in implementation also be sufficient? I am a bit confused as to what this project does on top of what's already available using .pth redirection if it's not a more fine-grained control of the redirections, which subpackages seem to be part of that.

IMO, yes, the .pth approach is almost always sufficient, especially now that the src layout is the recommended approach for project structure. This project grew out of the demand for a prototype implementation of an "import hook" based approach that avoided the problem of import setup picking up setup.py in flat project layouts. The alternative approach had limitations of its own that I view as acceptable trade-offs for something that isn't needed for a src layout.

As I said, I'm happy to look at extending the import hook machinery, but I want to keep it focused on genuine use cases, and part of that is insisting that proposals clarify precisely why the "src layout plus .pth file" approach is insufficient.

You've made the case here for needing a solution when you want to map what's in the src directory to a subpackage of a namespace in the target interpreter (if I've understood your use case correctly), so as I said, a PR for that would be welcome. But that's a specific feature based on an actual use case, not a general principle that "the import hook method should be the norm". And in fact, it's possible that remapping might not even need an import hook. You should consider whether installing the namespace with a __path__ attribute that points to the src directory might work just as well, with no import hook needed (I've not tried it, so that's just speculation - if I get some spare time, I might do some experiments).

As a side note, one significant disadvantage of import hooks is that because they work at runtime, they do not interact at all well with static type checkers and tools like IDE tooltips. This is a known issue, and it's been raised with the typing community, who don't have a solution other than standardising some form of (as yet unspecified) static representation of the mapping the hook implements.

pfmoore commented 1 year ago

Essentially, a user wants to ship src/pkg as namespace/plugins/pkg which is fine except for editable installations

I just did some testing. You can do this by installing a single namespace/plugins/__init__.py file containing the single line

__path__ = ["/path/to/directory/src"]

I've lightly tested this and it seems to work. I could easily add this to editables as a new function. It would not require an import hook to function, just a single file added to the list that need including in the editable wheel.

LecrisUT commented 1 year ago

Try to run something like:

import importlib.resources

root = importlib.resources.files("my_pkg")
resource_root = root.join_path("resources")
for file in resource_root.iterdir():
  print(f"{file.name}")

(Assuming the structure is src/my_pkg => my_pkg, resources => my_pkg.resources)

In this context resources is something shared between python and other system, e.g. systemd files, html files, data files, etc.

pfmoore commented 1 year ago

Isn't that a bug in importlib.resources? After all, __path__ is part of the core import machinery, so if importlib.resources doesn't support it, that's their problem, surely?

LecrisUT commented 1 year ago

Well, no because path is used for the module, which is all fine, but after you call files() you get a Path object which can only know the information of the local system path. That's the join_path which on a Path simply adds to the path, while in this case it need to be provided the information that there are other effective directories symlinked there.

pfmoore commented 1 year ago

Hang on. You want to map a directory containing no Python code to a subdirectory of a package? Why can't you put the resources directory under src/my_pkg in the first place? I don't plan on supporting arbitrary remappings like this - the scope here is exposing python modules and packages in the source as importable modules/packages in the target interpreter. Arbitrarily restructuring the directory layout of the source project is way beyond that. If a backend wants to provide that sort of functionality, it can implement it independently, of course, but it's not something I want to support in this library.

For the original request here, shipping src/pkg as namespace/plugins/pkg, the __path__ solution seems fine.

LecrisUT commented 1 year ago

The resources is part of a broader project's files, e.g. a C library where the Python bindings are an optional support. The init.py can still be there in resources thus making it an effective package. The thing is though you cannot navigate through the packages via join_path after you've called files(). You can import each package and navigate within them, but not across them.

ofek commented 1 year ago

It might help to show example directory trees and their locations to illustrate what you're trying to do

pfmoore commented 1 year ago

In fact, @LecrisUT please provide a standalone example of a project that demonstrates what you want. If you provide a full, buildable example that installs (using a non-editable install) the way you want it to and which shows the issue that you want supported by editable installs, I'll look at whether this is something we can handle.

@ofek I may be mistaken, but I think that the use case @LecrisUT is describing is a little different from the feature you asked for originally on this issue. With that in mind, would the __path__ solution I proposed above be a suitable addition to satisfy your original feature request?

pfmoore commented 1 year ago

For the resources directory case, another option is that when doing an editable install, the backend could write a symlink into the package directory pointing to resources. I won't do that in this project because:

  1. Managing the project directory is the backend's responsibility, and this project is careful not to write files, just to provide information on what files to add to the editable wheel.
  2. I won't handle symlinks in this project, as I don't want to deal with the cross-platform support implications. Backends can certainly use symlinks if they want to, though.
ofek commented 1 year ago

Yes, that would work for me!

pfmoore commented 1 year ago

One thing to note, when I come to implement this. If you want to graft directory X into the import system as package foo.bar, it should not contain an __init__.py itself (the __init__.py for foo.bar is supplied by the editable project implementation, and just sets __path__).

So the directory X can contain modules (core.py, gui.py etc.) and packages (subdirectories with their own __init__.py), but cannot have an __init__.py of its own. Essentially, X needs to be a pure namespace package.

Is that sufficient for you, or is that limitation (which I will document) going to be a problem? I'll probably implement it anyway, as it's simple enough and works without an import hook, which is beneficial, but I don't want to misrepresent it if it won't actually address your use case.

pfmoore commented 1 year ago

Thinking about this:

Essentially, a user wants to ship src/pkg as namespace/plugins/pkg

It should be fine if:

  1. src has no __init__.py (which it shouldn't)
  2. There's nothing but pkg in src (or you're OK with anything else that is there being visible under namespace.plugins as well)

In that case, project.add_to_subpackage("namespace.plugins", "src") will do the job (bikeshedding over the method name is OK, if you can think of something better 😉)

ofek commented 1 year ago

That example you gave, what would that particular call return?

pfmoore commented 1 year ago

('namespace/plugins/__init__.py', '__path__ = ["/absolute/path/to/src"]')

Technically, of course, add_to_subpackage doesn't return anything, but it adds that tuple to what gets returned by files().

ofek commented 1 year ago

Yeah that would work in my case and I think would be simple to start using

pfmoore commented 1 year ago

Added in #28

I will do a new release in the next day or so. If you have a chance to test the new implementation before then, that would be great (but no pressure, I'm not going to hold up the release if you're busy).

pfmoore commented 1 year ago

Assuming the structure is src/my_pkg => my_pkg, resources => my_pkg.resources

I have updated the documentation to be explicit that in the call project.map(pkg_name, directory), pkg_name must be a top-level (undotted) name, and directory must be a Python package (or .py file). The code already raises an exception if either of these two constraints aren't met, so it's simply a matter of being explicit in the docs.

I've also noted that the files specified by the library must be installed unchanged, and must not clash with any files that the caller writes in addition. This should cover the point that project.add_to_subpackage("namespace.plugins", "src") cannot be used if the wheel already contains a namespace/plugins/__init__.py file (which is something I don't support, but can't enforce).

The resources mapping in the quoted layout fails on three counts:

  1. The target isn't a top-level name (a problem if map is used)
  2. The resources directory isn't a Python package (I'm assuming) (a problem if map is used)
  3. my_pkg has an __init__.py file from the first mapping (a problem if add_to_subpackage is used)

I don't have any immediate plans to support that use case. The user can add a symbolic link from src/mypkg/resources to resources, or hatch can do this as part of setting up the editable install, but I consider it as out of scope here.

pfmoore commented 1 year ago

Closing as #28 is now merged.

pfmoore commented 1 year ago

Release 0.4 has now been uploaded with this included.