pfmoore / editables

MIT License

Define the scope of this project more clearly #24

Open pfmoore opened 1 year ago

pfmoore commented 1 year ago

As discussed in #20, the precise scope of this project (in other words, the answer to the question "what is an editable install?") needs to be clarified.

At a basic level, this project is intended to implement the machinery to expose a (subset of a) project's files via Python's import system, as if the project were installed, while still allowing the user to edit the files in place and have the changes visible.

The following restrictions apply:

  1. Files that need any sort of "build step" (such as source code for compiled modules) cannot be exposed in an "editable" form.
  2. The source layout must be suitable. Defining what counts as "suitable" will be an important factor here.
  3. Only capabilities supported by the import system are guaranteed. So, for example, locating data files by reference to __file__ may not be supported (or may only be supported under certain types of editable install).
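The sketch below shows restriction 3 in practice. It contrasts a `__file__`-relative lookup with one that goes through `importlib.resources` (Python 3.9+). The package `demo_pkg` and its `data.txt` are hypothetical. Both helpers work for a normal install. Only the second asks the import machinery (via the package's loader) where the file is, so it still works when the file does not sit next to the module on disk.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path
from types import ModuleType


def read_data_via_file(module: ModuleType) -> str:
    # Fragile: assumes data.txt sits next to the module's source file on the
    # real filesystem, which an editable-install import hook cannot guarantee.
    return (Path(module.__file__).parent / "data.txt").read_text()


def read_data_via_resources(package: str) -> str:
    # Asks the package's loader for a resource reader, so whatever the import
    # machinery provides is used instead of a guessed filesystem location.
    return importlib.resources.files(package).joinpath("data.txt").read_text()


# A throwaway package on disk (hypothetical names) to exercise both helpers.
root = Path(tempfile.mkdtemp())
(root / "demo_pkg").mkdir()
(root / "demo_pkg" / "__init__.py").write_text("")
(root / "demo_pkg" / "data.txt").write_text("hello")
sys.path.insert(0, str(root))

demo_pkg = importlib.import_module("demo_pkg")
print(read_data_via_file(demo_pkg))         # hello
print(read_data_via_resources("demo_pkg"))  # hello
```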

To set expectations, a pure Python project, created using a src layout (where all of the project code, and nothing else, is stored under the src directory) with everything installed under the name it has in the project source, is cleanly and simply exposed by installing a single .pth file containing the absolute path of the src directory. That is the baseline use case, and should be considered the default and standard approach when using this library (it is exposed via the add_to_path method).
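The sketch below emulates this baseline by hand using only the standard library. It shows what the generated .pth file does, not how add_to_path is implemented. The project name, package name and layout are hypothetical, and a temporary directory stands in for site-packages.

```python
import importlib
import site
import tempfile
from pathlib import Path


def write_pth(site_dir: Path, project_name: str, src_dir: Path) -> Path:
    """Write <project_name>.pth containing the absolute path of src_dir."""
    pth = site_dir / f"{project_name}.pth"
    pth.write_text(str(src_dir.resolve()) + "\n")
    return pth


# Hypothetical src-layout project containing a single package, my_pkg.
project = Path(tempfile.mkdtemp())
pkg = project / "src" / "my_pkg"
pkg.mkdir(parents=True)
(pkg / "__init__.py").write_text("VERSION = 1\n")

# A temporary directory stands in for site-packages; site.addsitedir adds it
# to sys.path and processes any .pth files it contains.
site_dir = Path(tempfile.mkdtemp())
write_pth(site_dir, "my_project", project / "src")
site.addsitedir(str(site_dir))

my_pkg = importlib.import_module("my_pkg")
print(my_pkg.VERSION)  # 1

# Edit the source in place; reloading picks up the change with no reinstall.
# (The edit also changes the file size, so a .pyc cached within the same
# second is still recognised as stale.)
(pkg / "__init__.py").write_text("VERSION = 2  # edited\n")
importlib.reload(my_pkg)
print(my_pkg.VERSION)  # 2
```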

All other approaches must be supported by use cases, which need to address the following questions:

  1. Why doesn't the src layout approach work for you?
  2. How did you handle your project before now?
  3. If the answer to (2) is that you haven't used editable installs before, why must you use them now? And if you're changing your processes to use editable installs, why can't you change your project layout to support existing editable install methods at the same time?
  4. Can the backend support this use case with the existing machinery? There is some leeway here, because every use case can be supported using a directory full of links plus a .pth file. Links aren't necessarily ideal, though, so this question should be viewed as more of a trade-off than an absolute.

Note in particular that "what build backend are you using" is not a question. If you have a valid use case, we can support it. On the other hand, if your problem is "we're switching backends", then that is not a justification - "compatibility with backend X" is not a goal here[^1].

TODO: Write up the use cases supporting the existing methods in the library. [Edit: Done]

Suggestions for extensions to the existing scope should be written up in a similar format to the use cases presented here for the current scope.

[^1]: One of the arguments for the editable install mechanism was to allow backends to choose what capabilities they offered, rather than forcing users to live with whatever their installer provides, as there's more choice of backend than of installer. As a consequence of that, what editable install capabilities are available should be a factor in choosing your backend, not an argument for all backends to provide everything.

LecrisUT commented 1 year ago

So one thing to discuss is supporting importlib.resources. Broadly, it seems to be within scope, given:

this project is intended to implement the machinery to expose a (subset of a) project's files

The issue is that you cannot navigate from one subset of a project's files to another: e.g., using iterdir() you would only see one subset of the files available on the filesystem.

pfmoore commented 1 year ago

Is there a protocol that import hooks need to implement to provide the support you describe? Or is it an internal detail of importlib.resources? If the latter, it's going to be a bit like the issue with namespace packages, where the import machinery doesn't provide the necessary means to let 3rd party hooks participate in the mechanism.

LecrisUT commented 1 year ago

It's the former. Basically, you need to change the TraversableResources (think of it as the module/directory) so that it changes the type of the generated Traversable (think of it as the file/path), letting you inject the files from one location into the other. The core of it is implementing:

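from importlib.resources.abc import Traversable  # Python 3.11+
from pathlib import Path
from typing import Iterator

# type(Path()) is the concrete PosixPath/WindowsPath (subclassing Path fails before 3.12)
class CustomTraversable(type(Path())):
    def iterdir(self) -> Iterator[Traversable]:
        # Serve the files from the custom paths first; _tree and _module are
        # assumed to be set up by the owning resource reader
        yield from self._tree[self._module]
        # ...then the files physically present in this directory
        yield from super().iterdir()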

The difficulty is making the hook for the middle parts, i.e. a CustomTraversableResources with files(), and changing the loaders so that their get_resource_reader() returns it. After that, spec_from_file_location already provides a way to add this hook, via its loader parameter. See the implementation inside that function for reference on what needs to be implemented outside of it.

Some nuances there are how to avoid double-counting files/modules that exist both in the source tree and in the installed location, and how to get module metadata such as __path__ generated correctly.
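The double-counting concern can be illustrated separately from the importlib.resources plumbing. The sketch below assumes a plain list of directories to overlay in precedence order. The function name merged_iterdir is hypothetical and is not part of importlib.resources or of this library.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Iterator


def merged_iterdir(dirs: Iterable[Path]) -> Iterator[Path]:
    """Yield the entries of several directories as if they were one.

    Directories earlier in `dirs` take precedence: when the same name exists
    in more than one of them (say, in the source tree and in site-packages),
    only the first occurrence is yielded, so nothing is double-counted.
    """
    seen: set[str] = set()
    for directory in dirs:
        if not directory.is_dir():
            continue
        for entry in sorted(directory.iterdir()):
            if entry.name not in seen:
                seen.add(entry.name)
                yield entry


# Hypothetical layout: one subset of files in the source tree, the rest installed.
src = Path(tempfile.mkdtemp())
installed = Path(tempfile.mkdtemp())
for name in ("a.txt", "shared.txt"):
    (src / name).write_text("from src")
for name in ("b.txt", "shared.txt"):
    (installed / name).write_text("from installed")

for entry in merged_iterdir([src, installed]):
    print(entry.name, entry.read_text())
# a.txt from src
# shared.txt from src
# b.txt from installed
```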

pfmoore commented 1 year ago

Thanks. Let's not discuss implementation any further at this point. But what's the use case? Specifically, can you provide an example of when using a src layout with a .pth file isn't sufficient, and yet you still need importlib.resources support? How do you handle this currently, and why is that approach no longer sufficient?

LecrisUT commented 1 year ago

You mention that the scope is:

this project is intended to implement the machinery to expose a (subset of a) project's files

That means that whether importlib.resources.files(...) shows you the files depends on which module you call it on, i.e. on which subset of modules that module belongs to. If you use a .pth file directly, then there are no separate subsets.

pfmoore commented 1 year ago

I'm sorry, I don't follow what you are saying. Can you give a real-world example?

LecrisUT commented 1 year ago

Similar to the example I gave in #20, but in this case:

import importlib.resources

rootA = importlib.resources.files("my_pkg.moduleA")
for file in rootA.iterdir():
    print(file.name)

rootB = importlib.resources.files("my_pkg.moduleB")
for file in rootB.iterdir():
    print(file.name)

If moduleA is selected to be a subset that is being "editable installed" and moduleB is not, then these will see different files, i.e., rootA will see the files in src and rootB will see the files in the venv.

pfmoore commented 1 year ago

When I say a real-world example, I mean an actual project that relies on this, not an artificial example. I understand the problem you're describing, I want to know who is actually hitting this problem, and as a result understand why it matters.

In case it's not obvious, I do not consider it a requirement that an editable install behave identically to a normal install. That's fundamentally not achievable (e.g., compiled code, static analysis). So what we're trying to establish here is what the real-world impact is of any inconsistencies, and for that we need real-world examples of projects that encounter those inconsistencies, in actual use, not in theoretical examples.

pfmoore commented 1 year ago

Use Case

Description

Project includes a directory (typically src) that contains files and directories that in a non-editable install are all installed directly into the target site-packages.

In scope?

Yes.

Solution

project.add_to_path(source_location)

Limitations

Discussion

As the src directory layout is the recommended approach for projects, this use case should cover the majority of new projects with no unusual requirements. For additional discussion on src vs flat layouts, see here - in particular note the comments about the src layout being suitable for editable installs.

pfmoore commented 1 year ago

Use Case

Description

Project uses the "flat" directory layout, with packages and modules to be installed stored directly in the project root directory.

In scope?

Yes.

Solution

project.map(target_import_name, source_file_or_dir)
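The sketch below illustrates the idea behind map with a redirecting import hook built only from the standard library. It is not the editables implementation, and the class name RedirectFinder, the module names and the file layout are all hypothetical. Each mapped name resolves to an explicit file or package directory, so any other files in the project root stay invisible to the import system.

```python
import importlib
import importlib.abc
import importlib.util
import sys
import tempfile
from pathlib import Path


class RedirectFinder(importlib.abc.MetaPathFinder):
    """Resolve selected top-level import names to explicit source locations."""

    def __init__(self, mapping: dict[str, Path]) -> None:
        self.mapping = mapping

    def find_spec(self, fullname, path=None, target=None):
        location = self.mapping.get(fullname)
        if location is None:
            return None  # Not a mapped name: defer to the normal finders.
        if location.is_dir():
            # A package: execute its __init__.py and look for submodules
            # in the mapped directory.
            return importlib.util.spec_from_file_location(
                fullname,
                str(location / "__init__.py"),
                submodule_search_locations=[str(location)],
            )
        # A single module file.
        return importlib.util.spec_from_file_location(fullname, str(location))


# Hypothetical flat-layout project: installable code mixed with workflow code.
root = Path(tempfile.mkdtemp())
(root / "flat_tool.py").write_text("NAME = 'tool'\n")
(root / "flatpkg").mkdir()
(root / "flatpkg" / "__init__.py").write_text("")
(root / "flatpkg" / "sub.py").write_text("VALUE = 42\n")
(root / "dev_tasks.py").write_text("")  # workflow code; deliberately not mapped

sys.meta_path.insert(0, RedirectFinder({
    "flat_tool": root / "flat_tool.py",
    "flatpkg": root / "flatpkg",
}))

flat_tool = importlib.import_module("flat_tool")
sub = importlib.import_module("flatpkg.sub")
print(flat_tool.NAME, sub.VALUE)  # tool 42
```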

Limitations

Discussion

The flat layout is generally not recommended for new projects, because it mixes installable code and project workflow code in the same directory. The map mechanism allows explicit specification of what Python files to expose, but it does not try to provide a complete solution for providing an exact match to a normal install.

Backends may prefer to provide more sophisticated approaches that better support the flat layout (for example, building a local "symbolic link farm" matching the intended install layout, and using project.add_to_path to expose that) but that is something for the backend to decide, and is considered out of scope[^1] for this project.

[^1]: In particular, managing the creation of directories in the build location is the responsibility of the backend, just as with any other build artefact.
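The sketch below shows the link-farm approach a backend might take. It assumes the backend knows the intended install layout as a mapping from installed names to source paths. The function build_link_farm, the farm location and the module names are hypothetical. It needs a platform where creating symbolic links is permitted; on Windows that typically requires Developer Mode or elevated privileges.

```python
import importlib
import site
import tempfile
from pathlib import Path


def build_link_farm(farm: Path, layout: dict[str, Path]) -> None:
    """Create farm/<installed name> symlinks pointing at source-tree paths.

    `layout` maps each name as it would appear in site-packages to where
    that file or directory actually lives in the source tree.
    """
    farm.mkdir(parents=True, exist_ok=True)
    for installed_name, source in layout.items():
        link = farm / installed_name
        if link.is_symlink():
            link.unlink()  # replace stale links when the farm is rebuilt
        link.symlink_to(source.resolve(), target_is_directory=source.is_dir())


# Hypothetical source tree whose layout differs from the installed layout.
root = Path(tempfile.mkdtemp())
(root / "farm_demo_app").mkdir()
(root / "farm_demo_app" / "__init__.py").write_text("GREETING = 'hi'\n")
(root / "scripts").mkdir()
(root / "scripts" / "helper.py").write_text("def twice(x):\n    return 2 * x\n")

# The farm lives in a backend-managed build location.
farm = root / "build" / "editable"
build_link_farm(farm, {
    "farm_demo_app": root / "farm_demo_app",
    "farm_demo_helper.py": root / "scripts" / "helper.py",
})

# Expose the farm exactly as in the src-layout case: one .pth entry.
site_dir = Path(tempfile.mkdtemp())  # stand-in for site-packages
(site_dir / "my_project.pth").write_text(str(farm.resolve()) + "\n")
site.addsitedir(str(site_dir))

app = importlib.import_module("farm_demo_app")
helper = importlib.import_module("farm_demo_helper")
print(app.GREETING, helper.twice(21))  # hi 42
```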

pfmoore commented 1 year ago

Use Case

Description

The project includes "script wrappers" defined in its metadata.

In scope?

No

Discussion

Script wrappers are the responsibility of the build backend to create, and depend on the project metadata. As such, they are not typically installed in an "editable" manner, and require a reinstall if the relevant project metadata changes.

When performing an editable install, backends may choose to develop their own code to create a more sophisticated script wrapper that dynamically reads the project metadata and can reflect metadata changes without needing a reinstall. But this is not required, and in any case would not correctly reflect addition or removal of script wrapper names.

pfmoore commented 1 year ago

Use Case

Description

The project metadata is changed.

In scope?

No.

Discussion

Project metadata changes should always require a reinstall, as they can fundamentally affect the nature of the project (e.g., name, dependencies). Conversely, metadata changes that don't affect the nature of the project in that way probably aren't relevant to its runtime behaviour.

pfmoore commented 1 year ago

Use Case

Description

The project contains "compiled" native code, such as C or Rust components.

In scope?

No.

Discussion

When native source code is changed, the associated binaries must be recompiled. This is handled by the build backend, so a new build step is needed. Since a build step is being executed anyway, doing a reinstall as well adds no real overhead, and the reinstall can put the compiled code into the target location directly, needing no "editable install" redirection to work.

More sophisticated scenarios do exist. For example, backends could provide some form of "source monitoring" to automatically trigger rebuilds when source code changes. Such features are highly backend-specific, and are not expected to be supported by this library. Similarly, "live debugging" of binaries might require them to be stored in specific locations, which means that installing into site-packages is a problem. Again, backends can provide custom support for this (for example, installing generated Python wrappers that load the binary from the backend-managed location and re-export the appropriate symbols).

For some cases, the project.map method might be useful for backends wanting to support editable binary installs. And further support routines are possible as future feature requests, but these would need to be justified by specific use cases, not just under this generic heading.

pfmoore commented 1 year ago

Use Case

Description

The project contains a subpackage which is physically located outside of the package structure in the source tree.

In scope?

Yes.

Discussion

This is the only way in which assembling an installed package from parts stored separately in the source tree is supported. Suppose your project creates a package foo, and the source code for foo is stored at src/foo. But suppose you have some supporting code which (for some reason) is not stored under src/foo. This supporting code is a series of packages stored in a lib directory, and you want these packages to be visible to the foo package as subpackages of a single foo.support subpackage.

In this situation, exposing the contents of lib under the package name foo.support is supported. There is no support for having initialisation code run when foo.support is imported - it is purely a container for the lib directory (in other words, you cannot have either a src/foo/support/__init__.py file, or a lib/__init__.py file).

This feature was added in editables 0.4.
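The sketch below shows the mechanism behind this feature, built only from the standard import machinery. It is not necessarily how editables implements the feature, and SubpackageFinder, EmptyPackageLoader and the directory names are hypothetical. The container package has no code of its own, which matches the rule that neither src/foo/support/__init__.py nor lib/__init__.py may exist.

```python
import importlib
import importlib.abc
import importlib.machinery
import sys
import tempfile
from pathlib import Path


class EmptyPackageLoader(importlib.abc.Loader):
    """Loader for a pure container package: there is no code to execute."""

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        pass  # deliberately nothing: no __init__.py is involved


class SubpackageFinder(importlib.abc.MetaPathFinder):
    """Expose a directory as a code-less subpackage under a given name."""

    def __init__(self, name: str, directory: Path) -> None:
        self.name = name
        self.directory = directory

    def find_spec(self, fullname, path=None, target=None):
        if fullname != self.name:
            return None
        spec = importlib.machinery.ModuleSpec(
            fullname, EmptyPackageLoader(), is_package=True
        )
        # Submodules of the container are looked up only in `directory`.
        spec.submodule_search_locations = [str(self.directory)]
        return spec


# Hypothetical tree from the description: src/foo plus supporting packages in lib.
root = Path(tempfile.mkdtemp())
(root / "src" / "foo").mkdir(parents=True)
(root / "src" / "foo" / "__init__.py").write_text("")
(root / "lib" / "tools").mkdir(parents=True)
(root / "lib" / "tools" / "__init__.py").write_text("ANSWER = 42\n")

sys.path.insert(0, str(root / "src"))
sys.meta_path.insert(0, SubpackageFinder("foo.support", root / "lib"))

tools = importlib.import_module("foo.support.tools")
print(tools.ANSWER)  # 42
```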

pfmoore commented 1 year ago

Use Case

Description

Assembling a package directory from separate parts in the source location (with the exception of the specific case discussed in the previous comment).

In scope?

No.

Discussion

This library is intended to provide support for build backends wishing to implement editable installs of projects. To that end, it tries to provide implementations of generic mechanisms for exposing source code to the import system, and does not attempt to support every capability of the various build backends. This particularly applies to features that allow source files to be "moved into place" when creating the final distribution - if a build backend wants to provide such capabilities, it is the responsibility of the backend to determine how (or if) editable installation of such a dynamically constructed project is to be supported[^1].

[^1]: One solution is to build a "shadow" copy of what would be installed, using symbolic links to reflect changes to the source in the shadow. That shadow can then be exposed in the editable wheel.

zooba commented 4 months ago

Hi Paul! (Hope you weren't trying to nerd-snipe me into commenting on the scope of this project, because I really don't have the time and so skipped all the messages)