scikit-build / scikit-build-core

A next generation Python CMake adaptor and Python API for plugins
https://scikit-build-core.readthedocs.io
Apache License 2.0
241 stars 48 forks source link

Idea for new entry point: `cmake.package` #831

Open KyleFromNVIDIA opened 3 months ago

KyleFromNVIDIA commented 3 months ago

Scikit-build-core already supports the cmake.prefix entry point, which appends the value to CMAKE_PREFIX_PATH. In theory, this will allow projects that install lib/cmake/<package>/<package>-config.cmake to automatically be found by consuming projects. However, this does not work in cases where:

  1. The package uses a non-standard convention
  2. The package installs in lib64/cmake/<package>/<package>-config.cmake, and the consuming package is built on a Debian-based system, because in this case, FIND_LIBRARY_USE_LIB64_PATHS is set to FALSE

In theory, the exporting project could set cmake.prefix to <wheel-name>/lib64/cmake/<package>, because the find_package() search procedure does look directly in the values of CMAKE_PREFIX_PATH in addition to the various subdirectories listed. However, that is only conventional on Windows, and I don't consider it good practice to set CMAKE_PREFIX_PATH to <wheel-name>/lib64/cmake/<package>, when prefixes are really meant to have lib64/cmake/<package> within them.

I propose an alternative: a new entry point called cmake.package, that is used like so in entry_points.txt:

[cmake.package]
ExamplePackage = "example/lib64/cmake/example"

In this case, scikit-build-core would pass -DExamplePackage_DIR=/home/user/.local/lib/python3.12/site-packages/example/lib64/cmake/example to CMake. The *_DIR variable tells CMake exactly where to find a package file, rather than giving it a list of prefixes to search through their subdirectories. There's no ambiguity, and it works with any package layout, as long as the config file is named <Package>Config.cmake or <package>-config.cmake.

But it gets even better! With the CMake File API, we can even automatically figure out if the project is exporting any package config files. By searching through the installer keys, it can search for file installers that install <package>-config.cmake or <Package>Config.cmake, and export installers that install <package>-targets.cmake or <Package>Targets.cmake. If it finds the two together, it's a pretty safe bet that that's a CMake package config file, and scikit-build-core can automatically create an entry point for it. (Note that this would be more ambiguous if the <package>-config.cmake convention is used, since that loses casing information. In that case, we could defer to the export name, or just not support the <package>-config.cmake convention.)

Note that if the heuristics used by the automatic entry point creation are incorrect for any reason, the package creator can still manually specify a cmake.package entry point to override the automatic creation procedure.

LecrisUT commented 3 months ago

Great idea!

To flesh out this a bit more, how should the project define the CMakePackageConfigHelpers calls to support standard and wheel install paths, e.g. INTALL_DESTINATION. Probably it's more an issue of convention. There are <pkg>.libs folder structure that are used to store bundled dependency libraries so probably something similar is in order, something to store stuff under site_packages.

There is also auditwheel that can potentially move (or rather copy) dependencies into <pkg>.libs folder. Any idea on how to integrate with that, e.g. how to avoid doubling the libraries is another scikit-build-core package also provides it?

On Fedora packaging side, we also have to figure out how to point the entry_point there to the system installed one.

On python side, it would have to look slightly different because afaik, the entry_point must point to a python loadable path, but I don't see a reason why that path can't be made to be a valid python path with scikit-build-core generated metadata to point to store the actual paths.

KyleFromNVIDIA commented 3 months ago

After doing some research on entry points, I understand them a little better, and have a better idea of how we would go about this.

The entry point would look more like:

[cmake.package]
ExamplePackage = "example:cmake_package_dir"

And then in example/__init__.py:

import importlib.resources as rs

def cmake_package_dir(_package_name):  # _package_name is ignored here, but could theoretically be used
    return rs.files(__package__) / "lib64/cmake/example"

On python side, it would have to look slightly different because afaik, the entry_point must point to a python loadable path, but I don't see a reason why that path can't be made to be a valid python path with scikit-build-core generated metadata to point to store the actual paths.

Yes, that's basically what I'm advocating here.

KyleFromNVIDIA commented 3 months ago

There is also auditwheel that can potentially move (or rather copy) dependencies into <pkg>.libs folder. Any idea on how to integrate with that, e.g. how to avoid doubling the libraries is another scikit-build-core package also provides it?

I believe auditwheel only copies the runtime dependencies (*.so files), not CMake package config files, so it shouldn't be an issue.

henryiii commented 3 months ago

I believe we had this idea at the beginning, got stuck on if it should set _DIR or _ROOT, and then somehow didn't end up adding either...

IIRC _ROOT is the better one?

KyleFromNVIDIA commented 3 months ago

No. *_ROOT specifies a prefix to search through various subdirectories (lib, lib64, lib/cmake, etc.) while *_DIR specifies exactly where to find the file. We know exactly where the file is at installation time, so there's no reason not to record it and remove all of the ambiguity.

henryiii commented 3 months ago

We've been supporting directly specifying the directory and not requiring an entry point function. It just needs to be a valid Python module, which is pretty much just a directory.

henryiii commented 3 months ago

Ahh, but a module path could resolve to several physical paths. But I think we could just then pick the one with the config file in it?

KyleFromNVIDIA commented 3 months ago

We've been supporting directly specifying the directory and not requiring an entry point function.

How? How does the consuming project find the directly-specified directory?

LecrisUT commented 3 months ago

Ahh, but a module path could resolve to several physical paths. But I think we could just then pick the one with the config file in it?

Shouldn't be a problem. The idea is that scikit-build-core generates a vanilla python module that stores the path to the Config.cmake file. Similar to the editable install approach. Then scikit-build-core loads the function/attribute to get the path that it should populatein the _ROOT/_DIR variable.

I.e. not using importlib.resources past the entry_point, but standard pathlib.Path relative to the generated file. One key point would be to have a dedicated/predictable path like <pkg>.cmake at the site_packages top level so that relative navigation can be done reliably

LecrisUT commented 3 months ago

We've been supporting directly specifying the directory and not requiring an entry point function.

How? How does the consuming project find the directly-specified directory?

I guess you can always define cache variables manually, but not sure if it refers to that. And also this seems to be the reverse, i.e. the installed package defines what and where Config.cmake it provides, instead of the dependent having to define where to search. I don't believe there is a functionality for the former

KyleFromNVIDIA commented 3 months ago

I don't believe there is a functionality for the former

Right. I'm proposing adding functionality for this:

the installed package defines what and where Config.cmake it provides

so that consuming packages don't have to know things about the dependency's wheel layout. We were almost there with cmake.prefix, but it wasn't able to handle non-standard installation layouts and layouts that use lib64 when the host system is Debian-based.

LecrisUT commented 3 months ago

Hmm, also occurs to me, how would one disable the preference of the entry_points? For CMAKE_PREFIX_PATH it is somewhat doable if we make sure to append to the existing value, but _ROOT and _DIR have top-most priority so probably those should not be used.

The longterm proper solution would probably be to write a DependencyProvider interface like vcpkg apparently already does? But then how would we enable multiple DependencyProvider or find them?

vyasr commented 3 months ago

Kyle and I were discussing this earlier and I am supportive of the idea.

We've been supporting directly specifying the directory and not requiring an entry point function.

How? How does the consuming project find the directly-specified directory?

I think Henry was referring to the fact that with the cmake.prefix entrypoint projects just specify a module and SKBC loads that module, rather than specifying an object within the module to call.

The idea is that scikit-build-core generates a vanilla python module that stores the path to the Config.cmake file.

Could you actually stick to the above and simply specify the relative path to the directory? Directories are (namespace) packages, therefore importable modules, so I think you could just take the resulting module's __file__ (ep.load().__file__) as the <package>_DIR I believe.

Hmm, also occurs to me, how would one disable the preference of the entry_points? For CMAKE_PREFIX_PATH it is somewhat doable if we make sure to append to the existing value, but _ROOT and _DIR have top-most priority so probably those should not be used.

Are you imagining a situation where there are multiple providers of a config file with the same name? I don't know if there is a good way to avoid them clobbering each other in this case. _ROOT would sort of work for this because CMake allows it to be a list of directories instead of a single one, but we don't want to use that for the reasons Kyle enumerated above, and also that would introduce an arbitrarily ordering. Conversely _DIR is a single location, so there is no way to preserve multiple. I'm not sure that this is a situation that we should worry about, though. If you were to install these packages into your sysroot they would clobber each other anyway, and similarly if you were to add multiple conflicting configs to the CMAKE_PREFIX_PATH.

henryiii commented 3 months ago

I think Henry was referring to the fact that with the cmake.prefix entrypoint projects just specify a module

Yes, also cmake.module. These work by simply providing the module that contains the CMake files, and the tool using them (scikit-build-core) can get the directory(s) from that. As long as it's a file-based loader. Other loaders could be handled eventually, by realizing the directory, but it's probably not needed realistically and could be handled eventually (at least for newer Pythons).

Anyway, we should not be discussing changing the current system in this issue - the new entry point should be just like the existing pair (minus the extra detail that the name matters).[^1]

[^1]: By the way, entry points only work if the package is installed into site-packages; we don't support packages installing into data/headers, etc. If we wanted to try to support this, we could do it this time. But that means users would need custom functions that located the cmake module, etc.

Are you imagining a situation

I think this was about how to disable an installed Python module forcing its CMake files to be found. If you had "thing" installed with thing = "thing.resources.cmake", for example, how would you instead get it to select a non-Python installed "thing"? We could potentially not set thing_DIR if thing_DIR was defined, but that would force someone to define it.

When we were first discussing this (can't find records, probably was around community meeting 5 or so), I think we had the idea that users could supply ignore patterns on the entry point names.

By the way, going back to the beginning:

But it gets even better! With the CMake File API, we can even automatically figure out

We can't add an entry point automatically unless a user sets this in their dynamic array, that's a PEP 621 rule. Which means they know about it and probably could just actually set the entry point; explicit is better than implicit. And if we needed a custom function, this is out, we can't automatically add Python functions to a package!

Also:

when prefixes are really meant to have

I don't think that's required, though. Setting your CMAKE_PREFIX_PATH directly to the location you want to search is supported. AFAICT this is just trying to make something that already works a bit "nicer" conceptually?

LecrisUT commented 3 months ago

Could you actually stick to the above and simply specify the relative path to the directory? Directories are (namespace) packages, therefore importable modules, so I think you could just take the resulting module's __file__ (ep.load().__file__) as the <package>_DIR I believe.

The issue I see is that it wouldn't be able to support pointing to system packages like /usr/lib64/cmake because all of the path would have to be converted to python package, which obviously cannot be the case for /usr/.... Distro packaging would be affected by this, but also spack, conda, etc. In principal we could patch out that entry point, but it would be preferred to keep things consistent, who knows how other tools will start assuming these.

Are you imagining a situation

I think this was about how to disable an installed Python module forcing its CMake files to be found.

Indeed, basically going by the general CMake guidelines to not define (top-level) user configurations inside a CMakeLists.txt. In a similar manner we should keep those empty such that the end-user can define them inside tools.scikit-build.cmake.define or as --config-settings. There could be checks if those variables are defined, but I think it would make it too complicated for the user to follow. CMAKE_PREFIX_PATH on the other hand is standardized and used everywhere. (Btw, how would the cache variable interact with the env-variable?)

robertmaynard commented 3 months ago

I believe we had this idea at the beginning, got stuck on if it should set _DIR or _ROOT, and then somehow didn't end up adding either...

IIRC _ROOT is the better one?

_ROOT has extended behavior where it is propagated to any nested find_dependency or find_package call from the parent. That allows for a project that ship third party dependencies to find the packaged version.

LecrisUT commented 3 months ago

_ROOT has extended behavior where it is propagated to any nested find_dependency or find_package call from the parent. That allows for a project that ship third party dependencies to find the packaged version.

Interesting, can you expand on that with an example or upstream documentation? From my reading it means that if you are in find_package(PkgA) that used PkgA_ROOT, and you called a find_dependency(PkgB) it will search in PkgA_ROOT as well?

robertmaynard commented 3 months ago

_ROOT has extended behavior where it is propagated to any nested find_dependency or find_package call from the parent. That allows for a project that ship third party dependencies to find the packaged version.

Interesting, can you expand on that with an example or upstream documentation? From my reading it means that if you are in find_package(PkgA) that used PkgA_ROOT, and you called a find_dependency(PkgB) it will search in PkgA_ROOT as well?

https://cmake.org/cmake/help/latest/command/find_package.html

The package root variables are maintained as a stack so if called from within a find module, 
root paths from the parent's find module will also be searched after paths for the current package. 
This can be skipped if NO_PACKAGE_ROOT_PATH is passed or by setting the CMAKE_FIND_USE_PACKAGE_ROOT_PATH to FALSE.
LecrisUT commented 3 months ago

Definitely should use either _ROOT or CMAKE_PREFIX_PATH given this info

henryiii commented 3 months ago

If you wanted to point at .../lib/... vs. .../lib64/..., it would quite hard with the current "simple" entry points; you'd actually need to modify the entry point based on the build, which probably can be done via dynamic-metadata, but is non-trivial. So our new cmake.root entry-point could either be a pointer to a object that produces the entry point, or we could extend all three to differentiate between a package only (classic method) or a package + object, which would call the function.

Not sure if we should also specify some way to scope the access, since a package is not guaranteed to be on disk, but the file we want needs to be accessible via a disk-based path for at least the duration of the CMake call. We could make the object a context manager, for example.

LecrisUT commented 3 months ago

I think we need to first start with how would the user use CMakePackageConfigHelpers macros. I can't envision right now if it would be an issue. What would the site_packages folder look like?