Open KyleFromNVIDIA opened 3 months ago
Great idea!
To flesh out this a bit more, how should the project define the CMakePackageConfigHelpers
calls to support standard and wheel install paths, e.g. INTALL_DESTINATION
. Probably it's more an issue of convention. There are <pkg>.libs
folder structure that are used to store bundled dependency libraries so probably something similar is in order, something to store stuff under site_packages
.
There is also auditwheel
that can potentially move (or rather copy) dependencies into <pkg>.libs
folder. Any idea on how to integrate with that, e.g. how to avoid doubling the libraries is another scikit-build-core
package also provides it?
On Fedora packaging side, we also have to figure out how to point the entry_point there to the system installed one.
On python side, it would have to look slightly different because afaik, the entry_point must point to a python loadable path, but I don't see a reason why that path can't be made to be a valid python path with scikit-build-core
generated metadata to point to store the actual paths.
After doing some research on entry points, I understand them a little better, and have a better idea of how we would go about this.
The entry point would look more like:
[cmake.package]
ExamplePackage = "example:cmake_package_dir"
And then in example/__init__.py
:
import importlib.resources as rs
def cmake_package_dir(_package_name): # _package_name is ignored here, but could theoretically be used
return rs.files(__package__) / "lib64/cmake/example"
On python side, it would have to look slightly different because afaik, the entry_point must point to a python loadable path, but I don't see a reason why that path can't be made to be a valid python path with
scikit-build-core
generated metadata to point to store the actual paths.
Yes, that's basically what I'm advocating here.
There is also
auditwheel
that can potentially move (or rather copy) dependencies into<pkg>.libs
folder. Any idea on how to integrate with that, e.g. how to avoid doubling the libraries is anotherscikit-build-core
package also provides it?
I believe auditwheel
only copies the runtime dependencies (*.so
files), not CMake package config files, so it shouldn't be an issue.
I believe we had this idea at the beginning, got stuck on if it should set _DIR or _ROOT, and then somehow didn't end up adding either...
IIRC _ROOT is the better one?
No. *_ROOT
specifies a prefix to search through various subdirectories (lib
, lib64
, lib/cmake
, etc.) while *_DIR
specifies exactly where to find the file. We know exactly where the file is at installation time, so there's no reason not to record it and remove all of the ambiguity.
We've been supporting directly specifying the directory and not requiring an entry point function. It just needs to be a valid Python module, which is pretty much just a directory.
Ahh, but a module path could resolve to several physical paths. But I think we could just then pick the one with the config file in it?
We've been supporting directly specifying the directory and not requiring an entry point function.
How? How does the consuming project find the directly-specified directory?
Ahh, but a module path could resolve to several physical paths. But I think we could just then pick the one with the config file in it?
Shouldn't be a problem. The idea is that scikit-build-core
generates a vanilla python module that stores the path to the Config.cmake
file. Similar to the editable install approach. Then scikit-build-core
loads the function/attribute to get the path that it should populatein the _ROOT
/_DIR
variable.
I.e. not using importlib.resources
past the entry_point, but standard pathlib.Path
relative to the generated file. One key point would be to have a dedicated/predictable path like <pkg>.cmake
at the site_packages top level so that relative navigation can be done reliably
We've been supporting directly specifying the directory and not requiring an entry point function.
How? How does the consuming project find the directly-specified directory?
I guess you can always define cache variables manually, but not sure if it refers to that. And also this seems to be the reverse, i.e. the installed package defines what and where Config.cmake
it provides, instead of the dependent having to define where to search. I don't believe there is a functionality for the former
I don't believe there is a functionality for the former
Right. I'm proposing adding functionality for this:
the installed package defines what and where
Config.cmake
it provides
so that consuming packages don't have to know things about the dependency's wheel layout. We were almost there with cmake.prefix
, but it wasn't able to handle non-standard installation layouts and layouts that use lib64
when the host system is Debian-based.
Hmm, also occurs to me, how would one disable the preference of the entry_points? For CMAKE_PREFIX_PATH
it is somewhat doable if we make sure to append to the existing value, but _ROOT
and _DIR
have top-most priority so probably those should not be used.
The longterm proper solution would probably be to write a DependencyProvider interface like vcpkg
apparently already does? But then how would we enable multiple DependencyProvider or find them?
Kyle and I were discussing this earlier and I am supportive of the idea.
We've been supporting directly specifying the directory and not requiring an entry point function.
How? How does the consuming project find the directly-specified directory?
I think Henry was referring to the fact that with the cmake.prefix
entrypoint projects just specify a module and SKBC loads that module, rather than specifying an object within the module to call.
The idea is that scikit-build-core generates a vanilla python module that stores the path to the Config.cmake file.
Could you actually stick to the above and simply specify the relative path to the directory? Directories are (namespace) packages, therefore importable modules, so I think you could just take the resulting module's __file__
(ep.load().__file__
) as the <package>_DIR
I believe.
Hmm, also occurs to me, how would one disable the preference of the entry_points? For CMAKE_PREFIX_PATH it is somewhat doable if we make sure to append to the existing value, but _ROOT and _DIR have top-most priority so probably those should not be used.
Are you imagining a situation where there are multiple providers of a config file with the same name? I don't know if there is a good way to avoid them clobbering each other in this case. _ROOT
would sort of work for this because CMake allows it to be a list of directories instead of a single one, but we don't want to use that for the reasons Kyle enumerated above, and also that would introduce an arbitrarily ordering. Conversely _DIR
is a single location, so there is no way to preserve multiple. I'm not sure that this is a situation that we should worry about, though. If you were to install these packages into your sysroot they would clobber each other anyway, and similarly if you were to add multiple conflicting configs to the CMAKE_PREFIX_PATH
.
I think Henry was referring to the fact that with the
cmake.prefix
entrypoint projects just specify a module
Yes, also cmake.module
. These work by simply providing the module that contains the CMake files, and the tool using them (scikit-build-core) can get the directory(s) from that. As long as it's a file-based loader. Other loaders could be handled eventually, by realizing the directory, but it's probably not needed realistically and could be handled eventually (at least for newer Pythons).
Anyway, we should not be discussing changing the current system in this issue - the new entry point should be just like the existing pair (minus the extra detail that the name matters).[^1]
[^1]: By the way, entry points only work if the package is installed into site-packages; we don't support packages installing into data/headers, etc. If we wanted to try to support this, we could do it this time. But that means users would need custom functions that located the cmake module, etc.
Are you imagining a situation
I think this was about how to disable an installed Python module forcing its CMake files to be found. If you had "thing" installed with thing = "thing.resources.cmake"
, for example, how would you instead get it to select a non-Python installed "thing"? We could potentially not set thing_DIR
if thing_DIR
was defined, but that would force someone to define it.
When we were first discussing this (can't find records, probably was around community meeting 5 or so), I think we had the idea that users could supply ignore patterns on the entry point names.
By the way, going back to the beginning:
But it gets even better! With the CMake File API, we can even automatically figure out
We can't add an entry point automatically unless a user sets this in their dynamic
array, that's a PEP 621 rule. Which means they know about it and probably could just actually set the entry point; explicit is better than implicit. And if we needed a custom function, this is out, we can't automatically add Python functions to a package!
Also:
when prefixes are really meant to have
I don't think that's required, though. Setting your CMAKE_PREFIX_PATH
directly to the location you want to search is supported. AFAICT this is just trying to make something that already works a bit "nicer" conceptually?
Could you actually stick to the above and simply specify the relative path to the directory? Directories are (namespace) packages, therefore importable modules, so I think you could just take the resulting module's
__file__
(ep.load().__file__
) as the<package>_DIR
I believe.
The issue I see is that it wouldn't be able to support pointing to system packages like /usr/lib64/cmake
because all of the path would have to be converted to python package, which obviously cannot be the case for /usr/...
. Distro packaging would be affected by this, but also spack, conda, etc. In principal we could patch out that entry point, but it would be preferred to keep things consistent, who knows how other tools will start assuming these.
Are you imagining a situation
I think this was about how to disable an installed Python module forcing its CMake files to be found.
Indeed, basically going by the general CMake guidelines to not define (top-level) user configurations inside a CMakeLists.txt. In a similar manner we should keep those empty such that the end-user can define them inside tools.scikit-build.cmake.define
or as --config-settings
. There could be checks if those variables are defined, but I think it would make it too complicated for the user to follow. CMAKE_PREFIX_PATH
on the other hand is standardized and used everywhere. (Btw, how would the cache variable interact with the env-variable?)
I believe we had this idea at the beginning, got stuck on if it should set _DIR or _ROOT, and then somehow didn't end up adding either...
IIRC _ROOT is the better one?
_ROOT
has extended behavior where it is propagated to any nested find_dependency
or find_package
call from the parent. That allows for a project that ship third party dependencies to find the packaged version.
_ROOT
has extended behavior where it is propagated to any nestedfind_dependency
orfind_package
call from the parent. That allows for a project that ship third party dependencies to find the packaged version.
Interesting, can you expand on that with an example or upstream documentation? From my reading it means that if you are in find_package(PkgA)
that used PkgA_ROOT
, and you called a find_dependency(PkgB)
it will search in PkgA_ROOT
as well?
_ROOT
has extended behavior where it is propagated to any nestedfind_dependency
orfind_package
call from the parent. That allows for a project that ship third party dependencies to find the packaged version.Interesting, can you expand on that with an example or upstream documentation? From my reading it means that if you are in
find_package(PkgA)
that usedPkgA_ROOT
, and you called afind_dependency(PkgB)
it will search inPkgA_ROOT
as well?
https://cmake.org/cmake/help/latest/command/find_package.html
The package root variables are maintained as a stack so if called from within a find module,
root paths from the parent's find module will also be searched after paths for the current package.
This can be skipped if NO_PACKAGE_ROOT_PATH is passed or by setting the CMAKE_FIND_USE_PACKAGE_ROOT_PATH to FALSE.
Definitely should use either _ROOT
or CMAKE_PREFIX_PATH
given this info
If you wanted to point at .../lib/...
vs. .../lib64/...
, it would quite hard with the current "simple" entry points; you'd actually need to modify the entry point based on the build, which probably can be done via dynamic-metadata, but is non-trivial. So our new cmake.root
entry-point could either be a pointer to a object that produces the entry point, or we could extend all three to differentiate between a package only (classic method) or a package + object, which would call the function.
Not sure if we should also specify some way to scope the access, since a package is not guaranteed to be on disk, but the file we want needs to be accessible via a disk-based path for at least the duration of the CMake call. We could make the object a context manager, for example.
I think we need to first start with how would the user use CMakePackageConfigHelpers
macros. I can't envision right now if it would be an issue. What would the site_packages
folder look like?
Scikit-build-core already supports the
cmake.prefix
entry point, which appends the value toCMAKE_PREFIX_PATH
. In theory, this will allow projects that installlib/cmake/<package>/<package>-config.cmake
to automatically be found by consuming projects. However, this does not work in cases where:lib64/cmake/<package>/<package>-config.cmake
, and the consuming package is built on a Debian-based system, because in this case,FIND_LIBRARY_USE_LIB64_PATHS
is set toFALSE
In theory, the exporting project could set
cmake.prefix
to<wheel-name>/lib64/cmake/<package>
, because thefind_package()
search procedure does look directly in the values ofCMAKE_PREFIX_PATH
in addition to the various subdirectories listed. However, that is only conventional on Windows, and I don't consider it good practice to setCMAKE_PREFIX_PATH
to<wheel-name>/lib64/cmake/<package>
, when prefixes are really meant to havelib64/cmake/<package>
within them.I propose an alternative: a new entry point called
cmake.package
, that is used like so inentry_points.txt
:In this case, scikit-build-core would pass
-DExamplePackage_DIR=/home/user/.local/lib/python3.12/site-packages/example/lib64/cmake/example
to CMake. The*_DIR
variable tells CMake exactly where to find a package file, rather than giving it a list of prefixes to search through their subdirectories. There's no ambiguity, and it works with any package layout, as long as the config file is named<Package>Config.cmake
or<package>-config.cmake
.But it gets even better! With the CMake File API, we can even automatically figure out if the project is exporting any package config files. By searching through the
installer
keys, it can search forfile
installers that install<package>-config.cmake
or<Package>Config.cmake
, andexport
installers that install<package>-targets.cmake
or<Package>Targets.cmake
. If it finds the two together, it's a pretty safe bet that that's a CMake package config file, and scikit-build-core can automatically create an entry point for it. (Note that this would be more ambiguous if the<package>-config.cmake
convention is used, since that loses casing information. In that case, we could defer to the export name, or just not support the<package>-config.cmake
convention.)Note that if the heuristics used by the automatic entry point creation are incorrect for any reason, the package creator can still manually specify a
cmake.package
entry point to override the automatic creation procedure.