isocpp / pkg-fmt

Metadata to support C++ packaging

Packaging Systems and Build Systems #5

Open bretbrownjr opened 10 months ago

bretbrownjr commented 10 months ago

Problem

During recent discussion with @mwoehlke and @billhoffman about library packaging metadata, it became apparent that there isn't a clear vision about the intersections and disjunctions in the responsibilities of build systems compared to dependency management systems (including packaging systems).

To clarify, I'm including CMake as a build system for the purposes of this issue. And I'm including anything that shells out to build systems like CMake as a dependency management system, including the many flavors of "monorepos" that use source control mechanisms to control versioning, so long as they might execute autotools, make, ninja, cmake, etc., as subprocesses.

Feedback Requested

Quick responses via emojis on this issue are well received if that's all you can manage right now.

More thoughtful responses are also appreciated. Especially helpful are real-world examples of libraries or dependency arrangements that can clarify requirements.

Also helpful is feedback about what sorts of convergence are possible and not possible. Similarly, which kinds of convergence are worth the effort versus not.

Proposal

I believe it makes sense for the ISO C++ Tooling Study Group, along with interested community members who can contribute in other ways (i.e., here), to form some consensus about best practices (if not standards) around the division of responsibilities between build systems and dependency management systems.

Another way to think about this problem is that we should develop clear consensus about when a given kind of problem indicates a bug in the dependency management versus a bug in the build configuration.

Background

To briefly illustrate a point of discussion, @mwoehlke was describing requirements in which packaging metadata would be able to describe various "flavors" of libraries that are installed on disk for various systems, especially build systems, to take advantage of when resolving which flavors are suitable for the current build workflow. That gave me some pause.

First, there are circumstances, such as ABI-compatible debug builds of dependencies, where it's normal and harmless to have multiple "flavors" of the same library installed. In those cases, the headers involved are typically compatible if they are not literally the same headers to start with.

But there are other circumstances where this is not the case. For instance, it's relatively popular for publicly available projects to provide both header-only and compiled flavors. And in those cases, it's relatively popular to provide a preprocessor interface that controls whether the relevant headers are parsed in "header-only" mode, for instance a MYLIB_HEADER_ONLY macro whose non-zero values indicate the project is being consumed in "header-only" mode.
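To make that concrete, here's a minimal sketch of the kind of preprocessor interface being described, reusing the hypothetical mylib / MYLIB_HEADER_ONLY names from above (real libraries differ in the details):

```cpp
// mylib/checksum.hpp -- hypothetical header sketching a header-only switch.
#pragma once

namespace mylib {

#if defined(MYLIB_HEADER_ONLY) && MYLIB_HEADER_ONLY != 0

// "Header-only" flavor: every consumer parses the definition directly.
inline int checksum(const char* data, int size) {
  int sum = 0;
  for (int i = 0; i < size; ++i) sum += data[i];
  return sum;
}

#else

// "Compiled" flavor: only a declaration here; the definition is built into
// a prebuilt mylib library that consumers must also link against.
int checksum(const char* data, int size);

#endif

}  // namespace mylib
```

Note that two translation units in the same link context that parse this header with different values of MYLIB_HEADER_ONLY disagree about what mylib::checksum is, which is exactly the kind of incoherence discussed below.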

With that in mind, it seems likely that we'll want to support all of the above. But I think it would be best for end users if the dependency management system would, with user or other maintainer input, decide whether a given dependency is header-only and then communicate to downstream build systems in a coherent way that a selection is made and what flags should be set. I plan on circling back to further illustrate the difficulties that arise when these selections are made incoherently, but it's probably enough to state that it's not reasonable to expect every transitive consumer of a library to maintain compatible hardcoded decisions and/or compatible selection algorithms just so that the relevant headers are parsed coherently and One Definition Rule issues are avoided.

I'll note that I'm willing to concede to reality to maximize the utility of build systems and dependency management systems to end users. For instance, it might be fine or even required for dependency management systems to provide multiple conflicting versions of libraries for build systems to later make sense of to some degree, especially for transitional and other edge cases.

steve-downey commented 10 months ago

Maven, which I think is still the default in the Java space, was an early build and dependency management system that would go out to the internet and download current dependencies. This was a nightmare in controlled environments. My build environment at the time had no direct access at all to the internet, and getting packages from elsewhere was strictly off the table.

alexreinking commented 10 months ago

I see the relationship between the build system and dependency management system as one of request and provision.

The build system requests, via a standard mechanism, a dependency along with some constraints (such as a version number).

The dependency manager responds by doing whatever it needs to do to provide a dependency that satisfies the constraints, or reports a failure.


I'm leaving open the set of requirements that can be placed upon a request. It might include such things as version ranges, linking types (i.e. static or shared), package-specific features (e.g. the library has JPEG support compiled in), debug/release modes, and more.

For a given dependency, there is a multiplicity issue. Can you request multiple, conflicting, versions of a dependency?

If not, and if the build system is imperative, there is a potential ordering issue: an earlier request might conflict with a later one, but not the other way around.


CMake is imperative and has issues with conflicting dependency requests within the same directory. However, imported targets are directory scoped, so one build can use multiple versions of a library, so long as they are appropriately partitioned.

mwoehlke commented 10 months ago

Some thoughts:

Linux distros are an entrenched reality. Trying to get them to change, in terms of either function or "turf", is going to be a significant uphill battle. Similarly, CPS was never intended to redefine "building", "packaging" or the like (i.e. to redraw turf boundaries), but to take something which exists but is tool-specific and open it up while hammering out some of the blemishes in the process.

To that end, I don't consider it the package manager's problem to prevent incompatible dependencies. Right now, package managers (by which I typically mean "tools such as apt/dnf") are happy to allow co-installation of packages for different platforms (whether that means different ISAs such as ARM vs. x86_64, or different runtimes such as Linux vs. Windows/mingw) and for different uses (e.g. static, shared, header-only; threaded vs. non-threaded; etc.). There are users that need this functionality.

IMHO the build tool should be responsible for sorting this out, rejecting packages that aren't appropriate to the build environment, and (ideally) reconciling or (at minimum) flagging conflicts when they occur. To at least some extent, I think this is manageable via consumers properly communicating what configurations they require or whether they care.

To be clear, this does mean that the build tool is responsible for first building a complete but abstract dependency graph and then attempting to resolve possibly-conflicting requirements in that graph into definite components.

Note that this doesn't apply to just multiple configurations. Here's a similar example: Alpha depends on both Bravo and Charlie; Bravo requests Delta 1.2 (which Delta 1.3 satisfies), Charlie requests Delta 1.4, and both Delta 1.3 and Delta 1.4 are available.

If dependency resolution happens in strict, depth-first order, as would be the case in a typical, existing build system, the build tool would fail to resolve dependencies for Alpha, because it would have located and "locked in" the Delta 1.3 that was found first and deemed to be acceptable for Bravo. The correct course of action here is clearly to be able to "back out" that selection in light of Charlie's requirement for Delta 1.4.

The same principle applies to configurations. Bravo requires Delta, but doesn't care what flavor. The first flavor of Delta we find is "static". Charlie requires the "header-only" flavor of Delta. As above, the ideal solution is for the dependency resolver (which is, note, the build tool) to be able to "revisit" Bravo's dependency and either identify that "header-only" Delta can be used to satisfy it, or (e.g. if Bravo specifically required the "static" configuration, or in the previous example, if Delta 1.4 does not satisfy the request for Delta 1.2) to recognize that conflicting requirements are present.
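To illustrate the "back out and retry" behavior being described, here's a small, self-contained sketch of candidate selection with backtracking. It is not CPS and not any existing tool; the package names and versions mirror the hypothetical Alpha/Bravo/Charlie/Delta example above (versions are stored as integers, so 13 means 1.3):

```cpp
#include <cstddef>
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Candidate {
  std::string name;
  int version;         // 13 means "1.3", 14 means "1.4"
  std::string flavor;  // e.g. "static", "header-only"
};

struct Requirement {
  std::string name;                               // which package is needed
  std::function<bool(const Candidate&)> accepts;  // constraint on candidates
};

// Pick one candidate per package name such that every requirement is
// satisfied. A strict "first acceptable match wins" strategy would lock in
// Delta 1.3 for Bravo and then fail on Charlie; backtracking revisits that
// tentative choice instead of giving up.
bool resolve(const std::vector<Requirement>& reqs,
             const std::map<std::string, std::vector<Candidate>>& universe,
             std::map<std::string, Candidate>& chosen,
             std::size_t next = 0) {
  if (next == reqs.size()) return true;
  const Requirement& req = reqs[next];

  // If this package was already pinned, the existing choice must also
  // satisfy the current requirement, or we fail back to the caller.
  if (auto it = chosen.find(req.name); it != chosen.end())
    return req.accepts(it->second) && resolve(reqs, universe, chosen, next + 1);

  for (const Candidate& c : universe.at(req.name)) {
    if (!req.accepts(c)) continue;
    chosen[req.name] = c;                           // tentative selection
    if (resolve(reqs, universe, chosen, next + 1)) return true;
    chosen.erase(req.name);                         // back it out; try another
  }
  return false;  // conflicting requirements: report, don't guess
}

int main() {
  std::map<std::string, std::vector<Candidate>> universe = {
      {"Delta", {{"Delta", 13, "static"}, {"Delta", 14, "header-only"}}}};

  std::vector<Requirement> reqs = {
      {"Delta", [](const Candidate& c) { return c.version >= 12; }},  // Bravo
      {"Delta", [](const Candidate& c) { return c.version >= 14; }},  // Charlie
  };

  std::map<std::string, Candidate> chosen;
  if (resolve(reqs, universe, chosen))
    std::cout << "picked Delta " << chosen.at("Delta").version / 10 << "."
              << chosen.at("Delta").version % 10 << " ("
              << chosen.at("Delta").flavor << ")\n";
  else
    std::cout << "conflicting requirements; user intervention needed\n";
}
```

A real resolver also has to cope with preferences between providers and with requirements discovered transitively, but the shape of the problem is the same: tentative selections that may need to be revisited.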

Stepping back, however, I honestly don't see a problem here. I believe CPS gives us¹ the tools to at least express compatibility. Worst case, the build tool isn't smart enough to "backtrack and retry", and it's up to the user to resolve conflicts when they occur by selecting a package that is suitable to all consumers. Also, I fully expect there to be instances where e.g. Bravo and Charlie were built against conflicting versions of Delta and simply can't be consumed together without rebuilding at least one of them. From the standpoint of a package description, all we can realistically hope to do in such a case is clearly identify the issue and refuse to proceed, and I think we have the tools already to do that.

(¹ We almost surely need to add some sort of Conflicts specification, but that's definitely on the roadmap already.)

While the notion of "conflicting packages" exists at the package management level, I believe it serves two purposes:

  1. Identify packages whose files can't be co-installed, e.g. because two packages would install different files under the same full path. (Arguably this is a packaging bug.)
  2. Identify packages which would result in an unusable system if they were both installed at the same time. This occurs because the concepts of "installed" and "in use" are sometimes conflated at the distro package manager level. For example, a (meta-)package which establishes init as the system root process obviously conflicts with one that provides systemd as the root process. (Arguably, this should only apply to meta-packages, and the actual underlying software should always be co-installable.)

Thus, I don't believe conflict resolution is the job of the package manager; I believe it should and can be handled by the build tool.

mwoehlke commented 10 months ago

It seems to me that the real conversation is "what is a dependency management tool, and how does it relate to existing build and packaging tools?".

I do agree that C++ needs such a tool, but that tool is not CMake or rpm/dpkg. It should, however, interoperate with CMake. Perhaps apt/dnf/etc. can serve as that tool; if not, that tool most certainly needs to interoperate with apt, dnf, or whatever is used to manage distro packages on a Linux/BSD/etc. system. Perhaps that tool overlaps with the build system in needing to resolve conflicts, though I would envision the build tool would build the most complete dependency graph possible while identifying dependencies that could not be located, and would pass that information to the dependency manager.

I think what I'd ideally like to see is a specification for a CLI interface that a build system can feed a CPS Requires which would attempt to satisfy those requirements and/or tell the user how to satisfy them. This probably means package managers ought to be able to parse CPS and may need to do some conflict resolution of their own, though I could live with not being able to identify all conflicts until the packages are present on-system.

That said, I think expanding CPS to be a sort of "lingua franca" to the package manager level would be pretty neat. Teaching it to pip, for instance, would be tremendous if it allowed pip install in a venv to prompt the user to install dependencies via apt/dnf/etc. instead. It would be awesome if command-not-found knew how to search multiple possible providers, but even being able to query a standardized interface would be an improvement!

I'm willing to concede that the dependency manager should be capable of resolving, or at least identifying, conflicts. That is, if it's asked to satisfy some request, it's okay for it to be clever enough to identify that the request is inherently contradictory and can't be fulfilled. I don't believe it should enforce that users don't end up with installed packages that have the potential to conflict. (But I'm okay if part of its suggestion for how to satisfy a request involves recommending that some package be removed.)

DarkArc commented 10 months ago

> Maven, which I think is still the default in the Java space, was an early build and dependency management system that would go out to the internet and download current dependencies. This was a nightmare in controlled environments. My build environment at the time had no direct access at all to the internet, and getting packages from elsewhere was strictly off the table.

I have issues with this comment. The first is that Maven was hugely successful; it is hands down one of the most successful build tools of all time (along with npm, pip, gem, and cargo, which all go out to the network), and I think that needs acknowledgement.

Second, Maven is arguably no longer the "default" in the Java space. That crown is now well contested by Gradle, due to improvements in build performance and the ability to use an imperative programming language (Groovy or Kotlin) rather than XML.

Third, I think more context is warranted about the controlled environment you were working under. Maven does not require internet access; it requires access to a network (which can be an internal network) to reach the relevant Maven repositories and fetch artifacts. This is a requirement that Gradle mirrors, as it makes use of Maven's repository format.

If your environment had issues with network-based approaches, those details would be interesting. To my knowledge, all of these network-based package management systems are designed with constrained deployments (i.e., serving only internal artifacts) in mind, to the extent that both Maven and Gradle support fully offline builds and manual installation of dependencies.

This combined context does bring up a potentially relevant point: the Java ecosystem has, in effect, managed to create a tool-independent de facto standard, the Maven repository format, that has lasted over a decade and seen widespread deployment.


> it became apparent that there isn't a clear vision about the intersections and disjunctions in the responsibilities of build systems compared to dependency management systems (including packaging systems).

I would, simply put, define a pure build system as a tool that works with one or more files to produce build artifacts. Likewise, I'd define a pure dependency management system as a tool that, when given a list of required build prerequisites, provides them (it can do that however it wants: asking build systems for those artifacts, asking a remote server, searching the file system, going via the system package manager, a combination of approaches, etc.).

I think a lot of existing approaches are both build systems and dependency management systems (as these concerns are often so closely related).

To put it another way, a build system is a chef, the chef knows where items are in their kitchen and how to use those items to produce food. However, the chef may not (and likely doesn't) know the full scope of how those items came to be in their kitchen.

> With that in mind, it seems likely that we'll want to support all of the above. But I think it would be best for end users if the dependency management system would, with user or other maintainer input, decide whether a given dependency is header-only and then communicate to downstream build systems in a coherent way that a selection is made and what flags should be set.

I'm not sure I'm comfortable with the dependency management system "deciding" things for dependencies or for the tool requesting dependencies. The dependency system should be able to "reject" things though (perhaps it's just your choice of words and this is what you mean? :) ). If I tell the dependency management system (as a build system) "I need X, Y, Z and I'm going to set Y's build macro FOO to 1" the dependency management system shouldn't tell me what flags to use. It might be helpful, however if it told me that its copy of Y requires FOO set to 0, and that I either need to change my FOO macro, ignore its warning, or provide a different copy of Y.

i.e., I can't speak for the build system folks, but I imagine they want to say little more than "I need these libraries [with some qualifying statements about the platform and architecture]; can you give them to me, yes or no?" as an "MVP". I see potential value in adding "I plan to use these configuration macros, does that change your answer?", but I don't see that as part of the "MVP".

mwoehlke commented 10 months ago

(Inspired by Wyatt's comment(s)...)

> Maven, which I think is still the default in the Java space, was an early build and dependency management system that would go out to the internet and download current dependencies. This was a nightmare in controlled environments. My build environment at the time had no direct access at all to the internet, and getting packages from elsewhere was strictly off the table.

I think this could be an argument for not forgetting the UNIX tool philosophy; do one thing and do it well. In other words, building a monolithic tool is perhaps not the answer. (In fact, I would argue there are lots of reasons why it isn't the answer.) As much as various folks have tried to promote the One True Build Tool, I, personally, view that as a crusade that's doomed to failure. Similarly with package managers, trying to e.g. get the whole world to use apt is not a cause that inspires optimism. My approach is to instead build on what UNIX did well, which is to build a series of tools that interoperate well. That's the goal of CPS, and I think it's one that is attainable if we can get enough people on board with that approach, as opposed to being wedded to their personal monoliths.

The build tool should be in charge of expressing what packages it needs in a common language with enough detail that a hypothetical dependency manager can act to satisfy those dependencies. In an environment that lacks the network resources to proceed further, that's fine. The build tool did its job; it's up to the user to resolve the issues before proceeding. In an Internet-connected environment, the build tool can ask the dependency manager to go fetch what it needs.

Another point: the user should be able to control what acts as the dependency manager. One thing I strongly dislike is packages that go download their dependencies no matter what when I would rather build and manage them myself, or use my distro package manager to provide them. That's something IMNSHO we should be encouraging people to stop doing. I think better ability to communicate dependencies can help, especially if we do also work on (separate!) dependency management tools. I'm happy to have a world in which I can check out Foo and Just Build It™, with all of its dependencies automagically downloaded and built, as long as I can opt out of that behavior. (Ideally, building dependencies would be a last resort, with distro-provided packages preferred, followed by pre-built packages from "first party" and/or third party providers.)

I think Steve's objection and my own can be expressed as "don't force me to use a particular dependency manager as the only way to build your software". I think Wyatt's definitions of the separation between those are on point.

> I'm not sure I'm comfortable with the dependency management system "deciding" things for dependencies or for the tool requesting dependencies.

I consider not doing so a non-starter. For one, you have the issue Bret brought up when starting this conversation, or e.g. as in my examples; while trying to build Alpha, Bravo and Charlie both need Delta, but express different needs with respect to Delta. In the case that some flavor of Delta exists that can satisfy both Bravo and Charlie, the dependency manager absolutely should determine that and provide the appropriate flavor. You also have the problem of preferred providers; for example, X requires Y 1.2, but your distro has Y 1.3. There, I expect the dependency manager to 'recommend' the distro's Y package, rather than going to some less-preferred source to obtain a more exact match.

> If I tell the dependency management system (as a build system) "I need X, Y, Z and I'm going to set Y's build macro FOO to 1" the [dependency manager]¹ shouldn't tell me what flags to use. It might be helpful, however if it told me that its copy of Y requires FOO set to 0, and that I either need to change my FOO macro, ignore its warning, or provide a different copy of Y.

(¹ @DarkArc, you wrote "build system", but I think you meant "dependency manager" here?)

Okay, we might be on the same page. I would expect the dependency manager to do one of three things in your example:

  1. Say, "I'm sorry, Dave, I can't do that", because it doesn't know how to satisfy Y(FOO=1). It may or may not be clever enough to tell you that other flavors of Y are available.
  2. Inform you that your dependency request included the contradiction Y(FOO=1) and Y(FOO=0). Note that this doesn't appear to be the case in your example, but it can, in theory, happen.
  3. Provide Y(FOO=1), possibly from a lower-priority provider, even though Y(FOO=0) was already installed and possibly from a higher-priority provider.

I agree with you that the dependency manager would not turn around and tell the build tool "don't do that". What it might do is give you Y(FOO=1) if you just asked for Y.

bretbrownjr commented 10 months ago

I'm going to try to pare down a lot of complexity to some basic requirements and other facts. Hopefully everyone will find this helpful in framing why certain things need to be out of scope for build configurations and in scope for a broader dependency management context.

Quick Definitions

Link Contexts Require Coherency

Within a given link context (an executable or linked library), incoherent parsing of shared headers leads to One Definition Rule violations in C++. It's generally understood that compilers and linkers do not have enough context to reliably detect and identify these mistakes, so the resulting programs are ill-formed, and toolchains are not even required to diagnose these issues.

Similar issues often arise when various intermediate objects and libraries are built against inconsistent versions of common dependencies. Again, these mistakes result in ill-formed programs with no diagnostics required on the part of toolchains.
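Here is a contrived illustration of that failure mode, with hypothetical names and three files shown in one block for brevity. It typically compiles and links without any diagnostic, yet the program is ill-formed:

```cpp
// widget.hpp -- one header whose contents depend on a configuration macro.
#pragma once
struct Widget {
#if defined(MYLIB_EXTRA_FIELD) && MYLIB_EXTRA_FIELD != 0
  long extra = 0;  // only some translation units see this member
#endif
  int value = 0;
};

// a.cpp -- compiled with -DMYLIB_EXTRA_FIELD=1
#include "widget.hpp"
Widget make_widget() { return Widget{}; }

// b.cpp -- compiled without that flag, so it sees a smaller Widget
#include "widget.hpp"
Widget make_widget();  // links against a.cpp's definition
int main() {
  Widget w = make_widget();  // the two views of Widget disagree on layout:
  return w.value;            // ODR violation, no diagnostic required
}
```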

Individual Projects Cannot Reliably Provide Coherency

For instance, assume libnext is a popular but ABI-tricky library. It requires that the C++ standard version be provided reliably, because it sometimes uses standard types and sometimes provides its own types with the same spelling, which is a common pattern in libraries that offer forward-compatible standard library features. This means every project built against libnext has to set the C++ standard version consistently when parsing libnext. Further, assuming they use libnext header files (a safe assumption in 2023), they themselves need to be built with that C++ standard version, which means downstream users likewise have to use the same C++ standard version. And so, effectively, the entire link context needs to set that value consistently.

The problem is that no single project in the dependency graph can set the correct C++ standard version in its own build configuration, because the correct setting is a property shared across the entire link context. Either that setting can be consistently hardcoded in every project, consistently propagated to each project, or some combination of the two.
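A sketch of the kind of header being described follows; libnext and its contents are hypothetical, but the pattern (alias the standard type when the standard version provides it, otherwise supply a same-spelled stand-in) is common in real libraries:

```cpp
// libnext/optional.hpp -- hypothetical, illustrating why the whole link
// context must agree on the C++ standard version used to parse it.
#pragma once

#if __cplusplus >= 201703L

#include <optional>
namespace libnext {
template <typename T>
using optional = std::optional<T>;  // C++17 and later: just the std type
}  // namespace libnext

#else

namespace libnext {
template <typename T>
class optional {  // pre-C++17: libnext's own, differently laid out type
 public:
  optional() = default;
  optional(const T& v) : has_value_(true), value_(v) {}
  bool has_value() const { return has_value_; }
  const T& value() const { return value_; }

 private:
  bool has_value_ = false;
  T value_{};
};
}  // namespace libnext

#endif

// Any interface that takes or returns libnext::optional<T> now has an ABI
// that depends on the C++ standard version of whoever parsed this header.
```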

Therefore, Coherency Requires Dependency Management

I agree with comments in this issue that have pointed out that there is a spectrum of ways to assist end users and improve the state of C++ tooling. At a bare minimum, we should make it easier to declare what assumptions each project makes, so that something with more available context can detect errors in clearer ways instead of leaving end users to figure out the problem from core dumps or corrupted data. With momentum and consensus, though, it seems plausible that we can develop more robust standards for interop between dependency resolution tools that drive correct builds instead of merely detecting incorrect ones.

Requirements

So I'm thinking the following holds given all that:

  - Each project, via its build system, needs to be able to declare the build configuration assumptions it makes (for example, the C++ standard version or flavor selections such as header-only versus compiled).
  - Some higher-level "dependency management" context needs to ensure that those assumptions are resolved coherently across the entire link context, or at least to detect and report when they are not.

In my mind, the first bullet point describes requirements for projects and their build systems (which have the needed visibility into specific build configuration choices). And the second bullet point describes requirements for a higher-level "dependency management" context... which could be a higher-order build system, I suppose, but often that role is performed in packaging systems by human beings. Maintainers of package distributions, in particular, will guess, check, inspect, and troubleshoot their way to piles of patches to somehow squeeze these bugs out of their ecosystems.

jbadwaik commented 9 months ago

On the matter of build configurations, I would like to put forward the idea of supporting features and dependencies the way Cargo does. Features and the corresponding dependencies provide a reasonable mechanism for conditional compilation and optional dependencies. Furthermore, the mechanism allows one to demand that dependencies be combined with certain features as well. This, in my opinion, would significantly simplify the correct resolution of dependencies in a dependency tree.

ruoso commented 3 months ago

Documenting some discussions we were having in the Tokyo WG21 meeting, trying to figure out how to think about the boundary layers between package managers and build systems.

[Diagram: package-management-build-system, sketching the boundary layers between package managers and build systems]

bretbrownjr commented 3 months ago

Just to reiterate the motivation here: we have a massive interop problem with these tools. The current state of the art is that the interop between each combination of tools is implemented as a bespoke development activity. For instance, if a new build system is introduced, manual engineering effort needs to be spent to get that build system to work with each interesting package manager.

Instead, we need to define a clean "seam" between different tools so that build systems and package managers can clearly define what they should not be responsible for. They should instead be able to target new interoperation standards like CPS.

Given the above chart, we would expect a "pure" package manager to feel comfortable not supporting construction of correct CXXFLAGS -- lists of flags to copy/paste into compile commands to build against dependencies correctly.

On the other end of the spectrum, we would expect a "pure" build system to feel comfortable not supporting retrieval and discovery of compatible dependencies, such as downloading archives from the internet.

@ruoso and I are developing the idea of a middle role that tools serve. It's a relatively narrower task, but it is a required one. We want to define it so that we can reasonably describe the overlap between the responsibilities of different tools. The diagram @ruoso just posted is an illustration of that overlap.