mpickering commented 1 year ago

This ticket exists to track the design of the a new 'Hooks' build-type. The 'Hooks' build type is the successor to the 'Custom' build type.

The Custom build type allows the build process to be completely modified, for example, a user can provide whatever buildHook they want, and so higher-level tools such as cabal-install and stack have to treat packages with Custom build type as black boxes.

In practice, no one uses this full power of Custom to completely change a phase, custom Setup.hs scripts augment rather than replace the build phases. The hooks interface we are designing will contain pre/post hooks for specific phases which support this augmentation in a specific way.

The Hooks build type will be designed to subsume all known uses of Custom Setup.hs script.

Subsequently we hope to also fix a number of open tickets relating to the UserHooks interface by considering them in our design of the new SetupHooks interface.

7350
7877
6112
6232
5933
3600
2910

andreabedini commented 1 year ago

Is this a proposal or there is some work in progress?

I wonder if this is worth doing at all. From my POV custom setups should just disappear and be forgotten. If there are use-cases of custom setups that Cabal cannot yet express, we should rather work in the direction of supporting those use-cases.

Note that adding a new build-type does not simplify building any package that currently use a custom setup; and packages using a custom setup should be better moving to a simple setup (or any other declarative setup, if there were any other).

My 2c.

angerman commented 1 year ago

How will build-type: Hook work in the cross compilation setting?

mpickering commented 1 year ago

Is this a proposal or there is some work in progress?

I wonder if this is worth doing at all. From my POV custom setups should just disappear and be forgotten. If there are use-cases of custom setups that Cabal cannot yet express, we should rather work in the direction of supporting those use-cases.

Note that adding a new build-type does not simplify building any package that currently use a custom setup; and packages using a custom setup should be better moving to a simple setup (or any other declarative setup, if there were any other).

My 2c.

There is work in progress, we are writing a design document which can be shared for review once it is finished.

Part of the work is to identify if there are some common features of current Setup.hs which can or should be made into declarative features.

As long as the declarative feature involves executing some arbitrary Haskell executable, I don't see that has a significant benefit over a Setup.hs script.

mpickering commented 1 year ago

How will build-type: Hook work in the cross compilation setting?

In the same way as build-type: Custom is broken with cross-compilers, so is build-type: Hooks. This is orthogonal to this work as cabal-install does not understand about cross-compilation at all.

angerman commented 1 year ago

Build-type custom is broken in cabal-install. Not strictly with Setup.hs. So we can assume that build-type: Hook will work with setup.hs as well?

hasufell commented 1 year ago

This seems large and significant enough to create a HF tech proposal to get input from a wider range of experts?

I know it's hard to source community opinions and I'd rather not ask on discourse. Hence maybe involving HF is worthwhile?

mpickering commented 1 year ago

Build-type custom is broken in cabal-install. Not strictly with Setup.hs. So we can assume that build-type: Hook will work with setup.hs as well?

How is Custom broken in cabal-install? Ticket?

angerman commented 1 year ago

Custom is broken in cabal-install for cross compilation because cabal-install doesn't know about multiple compiler. For non cabal-install using cross compilation pipelines Custom does work (because it's effectively just Setup.hs.

mpickering commented 1 year ago

For non cabal-install using cross compilation pipelines Custom does work (because it's effectively just Setup.hs.

What does that mean? If you are not using cabal-install can't you compile the ./Setup.hs with the compiler which targets the host and then execute ./Setup configure, ./Setup build?

angerman commented 1 year ago

You can use the bootstrap compiler for your cross compiler. And most of the time you want to build your cross compiler from the same source as your bootstrap compiler. So both are the same version. This often gives a good enough approximation.

Some others use the cross compiler to build the Setup.hs and evaluate it in a cross context (qemu, wine, ...).

The second approach could potentially work for cabal-install, if we had setup-wrapper; depending on the target might not be the preferred approach though.

I am not saying custom setup.hs blackboxes are great. I'd much prefer we didn't have them. I am pointing out that we have practical approaches to deal with them in the cross compilation setting.

Hence my question if this new build-type will make the current situation better or worse.

mpickering commented 1 year ago

Are you saying that there are two options:

Build the Setup.hs with a host compiler, but that has to be the same version as the cross compiler (why the same version?)
Build the Setup.hs with a cross-compiler and then run the executable in an emulator.

It seems that you are suggesting you have to run the Setup.hs script in the cross context, I don't understand why you have to do that?

As imagined, the new build-type won't make things better or worse, just the same as before (for cross-compilation).

angerman commented 1 year ago

Yes. Those are the two options that I've seen being used. Why the same version? Because you effectively want your cross compiler to be a stage3 compiler for sanity and behaviour reasons.

Option (a) is somewhat dishonest to the platform configure is run on (after all we don't really have this specified anywhere but the assumption occasionally is that the configure phase runs on the same host as the final build product). Option (b) is more honest to the platform, but brings with it the need for a full target toolchain, and tooling available to execute in the target context.

So I'll take that Hooks build-type will have the same drawbacks as the current Custom build type, and none that make cross compilation harder. That was the statement I was after.

andreabedini commented 1 year ago

As long as the declarative feature involves executing some arbitrary Haskell executable, I don't see that has a significant benefit over a Setup.hs script.

Maybe my brain needs more coffee but I do see a difference: a part from custom setups, we don't currently run arbitrary Haskell executables during the build process. Is that right?

What makes custom setups undesiderable is not (only) the security concern in running arbitrary code during build but the fact that they can change the build process as we go. They cannot change the plan but they can change how we call GHC and (if I understand correclty) this is what causes so much trouble to other tools like HLS (see the huge thread in #7489).

(Just following my thoughts now, no idea where I will end up :P)

In terms of use cases, we need to draw a line in the sand and decide what is and is not cabal responsibility. There is no way cabal itself can support all possible ways to build code. It is not tool as general as nix, bazel or even cmake or meson.

I might even say pkgconfig-depends was a strategical mistake, since it leads to DevEx issues we cannot solve inside cabal-install. One for all: pkg-config packages don't have global namespace and not even unique names; this means the name listed in pkgconfig-depends does not always help and fixing this is harder than specifying build flags and paths for a given package at projec or system level).

You could say that supporting setup hooks is the way to solve these kind of problem but I think I still disagree. cabal is not something users and developers can use to avoid having to configure their system toolchains, packages or settings.

I am -1 on extending the setup hooks mechanism.

andreabedini commented 1 year ago

Maybe my brain needs more coffee but I do see a difference: a part from custom setups, we don't currently run arbitrary Haskell executables during the build process. Is that right?

@angerman correctly points at TemplateHaskell but I think this does not undermine what I am trying to express here. TemplateHaskell does not affect the build process; at least not at cabal's level and excluding some perversions ~like runIO (callCommand "apt install libxyz")~ (correction: this would not affect the build since planning is already done, maybe doing some IO that influences subsequent custom setups? custom setups seem to be the root of all evil :P).

dcoutts commented 1 year ago

@angerman it'd be good to have input on which hooks ought to run where, i.e. host vs target and the information flow between them.

angerman commented 1 year ago

@dcoutts which hooks are we talking about? The existing ones?

It largely depends on what you are actually doing with them.

if you end up calling out to other processes, or read files, ... you most of the time want this to be on the build machine. (Git-rev, include file, ...)
if you want to inspect properties about the target (detect word size, build and run some code, ...)

We have no semantics around this yet. Having two distinctly named options for each hook and the relevant documentation might help to start having those semantics by encoding them in the respective hook names.

Even having them does not guarantee correct use, but it at least provides the foundation. Right now we have nothing.

andreabedini commented 1 year ago

@mpickering I allowed myself to edit the title and the description. I understand the design is still private and it has to discussed in the open before we can track its implementation.

dcoutts commented 1 year ago

@angerman not the existing ones. Similar idea but starting from a clean slate. The goal in a spec for this build type is to define the semantics of all the hooks, both what info is passed in and out of each hook, but also when (and where) build systems are expected to call the hooks. This spec would be the agreement between packages and build systems for this build type. And since we'll never get it all, or get it all right first time, we want a design that is easier to evolve than the existing UserHooks which are essentially frozen in time (from about 10 years ago) because any changes are backwards incompatible.

angerman commented 1 year ago

@dcoutts I'm sorry if that came across wrong. Without insight into the design, I could only use the existing hooks for illustrational purposes.

dcoutts commented 1 year ago

We'll post a draft design doc as soon as we can.

mpickering commented 1 year ago

I have posted the draft design doc, feedback is actively sought.

https://github.com/well-typed/hooks-build-type/blob/main/design.md

there is also the corresponding survey:

https://github.com/well-typed/hooks-build-type/blob/main/survey.md

michaelpj commented 1 year ago

Looks reasonable to me.

Be independent, in the sense that each can be invoked separately by calling an executable to run each hook.

How does this work for the hooks that return a value, like preConfPackageHook?

It's a bit odd that we split LocalBuildInfo into the modifiable and non-modifiable parts, but not so for Component, where we instead return the full object and have to dynamically check that it doesn't override things it shouldn't.

You mention dependency information for re-running build hooks. Presumably we also need the same for configure hooks? Since they can do IO, it's hard to know when we need to rerun them otherwise.

In addition, the Configure build-type can be implemented in terms of SetupHooks by running the script in the global configuration step, and then applying the result in the per-component configure hook.

How is the information passed between the phases here? By writing it to a file, I guess?

michaelpj commented 1 year ago

Volunteering a weird package: here's one of ours that runs Agda in Setup.hs to generate Haskell modules: https://github.com/input-output-hk/plutus/blob/master/plutus-metatheory

I don't remember the full details now (I think there are some comments), but I think it does some slightly fancy stuff to generate the modules inside dist-newstyle instead of in the source tree.

andreabedini commented 1 year ago

Thanks @mpickering, I'll dig into it on Monday (I guess it's ok if we quote the proposal in here?)

andreabedini commented 1 year ago

I finally had the time to carefully review the proposal and write down my comments.

I appreciate seeing all the work that when into this proposal but I am still opposing it (perhaps now better-informed).

The tl;dr: is that this proposal wants to add a new feature to address use-cases that either:

could be supported by existing features or tools (as claimed in the survey itself). E.g. doctesting, code-generation, build time information from the survey.
I see little value in supporting. E.g. ghc-paths, non-trivial auto-configurations from the survey.

The survey itself (and thank you for doing this!) indicates that the most common uses of custom setups are already within reach of the simple build type, or very close!

This feature will also start a migration phase which will end only when support for the custom build type is entirely removed.

I suggest a better path to move away from custom-setups would be fixing the gap with the identified use-cases.

Then perhaps we could just drop their support for custom builds from the codebase and add an implicit setup bound Cabal < 3.11 bound wherever we see a build-type: custom (To be clear, I am writing this last suggestion without having given it a careful consideration).

I have mostly skipped commenting on the implementation details and focused on the objections I have with respect to the use-cases, alternative solutions and cost/benefit ratio. Here are the inline comments.

Design

Hooks build type

In this document we propose a design of a new Hooks build type, which is intended to replace the Custom build type.

Motivation

This work is intended to resolve one major architectural design flaw in Cabal. Once resolved, this will establish the foundations for improvements in tooling based on Cabal, and make Cabal easier to maintain for the long term.

A fundamental assumption of the existing Cabal architecture is that each package supplies its own build system, with Cabal specifying the interface to that build system.

The "Cabal proposal" included many things:

a definition of what a package is and how to register packages with a compiler (note the concept of a cabal package ended up diverging from the concept of a ghc package, which AFAIK now we call "unit").
a specification for a build interface (e.g. a specification for runhaskell ./Setup.hs configure)
a specification for a "simple build infrastructure" that implements the above interface based on a declarative package description.
some draft design of other build types (like Distribution.Make)

So, yes, the Cabal proposal assumes that packages would bring their own build system (in the general case).

Modern projects consist of many packages. However an aggregation of per-package build systems makes it difficult or impossible to robustly implement cross-cutting features.

This is a great point. Packages in a project (at least local packages) should be using the same build system.

Thus the solution is to invert the design: instead of each package supplying its own build system, there should be a single build system that supports many packages.

I agree with this but there are some nuances. If Cabal is a implementation of the simple build type, what would be left of the build interface? (i.e. the specification for the ./Setup.hs interface). Is that to be binned?

We had a spec that says you can build a package with runhaskell ./Setup.hs build. Will Cabal (the Haskell project) be able to use such packages?

I understand this is a much wider discussion, outside the scope of the proposal, but this can become relevant in presence of multiple build-systems. What would the common interface be? Should there be any?

The overall goal is the deprecation and eventual removal of support for the custom build-type. [..] This migration is intended to take place over a long timescale to allow the gradual migration of packages which use the custom build type.

Here is my first real worry. The codebase is already full of unfinished features and migrations. Few examples:

The modular solver was introduced in 2011 (5e6bd3f7f71739a0cd31ff1b7a1e5f80d3bafe0d); in 2016 the top-down resolver was removed ... but the option to choose which solver to use was never removed.
Subcomponent targets were introduced in 2016 along with the new style project building (352f5795745a95d1d1dcaa941ee5507a7840401z6) and implemented in cabal-install. But as today they have not been implemented in Cabal. Leaving us with this:

-- Actually the reality is that no current version of Cabal's Setup.hs -- build command actually support building specific files or modules. setupHsSupportsSubComponentTargets = False

From: cabal-install/src/Distribution/Client/ProjectPlanning.hs

V2 commands where also introduced in 2016 but V1 commands still have to be maintained because not all commands were migrated and some use-cases had been neglected.
Public sublibraries were introduced in 2018, but the issue "Make the cabal-install solver multilibs-aware" is still open today in 2023 (#6039).
The cabal.project still uses a legacy parser.

Going back to the custom build type. Even with the advent of a new build type, Cabal will still need to support the custom build type for a long long time. Old packages and old versions will still exist.

If we can really deprecate the custom build type; then we should look for the simplest way to do it now, while considering the costs and benefits of every alternative. In the current state, I would classify any significant addition to the codebase as "high cost".

I argue that we can achieve the same goal by extending or fixing-up existing features in the simple build type. This aligns with the direction that the project seems to have taken for a while, which is increasing the expressive power the simple build-type. E.g. with the introduction of build-tool-depends and code-generators (mentioned later in this proposal).

Issues with UserHooks

Relevant tickets for these issues include:

Tickets labeled Hooks

Componentwise hooks

Hook redesign

LocalBuildInfo should change at build time when components are selected

This gives me the chance to claim that the issues with UserHooks are mostly irrelevant (beside, perhaps, the fact that they exist in the first place).

The label "Cabal: hooks" has 11 open issues (4 closed). This includes the tickets "Componentwise hooks" (#7350) and "LocalBuildInfo should change at build time when components are selected" (#2910) mentioned above.

For comparison, there are

15 open issues (12 closed) labelled either "cabal-install: v1-" or "re: v1-vs-v2".
133 open issues (333 closed) labelled "cabal-install: nix-local-build"
80 open issues (256 closed) labelled "cabal-install: solver"

SetupHooks.hs can also be compiled to a separate executable, which can be called to invoke each hook in a standalone fashion. This will allow finer-grained build plans, as is described at the end of this document in § Future work.

Given what I said above about unfinished work, seeing a "Future work" section triggers me a bit 😂.

Future work

In the future, it will be desirable for a tool like cabal-install to avoid using the ./Setup CLI interface at all.

I understand this is in the context of the proposed hooks build type? Because cabal-install already avoids calling ./Setup.hs for simple build types (when it is possible and safe to do so).

If I understand the design of the hook build type, in presence of build-type: Hooks cabal-install would be able to skip calling ./Setup.hs and use its linked-in implementation of the hook build type. I fail to see how this is different from what we do now with the simple build type.

There is no need to perform configuration because cabal-install has already decided about the global and local configuration.

It will allow finer grained build plans, because we don't have to rely on ./Setup.hs build in order to build a package. cabal-install could build each module one at a time.

It will allow cabal-install and other tools to use the Cabal library interface directly to manage the build process, which is richer and more flexible interface than what can be provided by a CLI.

What prevents tools from using the Cabal library interface now? I am confused by this whole section. We only need to compile and run ./Setup.hs for the custom build type. As far as I understand, the hooks build type will be identical to the simple build type from this point of view.

cabal-install will be able to use per-component builds for all packages, where currently it must fall back to per-package builds for packages using build-type: Custom. This will reduce the number of different code paths and simplify maintenance.

This leaves me even more confused. "For all packages" including the ones with "build-type: Custom"?

Perhaps this section only wants to highlight the benefits of a world without custom setups, rather than the benefits of the proposed build type.

Stackage Setup.hs survey

Doctests

Doctesting is a well-known black spot in the developer experience. There are 16 open issue (45 closed) mentioning "doctests" on cabal's repository.

Packages that rely on the cabal-doctests package for their doctests usually define a Custom setup script in terms of defaultMainWithDoctests. [..] aeson-diff-1.1.0.13, attoparsec-time-1.0.3, focuslist-0.1.1.0, haskell-gi-0.26.7, kleene-0.1, openapi3-3.2.3, password-3.0.2.1, password-instances-3.0.0.0, password-types-1.0.0.0, pcg-random-0.1.4.0, polysemy-1.9.1.0, polysemy-plugin-0.4.5.0, pretty-simple-4.1.2.0, servant-auth-docs-0.2.10.0, servant-openapi3-2.0.1.6, servant-swagger-1.1.11, swagger2-2.8.7, twitter-conduit-0.6.1, wai-logger-2.4.0, wreq-0.5.4.2, xml-conduit-1.9.1.3

Note: attoparsec-time-1.0.3 and kleene-0.1 have build-type: Simple.

The cabal-doctest package is deprecated and parts of the ecosystem (like lens) have migrated to cabal-docspec which works with simple build type.

Additionally, one of the use-cases for the new code-generators field (which you mention later) was supporting doctests, as shown in the demo put togheter by Gershom.

It is not my intention to claim that "the doctesting problem" is already solved, but I argue that doctesting has nothing to do with custom setups since it can already be done without. The packages listed above could migrated away from custom setups today.

The ideal design for doctests would be if cabal had specific support for doctests.

There you go. You already identify a better solution that would work with build-type simple!

The singletons-base package defines similar logic for its own testsuite. The testsuite takes the path to the compiler and the GHC options used to build the package before invoking GHC to compile modules in the testsuite.

It turns out singletons-base has already considered moving to using code-generators but Ryan ran into issues because of some confusion left behind from the introduction of per-component builds.

Fixing that issue would allow singletons-base to migrate away from build-type: Custom today (optimistically speaking).

haskell-gi packages

Another family of module generation tasks is to read a specification file and generate modules based on that. The haskell-gi package does this for a number of GUI libraries. [...] This is simple to support because the generation scripts do not depend on any other results of configuration. In contrast to the doctest module generation, the names of the generated modules is not known beforehand.

Dynamically generating arbitrary modules directly clashes into the multi-year-long discussion on whether or not cabal should support populating exposed-modules automatically. See for example #7016 and #5343.

This issue has clearly never been settled, but, if I understand correctly, the position from previous maintainers has always been that the list of exposed modules should be known statically and available from the Hackage index (i.e. from the cabal file).

The solution is simple though, do not do this with cabal. If you are preprocessing your source code in arbitrary way, use a general purpose code generator or a in-house one (like e.g. amazonka does). At that point you can generate the list of exposed modules in the cabal file as well.

Other module generation

ghc-paths is a library which records the build-time location of the ghc binary and it's libdir. It can be convenient to use ghc-paths when writing GHC API applications because it saves you having to provide the correct libdir. This is at the cost of making your application unsuitable for distribution (as the locations during build-time will almost certainly not exist on your users system at runtime).

What are the benefits of supporting this specific use-case? What is so hard about obtaining ghc's libdir?

Here is one example; hiedb-0.4.3.0 does the following:

import GHC.Paths (libdir, ghc)

runGhc args = do
  hc <- fromMaybe ghc <$> lookupEnv "HC"
  callProcess hc args

-- ...

runCommandTest = runCommand (LibDir libdir) testOpts

Why not pick (e.g.) ghc from the path and call ghc --info to get the libdir?

Program generators baked into Cabal

Support for serveral common program generators is built into Cabal. [...] This list of modules can be extended by augmenting hookedPreProcessors field of UserHooks.

I wish we could get rid of that hard-coded list and replace it with something declarative.

Code-generators declarative feature

code-generators is only supported in the testsuite stanza. Most program generation tasks are for the library stanza so this limits the utility.

With the above mentioned caveat that the list of generated modules has to be known, I don't see why this cannot be implemented.

For the doctest use-case, the generation script is passed a list of source directories for which it then has to traverse to find .hs files (which may or may not be part of the project).

It does not seem to be a big deal. The files have to be part of the package right?

As such, for more people to use this feature it seems a little more work in smoothing off these edges would be beneficial.

I don't understand. This statement along with the existence of this proposal seem to suggest you think that implementing a new build type is preferable over that little more work?

Paths* PackageInfo* modules

A basic amount of introspection can be performed with Paths_* and PackageInfo_* modules. It is possible that these modules could be extended to capture more information about a package, and then custom Setup.hs scripts could be removed (if they just needed the available information).

You are once mode admitting that these use-cases can be trivially supported by the simple build type.

Configure-style checks

Some packages implement configure-style checks in the Setup.hs script.

This might be the right point to mention a possible philosophical divergence.

Let me start with something trivial first: cabal provides a build system, Distribution.Simple. The users of a build system are developers, who use it build their applications.

I believe that managing the build environment is a fundamental responsibility of the developer and that cabal should refrain from this kind of auto-configurations.

Whoever is building the package (who is not necessarily the package author) can influence the build from a project level using the usual options like extra-lib-dirs and extra-include-dirs.

Right now there are shortcomings (see e.g. #2997) but they are consequence of the half-finished migration to the new parser, rather than something fundamental.

Agda

The Agda compiler is a standard Haskell executable. It can be built and installed with Cabal. Once the compiler is built, the base libraries need to be built by the Agda compiler so that people can compile Agda files which depend on them.

I believe this could be done at runtime (e.g. Agda's first run) and should not be done by Cabal.

I think developers should not be encouraged to use cabal install as an application distribution mechanism. There is a reason users apt install and developers do make install.

sheaf commented 1 year ago

Thanks for your comments @andreabedini, this is extremely helpful. I would like to respond with regards to what I consider to be the most important feature of custom setups: generating modules.

The solution is simple though, do not do this with cabal. If you are preprocessing your source code in arbitrary way, use a general purpose code generator or a in-house one (like e.g. amazonka does). At that point you can generate the list of exposed modules in the cabal file as well.

I don't understand how this can work. To take the example of the Haskell GI packages, when a user builds this package, the Setup script will determine which modules are to be generated based on the introspection data available on the user's system. With your suggestion, it would not be possible to upload such packages on hackage, which would be a significant blow to the distribution of the package.

Another point is that some packages generate modules based on information known to Cabal; for example stack retrieves transitive dependency information, so that users can query the stack executable to respond with the set of libraries it was built against.

I find the use cases demonstrated by these packages to be genuinely compelling, even though I understand they violate assumptions that are useful to make in Cabal (such as the list of modules being known ahead of time). Unless a plausible migration strategy is proposed for these packages, I don't see that we can tell users to switch to the simple build type.

hasufell commented 1 year ago

To take the example of the Haskell GI packages, when a user builds this package, the Setup script will determine which modules are to be generated based on the introspection data available on the user's system.

This is a bad hack and should burn.

The correct solution is to add compile-time warnings to those non-available functions and make them throw runtime error when used regardless.

This is what unix does too: https://github.com/haskell/unix/blob/71e56488132021d0b67970c70278c13e5b2c2e26/System/Posix/Files.hsc#L304-L310

dcoutts commented 1 year ago

Subcomponent targets were introduced in 2016 along with the new style project building (352f5795745a95d1d1dcaa941ee5507a7840401z6) and implemented in cabal-install. But as today they have not been implemented in Cabal. Leaving us with this:
-- Actually the reality is that no current version of Cabal's Setup.hs
-- build command actually support building specific files or modules.
setupHsSupportsSubComponentTargets = False```

Interestingly, the problem of them not being implemented in Cabal is an example of the problem that this proposal is designed to address. The reason they were never implemented in Cabal is that it involves a cross-cutting feature being added to the build system of every package, since it is a change in the Setup.hs CLI. This is the kind of thing that becomes much easier to do when the build system for all packages is provided by the builder not the package.

Ericson2314 commented 1 year ago

@hasufell Huh? making compile-time errors into runtime errors sound very bad. I hate how the JS backend does this (though I understand it was the quickest way to get things going back in the GHCJS days), I don't want to see more things do it.

Ericson2314 commented 1 year ago

The custom escape hatch I would like to see would be something like spitting out a ninja file: in particular I don't want a custom executable to do anything; I want it to spit out of a build plan of what needs to be done. All execution of that plan should be done by Cabal.

Custom scripts that do their own IO are a huge pain to downstream buildsystems, forcing things to go through a declarative interface is far, far, better.

hsenag commented 1 year ago

Not sure where the best place to discuss it is, but just want to get Darcs on the radar as I haven't seen it listed in the survey. It's a little while since the last time I looked at getting rid of it but there's some discussion here and the current Setup.hs is here.

mpickering commented 1 year ago

@Ericson2314 Can you clarify your point about how custom scripts are problematic to downstream build systems? At the moment nixpkgs simply compiles the ./Setup.hs script and invokes that in order to build a package, it doesn't seem that that causes any complication.

mpickering commented 1 year ago

@andreabedini

Thank you for bringing attention to these other important issues. I am sure we can make progress on some of them as part of the maintenance side of this project. Is there one of them in particular which you think would deserve closest attention?

Thank you for your thoughtful reply, I will try and explain a bit more closely what our reasoning for working on this is.

It seems there is a fundamental disagreement about whether the build process should be possible to be augmented or not. We are certainly of the opinion that the build process needs to be extensible.

Reason 1

It is generally the case that large and old projects don't add support for features unless there is a "compelling reason" and the compelling reason in this case is that users have to augment the build system in order to implement something which is missing. If there is no way to augment the build system then either a user gets completely stuck, or must embark on a potentially very long and arduous process of implementing a feature in cabal which supports their specific use-case.

Reason 2

Regarding the code-generators feature, our opinion is that this is much better achieved using a Setup.hs script.

Both involve executing an executable in order to achieve something.
code-generators provides an ad-hoc amount of information about the project to the generator, passed in an unstructured way via the command-line. What if a user requires more information about the build in order to do their code generation?
UserHooks or `SetupHooks`` simply provide a nice unifying way of implementing this, along with other modifications to the system.
Suppose you don't wish to use the build system in Cabal to build the package, you need to implement specific logic for the code-generators feature (and any subsequent ones), which involve building an executable and invoking it at a certain time.

It seems following this to its logical conclusion, we would end up with `a section in the cabal file for each of the hook points we identified in the proposal and calling a different executable in each phase.

Configure Checks

` Regarding configure style checks, I am not sure what your proposal is to migrate away from these checks. It is not simply about ensuring that the developer system has a certain system package, but being able to write packages which work depending on the version and implementation of the system package. As Julian points out, the unix package has configure checks to decide which syscalls your build platform supports and provides different implementations based on this. What is available on standard libraries on different platforms and you have to be able to write libraries which work in those situations.

Complicated Setup.hs

Regarding Agda, the situation is that currently the Agda install and development process relies on this behavior. Passing the responsibility onto the Agda developers to design a system where packages are compiled on the first run of the program seems to assume a lot about the development experience they want for their users.

It is not only Agda though, but also Idris and Darcs which contain specific logic in this form. ghc also has custom Setup.hs scripts which perform code generation based on configuration. It doesn't seem like an option to me to tell these important users of Cabal that they can no longer interact in the same way without providing a compelling alternative.

It has certainly been proposed multiple times that projects like ghc should "just" be built with `cabal-install`` but unless it is possible to augment the build process I don't see that being a possibility.

Future Work

The real motivation for working on this is to be able to improve cabal-install in the manner described in the proposal. The part which particularly motivates me is to implement cross-package parallel builds similar to the way I have already implemented in hadrian (and rules_haskell also implements). I can see that this would have many benefits for users but we are blocked on getting there by build-type: Custom

An alternative would be to only implement these cross-cutting features only when the build plan contained Simple packages. In fact, it would have been much easier to do that as it wouldn't have required us to embark on this redesign process. However, it also seems that maintaining two code paths in cabal-install indefinitely wouldn't have been a good way to a maintainable software project. Therefore we embarked on this more fundamental redesign of Custom build-type first which would unlock this other work in future.

Ericson2314 commented 1 year ago

@mpickering

@Ericson2314 Can you clarify your point about how custom scripts are problematic to downstream build systems? At the moment nixpkgs simply compiles the ./Setup.hs script and invokes that in order to build a package, it doesn't seem that that causes any complication.

Nixpkgs simply compilers ./Setup.hs because that is the lowest-common denominator with custom setups. I would rather not do things that way with Nix, and with custom setups banishes we have an opportunity.

What I really want to do is per module builds, and similarly fine-grained derivations. ghc --make vs one-shot mode isn't really a package-level distinction, but other things are. For example --- your work for configuring before building! I would very much like to have separate Nix derivations to configure all packages in parallel prior to building any of them.

So yeah for a flavor of the fine-grained single executor mode, imagine this:

Version solver figures out graph with node = package
Cabal per-package logic expands those package nodes into something more fine-grained, e.g. configure vs build vs install etc. or whatever
ghc -M or something new splits a single build ghc --make node into per-package nodes.

All these tasks are working at different layers of abstraction to together build a really fine-grained build plan. This is exactly what Nix or Ninja or Bazel wants to do kick-ass incremental builds across an entire project (which might involve non-Haskell too.)

I want the custom setup replacements to dovetail, not fight, with that vision, so perhaps it is "hooks" but it is a way to hook into the planning process, adding/splitting new nodes. E.g. if I have some custom-generated .hs file, I want that to hook in with ghc -M, and maybe even GHCi (imagine a more declarative replacement to addDependentFile).

To design this sort of thing, I suppose I would look at CMake and Meson. They are both domain-specific build systems that can use a domain-agnostic backend (e.g. Ninja) as a backend. They show we can have plenty of flexibility, while still separating building from execution.

Meson trying to integrate with Cargo/crates.io is especially interesting prior art, because that is basically the same task as this ticket but from the other side. I.e. trying to get an intra-package tool to work better with an inter-package tool, as opposed to trying to get an inter-package tool and give it more flexibility for custom intra-package logic.

andreabedini commented 1 year ago

@sheaf

Thank you! I admit I have learned heaps just from the research needed to write those comments 😂. By the way, did you work on this proposal too? I notice what it is not written anywhere who is making the proposal.

I don't understand how this can work. To take the example of the Haskell GI packages, when a user builds this package, the Setup script will determine which modules are to be generated based on the introspection data available on the user's system. With your suggestion, it would not be possible to upload such packages on hackage, which would be a significant blow to the distribution of the package.

Let me be clear: I am not claiming that build-time code-generation is never necessary or a good idea. My argument on code-generation is that we can support a reasonable amount of use-cases with the simple build-type (or improving/extending it).

Back to Haskell GI. There are two points to the discussion, which I believe are separate.

1) The dynamic generation of the list of exported modules. As I have mentioned above, this is contrary to a long standing position of previous maintainers that exported modules should be statically listed in the cabal file. There is an open discussion about this, I admit I don't have a clear view of the issue but we cannot ignore it. Haskell-GI packages already subvert this policy by including a bogus list of exposed-modules (I am going to say this list is likely to be often correct right after this 🙃). From gi-glib.cabal:

      -- Note that the following list of exposed modules and autogen
      -- modules is for documentation purposes only, so that some
      -- documentation appears in hackage. The actual list of modules
      -- to be built will be built at configure time, based on the
      -- available introspection data.

      exposed-modules: GI.GLib.Config,

2) I admit I don't have experience with gobject-introspection. From what I could understand during the weekend, the metadata that haskell-gi uses for code-generation is generated from the C libraries source-code. The GNOME Developer Documentation says:

GObject introspection is a system which extracts APIs from C code and produces binary type libraries which can be used by non-C language bindings, and other tools, to introspect or wrap the original C libraries. It uses a system of annotations in documentation comments in the C code to expose extra information about the APIs which is not machine readable from the code itself.

From this I understand the produced .gir files are conceptually similar to header files. My thinking is that /usr/share/gir-1.0/cairo-1.0.gir does not include any "runtime" information: it is the same for every installation of cairo since it is generated for its source-code alone, and you can then use it to pre-generate the bindings. Another data point is that gobject-introspection is designed to also be usable directly at runtime so maybe code-generation is not even needed. Of course, I might have misunderstood how this works so, please correct me if this is the case.

Another point is that some packages generate modules based on information known to Cabal; for example stack retrieves transitive dependency information, so that users can query the stack executable to respond with the set of libraries it was built against.

This kind of information is what is (or can be) provided by cabal auto-generated modules, i.e. Paths_* and PackageInfo_*. It is mentioned in both the survey and my comments above.

The survey also concludes:

It is possible that these modules could be extended to capture more information about a package, and then custom Setup.hs scripts could be removed (if they just needed the available information).

Lastly:

Unless a plausible migration strategy is proposed for these packages, I don't see that we can tell users to switch to the simple build type.

I agree. I don't want to force anybody off the custom-setup until we have an alternative. My claim is that we are close (as the survey confirms) to makingn the simple build-type work for most of the use-cases and we should not invest into something new that represents a step back from its declarative approach.

andreabedini commented 1 year ago

@mpickering

It seems there is a fundamental disagreement about whether the build process should be possible to be augmented or not. [...] If there is no way to augment the build system then either a user gets completely stuck, or must embark on a potentially very long and arduous process of implementing a feature in cabal which supports their specific use-case.

This is more of a philosophical point, rather that a discussion of proposal itself. From my point of view, where cabal's build-system is lacking, I would expect cabal to integrate well with any other build-system. I would much prefer a "simple and to-the-point cabal" than a "complex and with an escape hatch" cabal. I am not claiming the hooks build-type will make cabal complex, in fact, I think it already is!

Regarding the code-generators feature, our opinion is that this is much better achieved using a Setup.hs script. Both involve executing an executable in order to achieve something.

Note, I never claimed that running an executable during the build process should be avoided, I would not expect to be able to build anything otherwise! 🙃

code-generators provides an ad-hoc amount of information about the project to the generator, passed in an unstructured way via the command-line. What if a user requires more information about the build in order to do their code generation?

Can that existing design be extended to provide the extra information? Do you have a specific use-case in mind that won't be achievable by extending the existing code-generator feature? To implement a new feature to replace an existing one, you need to that argue that the existing feature cannot be improved or extended.

Suppose you don't wish to use the build system in Cabal to build the package, you need to implement specific logic for the code-generators feature (and any subsequent ones), which involve building an executable and invoking it at a certain time.

If the semantic of the cabal file is clearly specified, why would that be an issue? I am not sure I understand the context though. If you don't wish to use the build-system in Cabal, ... then use another one? Maybe you mean someone wants to use cabal-install but not the build-system in Cabal?

It seems following this to its logical conclusion, we would end up with `a section in the cabal file for each of the hook points we identified in the proposal and calling a different executable in each phase.

I don't see how this follows. You seem to assume that all the hooks included in the proposal are desired but that is the source of our disagreement.

Regarding configure style checks, I am not sure what your proposal is to migrate away from these checks. It is not simply about ensuring that the developer system has a certain system package, but being able to write packages which work depending on the version and implementation of the system package. As Julian points out, the unix package has configure checks to decide which syscalls your build platform supports and provides different implementations based on this. What is available on standard libraries on different platforms and you have to be able to write libraries which work in those situations.

I don't claim to have solution for everything. Can we determine those syscalls from the platform (which is statically known)? Is there any statically known piece of information we can provide to address these problems?

I don't claim there are no problems to solve, but that we continue moving towards a declarative specification.

(While I am looking at the unix package, note how it does code generation outside of cabal: the configure script has to be generated with autoreconf before cabal sdist)

@hasufell @Ericson2314 re: compile-time warnings vs run-time failures; a better pattern is already used in base:

executablePath :: Maybe (IO (Maybe FilePath))

Regarding Agda, the situation is that currently the Agda install and development process relies on this behavior. Passing the responsibility onto the Agda developers to design a system where packages are compiled on the first run of the program seems to assume a lot about the development experience they want for their users.

From the proposal:

In the short term, the important parts of this project include:

Engaging with package maintainers to remove the requirement for the custom build-type where better alternatives exist.

I am keen to understand their experience and expectations.

It is not only Agda though, but also Idris and Darcs which contain specific logic in this form. ghc also has custom Setup.hs scripts which perform code generation based on configuration. It doesn't seem like an option to me to tell these important users of Cabal that they can no longer interact in the same way without providing a compelling alternative.

I haven't had the time to look at Idris, darcs or ghc; but who says they cannot longer do that? I haven't proposed to deprecate the custom build-type (yet 😛).

The real motivation for working on this is to be able to improve cabal-install in the manner described in the proposal.

Assuming you are referring to the "Future work" part, I apologies but I do not understand what you mean there. As commented above, I do not know how the proposal is meant to improve cabal-install. AFAIU there would be minimal changes to cabal-install (maybe only in SetupWrapper?).

The part which particularly motivates me is to implement cross-package parallel builds similar to the way I have already implemented in hadrian (and rules_haskell also implements).

Sorry, you have to break this down for me. Do you mean per-module builds? Do hadrian or rules_haskell implement per-module builds on packages that use the custom build-type?

An alternative would be to only implement these cross-cutting features only when the build plan contained Simple packages.

By the way, @dcoutts and you have been using the term "cross-cutting" but I don't know what you mean by it.

In fact, it would have been much easier to do that as it wouldn't have required us to embark on this redesign process. However, it also seems that maintaining two code paths in cabal-install indefinitely wouldn't have been a good way to a maintainable software project. Therefore we embarked on this more fundamental redesign of Custom build-type first which would unlock this other work in future.

If this is the feature you want to build, let's talk about this instead!

How would per-module building work for each of the existing build-types?
What would be the issues with e.g. custom or configure build-types?
Which parts of the code would have to change?

(I am just guessing here, trying to understand your point-of-view.) Maybe the problem, not mentioned in this discussion so far, is that the simple build-type is actually implemented on top of the custom build-type. In the sense that

defaultMain = getArgs >>= defaultMainHelper simpleUserHooks

I see how this breaks any hope of radically changing its implementation (the possible improvements I mentioned above are still valid though, they are not radical changes). Redesigning how this works could be a good feat. From cabal-install pov it could be simpler, change SetupWrapper to not just call Simple.defaultMainArgs.

I can see that this would have many benefits for users but we are blocked on getting there by build-type: Custom

The way I see it, it's the packages using build-type Custom being stuck (with all due respect to the maintainers of those packages, as you see above, I am trying to help).

andreabedini commented 1 year ago

Just another thought (I think the very last one I will be able to express today :joy:).

Let's remember that under no circumstances the exposed API of a package should depend on the build time configuration. If the exposed API cannot be determined statically, PVP is meaningless.

Ericson2314 commented 1 year ago

The Maybe trick isn't ideal because then no code gets a static guarantee.

Let's remember that under no circumstances the exposed API of a package should depend on the build time configuration. If the exposed API cannot be determined statically, PVP is meaningless.

I do agree with in principle, but that means that any Unix-only or Windows-only functionality really ought to go in another package. My original standard-library overall proposal was supposed to move us in that direction, but base as it currently stands can't well support the browser while not having gratuitous runtime errors or variation and be in compliance with it.

Cargo's features is an interesting way to allow the interface of a package to vary in a way that the solver can be aware of, so it is kosher. We could consider that if people were to complain having more packages is too annoying.

andreabedini commented 1 year ago

@Ericson2314 I think this is leading us off topic, I replied privately :)

andreabedini commented 1 year ago

I thought more about this and I came up with an idea to move this forward and also unblock us from reworking the implementation of the Simple build-type.

I mentioned above that the Simple build-type currently shares its implementation with the "Custom build-type" by providing a pre-defined set of hooks (simpleUserHooks) to defaultMainWithHooks. I use quotes around "Custom build-type" because I think there are two aspects we ought to keep separate.

⚠️ Wall of text ahead.

What are build-types now

Following the original Cabal proposal, the interface of a Cabal package is composed of a package description file and a ./Setup.hs script. The Cabal library (then called "the simple build infrastructure") provides a API to implement ./Setup.hs. It does so by providing pre-defined implementations of ./Setup.hs for four build-types. Each build-type is effectively a different build-system.

Three of those build-types (Simple, Configure and Make) are completely determined, in the sense that the is no user provided code. All the information they need is included in the package description (I used the term declarative before but I am not sure I would use that again to describe the Make build-type). In these three cases, the package author can write a ./Setup.hs that calls into a main function provided by Cabal and the package builder can even opt to use a pre-compiled version of the same script obtained separately (which e.g. is what cabal-install does with act-as-setup). A fourth build-type allows the user to customise part of it by passing UserHooks; in this case the resulting ./Setup.hs is always custom giving name to the build-type itself. Note that a custom ./Setup.hs type uses the interface of Distribution.Simple :smile:

Not a new build-type

In all cases, the interface to the package builder is always the setup script, either "stock" or separate for each package. To the package builder, "Custom" means "I have to compile ./Setup.hs" and does not have anything to do with defaultMainWithHooks.

From this point of view, the proposal here is not to offer an alternative to the Custom build-type but to defaultMainWithHooks. SetupHook.hs is just Setup.hs and the package author would call defaultMainWithSetupHooks rather than defaultMainWithHooks. This leads me to think that the best way forward is embrace this modularity and reflect it in the code base as much as possible.

We don't need a new build-type. The one proposed here is a new user-extensible build-system just like defaultMainWithHooks. They both fit in build-type: Custom because the package builder will, in both cases, have to build ./Setup.hs.
This new build-system could be offered as a separate package that package authors can add in their setup-depends.

@mpickering I think this can be done today with minor changes to your WIP branch.

Some ideas around extensible build-types

Separately from the proposal: there is no reason why the simple build-type has to be implemented as defaultMainWithHooks simpleUserHooks, which @mpickering correctly identifies as blocking innovations. We can start addressing this by making Cabal's code reflect this modularity.

The Simple, Make and Configure build-types could be defined by separate packages (or maybe just separate components) and evolve independently.
The bulk of Cabal is either move to a defaultMainWithHooks-build-system-library (for lack of better name) or stay in Cabal itself as provider of an underlying infrastructure.
If anybody wants to rework how the simple build-type is implemented, they could start now by implementing it as as a separate build-system to call with build-type: Custom (I assume everybody's head is spinning by now).
We could think of some way to turn the build-type: field into something that can be used with such 3rd party provided build-systems.

I find amusing how I seemingly have moved from "custom setups are bad" to "everything is a custom setup" 😆. Nevertheless I don't think I need to take back anything I have written above. I still believe that the simple build-type can address most of the use-cases presented in the survey. That we always rely on ./Setup.hs is a fact that we cannot change. The impact on the codebase is zero since it can be implemented separately; even better the design above could improve the codebase modularity.

eli-schwartz commented 1 year ago

I don't claim to have solution for everything. Can we determine those syscalls from the platform (which is statically known)? Is there any statically known piece of information we can provide to address these problems?

No, because this can and does change due to minor version updates of the platform itself -- usually adding new syscalls, not deleting old ones ;) but nonetheless...

So you'd at a minimum have to determine those syscalls from not just the platform, but also the version of the platform. On Linux this might mean knowing all of: the currently booted kernel (as opposed to the kernel being booted tomorrow, which may be either older or newer), the glibc version, the linux API headers version...

What's the actual downside of doing configure checks?

andreabedini commented 1 year ago

What's the actual downside of doing configure checks?

Surely they can make the build process hard to reproduce. but I am not sure about whether they are_bad in general.

This what I think it happened. When I wrote

I believe that managing the build environment is a fundamental responsibility of the developer and that cabal should refrain from this kind of auto-configurations.

I had in mind situations where ./Setup.hs adds extra logic to find system dependencies (hence the reference to extra-lib-dirs right below). But the term "configure-style checks" is quite vague so the discussion moved to other kinds of "checks" and I lost track of what I was originally saying :man_shrugging:

mpickering commented 1 year ago

@Ericson2314

All of what you describe in your comment sounds great, and is exactly what we want to achieve in the second-phase of this proposal. The Hooks build-type is a necessary part of the process in order to allow augmentation of the build to fit into the framework you describe.

mpickering commented 1 year ago

@andreabedini

Thank you! I admit I have learned heaps just from the research needed to write those comments 😂. By the way, did you work on this proposal too? I notice what it is not written anywhere who is making the proposal.

Yes the proposal is authored by myself, @sheaf with help from @dcoutts and @adamgundry.

This kind of information is what is (or can be) provided by cabal auto-generated modules, i.e. Paths* and PackageInfo*. It is mentioned in both the survey and my comments above.

But it currently isn't, so what does a user do when they need information which is not in Paths or PackageInfo?

it is the same for every installation of cairo since it is generated for its source-code alone

It is the same assuming that the installation of cairo is exactly the same on each users system, which may not be true. THe approach taken by these packages is robust against slight changes between distributions and future updates. Again, what does a user do if their package distributor prepares a patch against cairo or distributes a different version to the one the bindings are generated against on hackage.

Let's remember that under no circumstances the exposed API of a package should depend on the build time configuration. If the exposed API cannot be determined statically, PVP is meaningless.

For a package like unix, which has a consistent API across platforms, on backends such as wasm many of the definitions are defined in terms of error and given "WARNING" pragmas about not being defined. Precisely which definitions is determined by configure type checks. Arguably this is much worse that just not providing those functions on the wasm backend at all..

The fact is that people are relying on this behaviour in order to write robust bindings against C/system libraries so again, we do not feel in a position to dictate to the library authors what they should/shouldn't be doing and have instead taken the pragmatic position to design an alternative which allows people's program to continue to work whilst allowing us to achieve our longer term goals.

Can that existing design be extended to provide the extra information? Do you have a specific use-case in mind that won't be achievable by extending the existing code-generator feature? To implement a new feature to replace an existing one, you need to that argue that the existing feature cannot be improved or extended.

Code generators are not passed any information about the configuration so can't do any program generation based on that. The extra information which you can pass is described in the proposal as the interface to the configure hooks. So you can view the Hooks build type as a natural extension of the code generation feature.. which also allows augmentation of different parts of the build system rather than just one specific task.

I feel like the discussion about code-generators is missing the point somewhat as there are no packages to my knowledge which use this feature.

If the semantic of the cabal file is clearly specified, why would that be an issue? I am not sure I understand the context though. If you don't wish to use the build-system in Cabal, ... then use another one? Maybe you mean someone wants to use cabal-install but not the build-system in Cabal?

It's not an issue, but my point is that you have to do this for code-generators and also any subsequent extensions to the cabal file to subsume. To me it seems much cleaner if there is a general mechanism (Hooks) which has to implementing once to cover many/any use case rather than implementing extensions separately.

The semantics for the Hooks build-type are also clearly specified, and it is probably clearer for existing packagers to understand at which points each hooks should run rather than a cabal-specific notion of a code-generator because the hooks follow a ./configure/build workflow pattern.

I don't claim to have solution for everything. Can we determine those syscalls from the platform (which is statically known)? Is there any statically known piece of information we can provide to address these problems?

You can't determine statically which syscalls or configuration that every possible system supports. There are still platforms to be designed which will be supported by existing configure checks.

I haven't had the time to look at Idris, darcs or ghc; but who says they cannot longer do that? I haven't proposed to deprecate the custom build-type (yet 😛).

Of course they can continue to use Custom Setup.hs, and we can carry on doing so for the rest of time if we don't want to make any progress in improving cabal-install. That seems a very sad position to take though.

Assuming you are referring to the "Future work" part, I apologies but I do not understand what you mean there. As commented above, I do not know how the proposal is meant to improve cabal-install. AFAIU there would be minimal changes to cabal-install (maybe only in SetupWrapper?).

Phase 1 (this proposal) does not perform significant modifications to cabal-install, but it is necessary to be able to perform them in future. We are intending to apply for more funding when/if the first phase is completed.

Sorry, you have to break this down for me. Do you mean per-module builds? Do hadrian or rules_haskell implement per-module builds on packages that use the custom build-type?

No because that's impossible given the existence of Custom Setup.hs. hadrian doesn't support building packages with Custom Setup.hs scripts.

Currently the build graph is on a per-component/package level where there are nodes in the build graph are either components or packages. Hadrian has a per-module build graph, where the nodes are modules and this allows a much greater amount of parallelism.

For example, implementing something like -jsem only really works because Custom Setup.hs scripts are straightforward.. a Custom Setup.hs could just completely ignore the -jsem flag when it was passed to the build command. It just happens to work because all the Custom Setup.hs scripts are actually much simpler and implemnted in terms of the normal Cabal library.

By the way, @dcoutts and you have been using the term "cross-cutting" but I don't know what you mean by it.

A "cross-cutting" feature is one which requires certain interaction between packages in the build plan (rather than just treating each package build as a black-box). For example, if you want to create a per-module build plan then you need to look inside each package to determine the dependencies rather than treating each package as a black box.

If this is the feature you want to build, let's talk about this instead!

This is the feature we want to build but it is getting off-topic from this proposal, which is a worthwhile thing to pursue in any case.

mpickering commented 1 year ago

@andreabedini

I find amusing how I seemingly have moved from "custom setups are bad" to "everything is a custom setup"

It seems that you are just proposing what is already the case for Custom setup scripts! This is exactly what we want to move away from.

Not being able to declaratively state that your build-type is "Hooks" or "Simple" would lead to any build system having to pessimise its assumptions about the structure of your project and would lead us back to having to do builds per-package.

If anybody wants to rework how the simple build-type is implemented.

I think you are misunderstanding the ultimate goal here, which is to not go via the Cabal Setup.hs interface when building Simple/Configure/Hooks build-type packages. We want to instead refactor the Cabal library so the relevant functions can be called directly from cabal-install without the indirection of going via the ./Setup.hs build interface.

Ericson2314 commented 1 year ago

We want to instead refactor the Cabal library so the relevant functions can be called directly from cabal-install without the indirection of going via the ./Setup.hs build interface.

I was talking to @andreabedini a bit and this the phrasing I have which is trying to get to end purposes:

Package-specific code to customize the build (be it Setup.hs, hooks described in Haskell, hooks described in some other language, etc.) should not need to depend on code implementing the regular build process. That violates "a pay for what you use" principle --- to change just a few things you take on the burden of re-implementing everything else. While today we ameliorate the "re-implementation" burden by allowing Setup.hs authors to reuse the same code that cabal-install uses, this is not satisfactory solution either since it makes for a much greater stable API surface needed for Setup.hs.

In other words, it isn't good enough to consider just the code in Setup.hs, we also want to consider the entire closure --- all the code that goes into the final ./Setup (or the runtime heap when interpreted) --- and make sure that doesn't overlaps as little as possible with cabal-install.

eli-schwartz commented 1 year ago

What's the actual downside of doing configure checks?

Surely they can make the build process hard to reproduce. but I am not sure about whether they are_bad in general.

This what I think it happened. When I wrote

I believe that managing the build environment is a fundamental responsibility of the developer and that cabal should refrain from this kind of auto-configurations.

I had in mind situations where ./Setup.hs adds extra logic to find system dependencies (hence the reference to extra-lib-dirs right below). But the term "configure-style checks" is quite vague so the discussion moved to other kinds of "checks" and I lost track of what I was originally saying :man_shrugging:

There are upsides and downsides. If you give people the power to detect intrinsic platform interfaces, you give them the power to misuse it and detect optional features too.

What we do for meson is provide a non-turing-complete DSL for build system configuration. This includes methods for compiler checks but those compiler checks include "required" annotations that are plumbed into build options:

dependency('openssl', required: get_option('ssl')

will attempt to probe your system for openssl libraries, using your own provided inputs for lib-dirs /pkg-config dirs or falling back to the canonical system locations. But it's dependent on whether you pass a command line option to the equivalent of Setup.hs: -Dssl={enabled|disabled|auto}.

We cannot force people to use the incredibly simple helpers we provide (not without restricting valid use cases too) but we can make it clear that if you decline to use this preferred style you are a "bad citizen" and making it hard for users.

This is not a problem unique to originating in Haskell, it's a problem that has been defined and described by many different groups. A good generic overview is https://wiki.gentoo.org/wiki/Project:Quality_Assurance/Automagic_dependencies

Ericson2314 commented 1 year ago

@mpickering I am glad we agree on the destination! But just to double check I do think it informs the design of phase 1 the hooks themselves.

In particular an IO () hook, be it a literal haskell IO () action, a shell snippet, or something else, is no good because we don't what it produces and what it depends on.

Conversely, hook that looks like a make rule, it might run an arbitrary action but it must produce a given file and it can only depend on other hooks (and "built-in rules" provided by cabal-install is much better).

Hooks for intermediate targets (not installed) can be more free-form, but hooks for final targets (are installed) should strictly correspond to files declared in the cabal file --- if that interface is not expressive enough to declare what is installed, it should be fixed.

(Perhaps we will no longer need autogen stanzas as the mere presence of a hook producing that file indicates that it is autogenerated.)

Ericson2314 commented 1 year ago

FWIW I asked @eli-schwartz to poke his head in there because Meson has a lot of experience trying to wrangle this stuff from the other side (polyglot projects with language-specific-package-repo dependencies). There is a lot we can learn here.

Meson, like CMake, combines "configuring" with "producing the makefile". From Cabal perspective we can think of this like Setup-Type: Custom + *.buildinfo files if:

The custom setup was written a more restricted DLS that is portable to windows
The *.buildinfo file included the new hooks we are talking about (in the most general form, arbitrary ninja rules for Meson)

The fact that the DSL is limited also blurs the distinction between configuring and conditional stanzas in the cabal file itself.

Indeed, with @mpickering's previous work allowing "configure in parallel", we could imagine running doing configuring during solving. This would probably be horrendously slow, but it is an interesting thought experiment. In particular, to not screw over cross compilation is important that any IO "coeffects" (probing the world) during configuration can be replaced with consulting a fed-in dictionary of answers (c.f. @mpickering work of saying to Cabal "trust me, the ABI hash if your dep is going to be this, don't mind that it isn't built yet --- same principle of making things pure). In the "detect nothing, used fed-in information case" the entire configure step becomes a pure function that code be expressed in today's cabal file langauge, and then "do configuring as part of solving" sounds a lot less unreasonable.

mpickering commented 1 year ago

@mpickering I am glad we agree on the destination! But just to double check I do think it informs the design of phase 1 the hooks themselves.

In particular an IO () hook, be it a literal haskell IO () action, a shell snippet, or something else, is no good because we don't what it produces and what it depends on.

Conversely, hook that looks like a make rule, it might run an arbitrary action but it must produce a given file and it can only depend on other hooks (and "built-in rules" provided by cabal-install is much better).

Hooks for intermediate targets (not installed) can be more free-form, but hooks for final targets (are installed) should strictly correspond to files declared in the cabal file --- if that interface is not expressive enough to declare what is installed, it should be fixed.

(Perhaps we will no longer need autogen stanzas as the mere presence of a hook producing that file indicates that it is autogenerated.)

I don't think I understand this comment, could you please suggest how you would modify the proposal in order to achieve this whilst also supporting all the identified use cases?

If I think about nixpkgs, you don't know what the build step there produces nor what it explicitly depends on (apart from the previous phases) so I don't see how this is much different.

haskell / cabal

Design a 'Hooks' build type to replace 'Custom' #9292

7350

7877

6112

6232

5933

3600

2910

Design

Hooks build type

Motivation

Issues with `UserHooks`

Future work

Stackage Setup.hs survey

Doctests

haskell-gi packages

Other module generation

Program generators baked into Cabal

Code-generators declarative feature

Paths* PackageInfo* modules

Configure-style checks

Agda

Reason 1

Reason 2

Configure Checks

Complicated Setup.hs

Future Work

What are build-types now

Not a new build-type

Some ideas around extensible build-types

haskell / cabal

Design a 'Hooks' build type to replace 'Custom' #9292

7350

7877

6112

6232

5933

3600

2910

Design

Hooks build type

Motivation

Issues with UserHooks

Future work

Stackage Setup.hs survey

Doctests

haskell-gi packages

Other module generation

Program generators baked into Cabal

Code-generators declarative feature

Paths* PackageInfo* modules

Configure-style checks

Agda

Reason 1

Reason 2

Configure Checks

Complicated Setup.hs

Future Work

What are build-types now

Not a new build-type

Some ideas around extensible build-types

Issues with `UserHooks`