haskell / cabal

Official upstream development repository for Cabal and cabal-install
https://haskell.org/cabal
Other
1.63k stars 696 forks source link

Per-component dependency solving #4087

Open grayjay opened 8 years ago

grayjay commented 8 years ago

I don't think there is a clear design for component-based solving yet, so I wanted to create an issue where we can discuss it. #1575 addresses circular dependencies in test suites but is moving towards a different solution.

Related issues:

779, #3978, https://github.com/haskell/cabal/issues/3492#issuecomment-225391004 - solver doesn't reject configurations that require unbuildable/non-existent components

1575 - cabal can't handle cycles between packages that are not cycles between components

2725, #5413 - cabal requires dependencies of components that aren't being built

3662 - more per-component support

3263 - fine-grained dependencies

grayjay commented 8 years ago

Here are my thoughts, so far. At a minimum, the solver should keep track of dependencies between packages' components. (I think that it currently tracks dependencies from components to packages.) Then it can stop requiring dependencies of components that aren't needed, and ensure that required components are buildable. The solver should still ensure that all components within a package are consistent (have the same version and flag assignments), unless they have different qualifiers. Maybe we can add a flag or use private dependencies to relax that restriction later.

ezyang commented 8 years ago

@grayjay, thanks for making this ticket. Your plan seems eminently reasonable. Go for it!

grayjay commented 7 years ago

Here is a first pass at a design:

Summary

Tree structure

Index conversion

executable my-exe
  if flag(my-exe-component-on)
    x, with "Buildable: False" fields converted to failures
  else
    if flag(old-setup-script-features)
      x
    else
      x, transformed as in #3039 (in order to ignore dependencies when
      the component is unbuildable) and all dependencies converted to
      qualified constraints
test-suite my-test
  if flag(my-test-component-on)
    x, with "Buildable: False" fields converted to failures
  else
    if flag(tests-enabled)
      if flag(old-setup-script-features)
        x
      else
        x, transformed as in #3039 (in order to ignore dependencies when
        the component is unbuildable) and all dependencies converted to
        qualified constraints
    else
      empty test-suite stanza

Dependency graph

Custom setup validation steps

Component dependency validation steps

Enabling tests/benchmarks

Internal dependencies

More questions

/cc @kosmikus @ezyang @Ericson2314 @dcoutts @edsko @hvr @23Skidoo

ezyang commented 7 years ago

Thanks, this looks great.

How do we know when a setup script supports per-component builds? Do we only need the setup Cabal version and build type?

Per-component occurs if and only if:

The canonical source of truth is why_not_per_component in ProjectPlanning. Probably best to just cross-reference two sites for now.

Constraints from one component should still apply to others. Otherwise, we might break packages that only constrain dependencies for one component (usually the library).

But note that to fix things like #3732 we'll have to relax how we do the constraints here. I guess this is covered by what you say later: "If we ever need to build components with conflicting dependencies, we can use qualified goals."

How do we handle component goals for installed packages? Do we know which executables are available? Can we find all components by package name and version in the installed package index?

At the moment, we won't know anything about executables, but we don't have to, because new-build will never have those as "installed" packages.

Internal libraries pose a special problem, because they show up in the installed package index (I recently committed a hack to handle this case). With our current featureset, we are allowed to assume that any internal library in the installed package index can only be reached by a corresponding "public" library from the package (so the "correct" way to handle internal libraries is to bundle them all up together with the public library into one 'package node'). The hacked code doesn't actually do this but that's what should be done morally.

What if a user specifies an unbuildable component as a build target? The solver could try to build the component by flipping flags, but then the install plan would depend on the list of targets.

Perhaps there should be a distinction made between "build targets" (what am I going to build this round) and "solver targets" (please work hard to dep solve so that X is buildable.) The latter is stable and shouldn't change even if you decide to build different things. This should solve the next bullet point too.

grayjay commented 7 years ago

Thanks for the feedback!

Per-component occurs if and only if:

We are new-build build-type is not Custom cabal-version >= 1.8 --disable-per-component is not specified

I see that I had this part backwards. These rules simplify things, because the solver doesn't need to choose whether to use per-component build for each package; there is only one choice. In that case, I think that we don't need to add the "setup script features" variables, for now.

I had also incorrectly assumed that every package that ignores dependencies of unbuildable components also supports per-component build. Since the two attributes are not linked, I think we should handle #3881 separately.

But note that to fix things like #3732 we'll have to relax how we do the constraints here. I guess this is > covered by what you say later: "If we ever need to build components with conflicting dependencies, we can use qualified goals."

I'm not sure that we should make cabal find a solution for the example at the top of #3732, which only constrains base in one component. It looks the same as a package where the author meant for the constraint on base to apply to all components but wanted to avoid duplication. I haven't searched Hackage to see how common that pattern is, though.

Per-component solving would still allow more flexibility in less extreme cases, even if it always applied all build-depends constraints. Take this example:

executable exe1
  build-depends: x > 2

executable exe2
  build-depends: y < 3

Say that x > 2 and y < 3 have conflicting constraints on z. Then per-component solving would allow exe1 and exe2 to be installed separately while the current solver wouldn't find a solution.

Qualified goals would only help by allowing components that were independently solvable to be solvable together with different qualifiers (possibly using different versions of the package containing the components).

At the moment, we won't know anything about executables, but we don't have to, because new-build will never have those as "installed" packages.

Okay, so cabal will behave as if those package instances provided libraries but not executables.

Internal libraries pose a special problem, because they show up in the installed package index (I recently committed a hack to handle this case). With our current featureset, we are allowed to assume that any internal library in the installed package index can only be reached by a corresponding "public" library from the package (so the "correct" way to handle internal libraries is to bundle them all up together with the public library into one 'package node'). The hacked code doesn't actually do this but that's what should be done morally.

I'm not sure I understand this part. Can we always reach all of the internal libraries if we have the main library from the installed package index? I think that is all the solver needs to do.

Perhaps there should be a distinction made between "build targets" (what am I going to build this round) and "solver targets" (please work hard to dep solve so that X is buildable.) The latter is stable and shouldn't change even if you decide to build different things. This should solve the next bullet point too.

That distinction makes sense. I still don't know how we should choose those two sets of targets, though.

For "solver targets", maybe we should just give the solver a list of packages to solve for and apply a preference to build all components within each of those packages. Is there any situation where we would want to give the solver a subset of a package's components as targets?

hvr commented 7 years ago

re

I'm not sure that we should make cabal find a solution for the example at the top of #3732, which only constrains base in one component. It looks the same as a package where the author meant for the constraint on base to apply to all components but wanted to avoid duplication.

IMO a better way to avoid duplication is to make it more explicit that multiple components ought to share common dependencies/constraints via the likes of #2832, and then we could unlock/enable per-component solving also in the aforementioned case when a high enough cabal-version: value is set (c.f. to how the semantics for build-depends & multiple components changed between cabal-version 1.6 and 1.8)

ezyang commented 7 years ago

I had also incorrectly assumed that every package that ignores dependencies of unbuildable components also supports per-component build.

Yeah, but I think the converse is true (per-component = correct handling of non-buildable components.)

I'm not sure I understand this part. Can we always reach all of the internal libraries if we have the main library from the installed package index? I think that is all the solver needs to do.

Not necessarily. For example, we may have registered an internal library which was only linked by the test suite, but not by the actual public library of a package. @dcoutts would probably say that this just means we shouldn't have registered it but it seems better to me to not assume any relationship without tracing the dependency relations.

For "solver targets", maybe we should just give the solver a list of packages to solve for and apply a preference to build all components within each of those packages. Is there any situation where we would want to give the solver a subset of a package's components as targets?

Yeah I'm not sure what the default should be. What we have right now is best effort which seems to work decently well, so maybe every component would have a designation like, "never", "best-effort", "always", with everything defaulting to best-effort.

grayjay commented 6 years ago

After #4884 and #5304, the solver stores the two components that are involved in each dependency. The dependent component is a D.Solver.Types.ComponentDeps.Component, and the depended upon component is a library or executable. The two PRs also implement the part of "Component dependency validation steps" from https://github.com/haskell/cabal/issues/4087#issuecomment-287665861 that checks for the existence of components, though the solver doesn't track whether components are enabled yet.

phadej commented 5 years ago

Lack of this feature required to split of test-suites and benchmarks from containers. And probably will need to do the same exercise for time. Also I cannot use QuickCheck in the tests of splitmix anymore, as QuickCheck depends on splitmix. This list grows as v2-build gets adapted.

IMO, not having this in 3.0 makes v2-build unconvinient for packages deep down in the dependency tree.

Is there anything one can do to make this happen sooner, than later.

Ericson2314 commented 5 years ago

I would be very happy to teach Cabal cross immediately if only this were done, too.

grayjay commented 5 years ago

I'm less familiar with the change that would be required to allow cycles between packages, but I think it could be implemented separately from the other parts of this feature, like the changes to flag variables. Unless I'm missing something, it would mainly require changes to cycle detection and the conversion from the solver's dependency graph to the one used outside of the solver:

I'm not sure how much code outside of the solver will also need to be changed to maintain the dependencies between components and ensure that components are built in the correct order. I'm also not sure if there will be other issues with having cycles between packages in the solver, such as infinite loops.

I don't think I'll have time to work on this feature in the near future, though I could try to answer questions or review code if anyone else wants to implement it.

Ericson2314 commented 5 years ago

@grayjay

Should we only relax cycle detection for tests and benchmarks for now?

I think the idea is that the cycles really and actually go away once we track on the level of components. No special-cases are needed.

I'm not sure what @phadej was thinking but I believe we need private dependencies for changes containers to not cause a rebuilding of the testing libraries using containers. Solving that without private dependencies would, however, be a special case for benchmarks and test suites.

grayjay commented 5 years ago

Should we only relax cycle detection for tests and benchmarks for now?

I think the idea is that the cycles really and actually go away once we track on the level of components. No special-cases are needed.

I meant that I was wondering whether there is a use case for other types of cycles now and whether we should disallow them in order to simplify install plans. We can always remove the restriction later, but it would be very hard to move in the opposite direction.

I'm not sure what @phadej was thinking but I believe we need private dependencies for changes containers to not cause a rebuilding of the testing libraries using containers. Solving that without private dependencies would, however, be a special case for benchmarks and test suites.

I'm not sure I understand. I think that private dependencies is a separate feature that doesn't require per-component dependency solving. The approach taken in #3422 is like a special case of private dependencies. I think that per-component solving would only help solve the test suite cycle issue if we used it to change the solver's cycle detection to use components instead of packages. It would cause rebuilds of the test framework, though.

Ericson2314 commented 5 years ago

I meant that I was wondering whether there is a use case for other types of cycles now and whether we should disallow them in order to simplify install plans. We can always remove the restriction later, but it would be very hard to move in the opposite direction.

I agree let's not overcommit to allowing things we will want to band later. I'm not fan of "Postel's law"'s over-application with these matters :).

The approach taken in #3422 is like a special case of private dependencies.

OK glad we agree on that.

I'm not sure I understand. I think that private dependencies is a separate feature that doesn't require per-component dependency solving.

Right if #3422 is a special case of cycles and private dependencies, I rather grow it into "private dependencies in general" rather than "cycles in general".

I think that per-component solving would only help solve the test suite cycle issue if we used it to change the solver's cycle detection to use components instead of packages. It would cause rebuilds of the test framework, though.

Totally agreed. @phadej's changes to containers (make a package under a different name in other cabal project) both avoid the cycles and avoid the extra test rebuilds. I wanted to point out that per-component therefore is not enough on its own to obviate the renamed package---need the private dependencies part to avoid the rebuilds too or there's a usability regression.

isovector commented 5 years ago

I just ran into this while trying to make polysemy depend on polysemy-plugin, but polysemy-plugin:test depend on polysemy. It's a frustrating limitation, and one that totally breaks the abstraction that these things form a graph. How much work is left to be done here? Could you use an engineering hand?

grayjay commented 5 years ago

@isovector None of the changes to cycle detection have been implemented yet, so we could definitely use a hand! I gave a summary of what I think needs to be done to only update the cycle detection in https://github.com/haskell/cabal/issues/4087#issuecomment-497244606. I'm not sure if it would require more changes outside of the solver, though.