haskell / cabal

Official upstream development repository for Cabal and cabal-install
https://haskell.org/cabal
Other
1.62k stars 697 forks source link

`cabal test` doesn't always solve for tests (unless `--enable-tests` or `tests: True` is given) #7883

Open Mikolaj opened 2 years ago

Mikolaj commented 2 years ago

@ulysses4ever's exec summary:


Describe the bug

I'm dog-fooding current dev cabal from master branch.

Solver is not cooperating. Cabal says "solver picked a plan that does not include the test suites" despite there being such a plan, which can be seen when forcing with 'tests: True'.

To Reproduce

Clone

https://github.com/Mikolaj/mostly-harmless/commit/3feb586519b6fbe87aa1867700206b1af0bd600f

Run the following (it seems, a fresh package store is not required after all)

cabal test -w ghc-9.2.1

It shows

Resolving dependencies...
Error: cabal: Cannot test the package mostly-harmless-0.1.0.0 because none of
the components are available to build: the test suite 'mostly-harmless-test'
is not available because the solver picked a plan that does not include the
test suites, perhaps because no such plan exists. To see the error message
explaining the problems with such plans, force the solver to include the test
suites for all packages, by adding the line 'tests: True' to the
'cabal.project.local' file.

but cabal build -w ghc-9.2.1 --enable-tests (as well as cabal test -w ghc-9.2.1 with 'tests: True' in cabal.project.local) goes through fine and tests run. However, after removing cabal.project.local, cabal test -w ghc-9.2.1 fails again the same.

The same attempts with ghc-8.10.7 work fine.

Expected behavior

The solver should pick the right plan without forcing. Forcing would ideally be used only to show the error that prevents the solver from finding a plan. That's the assumption under which ~@gelisam and~ me operated in #7834 and related tickets. Now it's all put into question.

gelisam commented 2 years ago

It's not the assumption I was using :) Remember this comment, in which I said that I think the solver is allowed to decide to pick a plan which does not include tests even if one exists. For example, if suppose the solver is both allowed to decide whether to enable tests and whether to enable the benchmarks. The solver might try four combinations:

  1. enable tests, enable benchmarks
  2. disable tests, enable benchmarks
  3. enable tests, disable benchmarks
  4. disable tests, disable benchmarks

If the solver tries them in that order and picks the first one which succeeds, you'll observe the current bug if 1 fails and 2 and 3 succeed, because 2 is picked even though 3 succeeds. And we can't fix this by checking 3 before 2, because then we'd just have the same bug but for benchmarks!

The reality is much worse, because there are many more dimensions of choice, such as the version of every recursive dependency. So many that it's not feasible to check every allowed plan in which the tests are enabled. Instead, a heuristic must be used, to examine more promising branches of the possibility tree before others. And if that heuristic makes it look like disabling the tests is a promising avenue, the solver can easily find a plan in the disabled-tests branch without having completely explored the enabled-tests branch, and thus choose to disable the tests even if a successful plan exists somewhere in that unexplored enable-tests branch.

Mikolaj commented 2 years ago

@gelisam: you are right and I even responded to that comment, but then promptly repressed that, because it's too painful to believe. In fact, the cabal's blurb in my bug's description is correct and takes into account the possibility of solver not cooperating and that's exactly the blurb you changed in #7834. So, #7834 and related are not threatened.

So the problem in my case is that I have 1 library, 1 exe, 1 testsuite and the solver is not certain that the exe and the test are not excluding each other? And so it picks a plan where exe can be built and ignores the testsuite?

Could we tweak cabal to prioritize tests (respectively, benchmarks) over exes and libraries and everything, when cabal test is invoked (respectively, cabal benchmark)?

@grayjay: do you think it's easy to ask the solver to do that and cabal just fails to ask, or is solver not able to prioritize things, apart of the brute force mode where it tries exhaustively and fails, if it can't build something (e.g., --enable-tests)?

gelisam commented 2 years ago

Could we tweak cabal to prioritize tests

There is a very clear description of the algorithm in this Well-Typed post on Qualified Goals in the Cabal Solver. There is a very clear phase where such preferences are applied: the "Heuristics and Preferences " section in which the tree branches are reordered. I think it would be instructive to create a variant of your repro case in which the search tree is small enough that we can make diagrams like the ones in the post, so we can see at which node the choice of enabling tests or not is made, and discuss at which node we would like to make that choice instead.

grayjay commented 2 years ago

I think this issue is the same as #5079. The problem is that, by default, cabal runs the solver with only a preference to enable tests and benchmarks. The preference is only applied locally by sorting nodes in the search tree, which means that the solver may disable tests in order to satisfy a different preference, such as the preference to avoid deprecated versions. We could apply preferences globally by implementing a feature like #2860 and give the testing preference higher priority, but it would only reduce the frequency of the current issue, not prevent it. There would still be cases where the solver silently disables tests without --enable-tests, such as the conflict between tests and benchmarks in https://github.com/haskell/cabal/issues/7883#issuecomment-1002072947 or a conflict between two test suites.

Could we tweak cabal to prioritize tests (respectively, benchmarks) over exes and libraries and everything, when cabal test is invoked (respectively, cabal benchmark)?

I think we should continue using the same default solver options for both "build" and "test" in order to avoid unstable install plans. I think that it would be very confusing for users to get completely different versions of dependencies depending on the cabal command that was run.

Mikolaj commented 2 years ago

Doh, thank you very much for the explanations. I apologise, I couldn't get around to generating that small test case and tried to do that before responding. But I feel this is an important usability point, so I don't want to just close as a duplicate.

I think we should continue using the same default solver options for both "build" and "test" in order to avoid unstable install plans. I think that it would be very confusing for users to get completely different versions of dependencies depending on the cabal command that was run.

@grayjay: do you mean that if "test" can't be solved using a superset of dependencies generated for "build", it always fails (without --enable-tests)? That guarantee would indeed be worth something, but only if the user can depend on it. Or am I over-interpreting?

gelisam commented 2 years ago

do you mean that if "test" can't be solved using a superset of dependencies generated for "build"

The question implies that cabal build runs the solver to pick versions of the dependencies of lib and exe targets, then builds them, while cabal test runs the solver to pick versions of the test targets, then builds and runs them. This is not how those commands work. If they did (I think they once did?), then running cabal build followed by cabal test would likely pick different plans (what@grayjay refers to as an "unstable install plan") and thus end up building dependencies twice. Or if like you suggest, subsequent invocations of the solver only pick plans which are compatible with previous invocations, then which plan you get would depend on whether you start by running cabal build or cabal test, which is a project-specific version of the so-called "cabal hell" situation we had before the v2 commands.

Since both of those behaviors are undesirable, what is actually happening is that all the commands (except cabal-install) use the same plan. So it doesn't matter if you call cabal build or cabal test first, the solver will always look for a plan which involves all the libraries, tests, exes, and benchmarks.

I was careful to say "involves", not "includes", because a plan does not just pick version numbers for the dependencies, it also makes decisions about flags, whether to enable tests, etc. That's how we nevertheless manage to get plans which don't include the tests.

Mikolaj commented 2 years ago

OK, you convinced me we are doing the right thing. I take back my proposal to try harder to include tests when the solver notices it's run from cabal test. That's a bad idea.

I guess, that leaves me with a request to try harder (ideally guarantee) that the plan we find is a local maximum in terms of how many components are included. E.g., in my (non-minimized) real life case above, it was possible to add a testsuite (without resorting to changing flags or picking deprecated deps), but the chosen plan included none. Local maximum in the sense, that if it's possible to add one more component (without removing any that are already included) by changing only dependencies (and not to deprecated versions) then it's added. Perhaps what that entails is that the preference to include even a single new component should be stronger than to keep optimal dependencies (newest possible, or already found), even if that means downgrading absolutely all dependencies to stone age versions (barring deprecated versions).

Perhaps that's impossible, though. I'm reading the blog post and it underlines that decisions are local, while I require global guarantees, which may imply an exhaustive search/backtracking. I think the solver never backtracks over preferences, e.g., a preference to not exclude a component? It only backtracks over constraint-like things, e.g., the tests: True hard requirements.

Or how bad would it be to treat component inclusion as hard constraints? (The difference vs tests: True would be that the latter causes tests to be actually built.) Or perhaps we can let the user decide, via tests: Plan, that the tests have to be included in the plan, at the cost of as many backtracks, as needed? @gelisam: did our brainstormimg about a new value for the tests: setting touch this (it's been a long time)? That would an acceptable setting for a user that runs cabal build 1000 times a day and cabal test 10 times a day. Without it, a user for whom the default plan excludes tests, is forced to either switch plans and rebuild every time the user runs tests or force tests: True and slow down every error-checking build. In large projects, that's a significant cost.

Please tell me how I'm mixing up things again. ;)

gelisam commented 2 years ago

@gelisam: did our brainstormimg about a new value for the tests: setting touch this

The new value does not change the current behavior in any way, it just better documents the fact that they are three options rather than just True and False.

gelisam commented 2 years ago

Or how bad would it be to treat component inclusion as hard constraints?

If we limit our attention to plans which include all the components, then Haskellers will be happier because they are the ones who tend to be surprised by the current behaviour, but non-Haskellers, who need the most help, will be greated with an inscrutable "cannot find build plan" error more often, because the solver will not be allowed to work harder by looking for a build plan which can build the executable they care about but not the tests they don't know how to run anyway.

gelisam commented 2 years ago

Instead of adding even more values to tests, how about splitting the field into several smaller ones? One field to determine if the tests should be included in the plan, and one to specify which components are included in the default all:all target. Speaking of which, you can run cabal build all:libraries if you don't want to build the tests.

Mikolaj commented 2 years ago

If we limit our attention to plans which include all the components, then Haskellers will be happier because they are the ones who tend to be surprised by the current behaviour, but non-Haskellers, who need the most help, will be greated with an inscrutable "cannot find build plan" error more often, because the solver will not be allowed to work harder by looking for a build plan which can build the executable they care about but not the tests they don't know how to run anyway.

But aren't newcomers mostly running cabal install package-name or sometimes cabal install .? Do I remember right cabal install doesn't try to find a build plan with tests and benchmarks?

Instead of adding even more values to tests, how about splitting the field into several smaller ones? One field to determine if the tests should be included in the plan, and one to specify which components are included in the default all:all target.

Agreed.

Speaking of which, you can run cabal build all:libraries if you don't want to build the tests.

Good to know, thank you. But in a week all I remember will be 'cabal build' again. ;) OTOH, if I stick something that corresponds to that in my cabal.project.local, it will still work and for new projects I will remember to look it up in my old projects (I always keep a lot of cabal.project.local variants, with different suffixes). So IMHO it's worth it to try to match commandline and config in this case, even partially.

gelisam commented 2 years ago

Oh right, good point about cabal install using a different plan! In that case, I now agree that tests: True should be the default, and perhaps even the only option.

grayjay commented 2 years ago

Instead of adding even more values to tests, how about splitting the field into several smaller ones? One field to determine if the tests should be included in the plan, and one to specify which components are included in the default all:all target.

I like this idea. If I understand correctly, the first field is exactly the same as the existing boolean tests field, and the second field only chooses what is built, so it doesn't need to be an input to the solver.

Oh right, good point about cabal install using a different plan! In that case, I now agree that tests: True should be the default, and perhaps even the only option.

I also like this idea. In my opinion, we should remove the third value, where tests are only enabled with a preference. I think that the testing preference adds complexity without much benefit, because it doesn't make any useful guarantees (and can't make useful guarantees without behaving like a constraint). Setting the default to true will give cabal test the expected behavior. We can also make cabal build give an error that suggests --disable-tests when dependency solving fails. I think that this all applies to enabling benchmarks, too.

gbaz commented 2 years ago

It sounds like we're iterating to the idea that the default is True and there's no third choice, for tests and benchmarks both. And then we need to add a warning furthermore to suggest disabling the flag if no plan can be found, and finally need to have the --enable-tests and --enable-benchmarks flag not force building tests or benchmarks, but only to include them in the plan, and have them only built when explicitly requested as components or when running cabal test or cabal benchmark.

Mikolaj commented 2 years ago

@gbaz; I'm happy we all agree about the outline of the solution, even though it's not backward compatible, because we remove the current default (the non-binding promise to try and include stuff in the plan), so it can't be the defaults any more.

Regarding the details, @gelisam and me thought about an extra field in the project file that would mean "build it" so that the tests: True field has the same meaning as the new meaning (only plan) of --enable-tests and tests: False as --disable-tests, while the new field would force building the component (I fully expect I'm misrepresenting @gelisam's position again :). Is that a good idea? In particular, is making tests: Tests and --enable-tests the same a good idea and, independently, do we need another field in the project file for the building bit, e.g., to let users migrate easily from tests: True to the new setup?

gelisam commented 2 years ago

is making tests: Tests and --enable-tests the same a good idea

Aren't they already the same?

I don't have an opinion as to whether it is better to add a flag changing the meaning of all:all or to ask users to type cabal build all:all all:tests.

Mikolaj commented 2 years ago

Aren't they already the same?

They are, but @gbaz proposes to make --enable-tests equivalent to the new default, so its semantics would change. Or should we keep --enable-tests and --disable-tests unchanged, that is, forcing both the planning and the building? Keeping things as they were is good, but do we end up with less power?

Let me think. Can we express

Seems to check out. The only thing missing is overriding "tests: True" ("plan, build") on command line to "plan, don't build". Is that important?

gelisam commented 2 years ago

I like the simplicity of having --enable-tests/--disable-tests mean the same thing as tests: True/tests:False. I support @gbaz 's summary of what now needs to be implemented, and since I am now quite familiar with the tests and --enable-tests code, I volunteer to implement it (eventually).

With that approach, we can still express all the combinations:

The only thing missing is overriding "tests: True" ("plan, build")

Wait, what? I thought we all agreed that tests: True should now mean "plan, don't build", not "plan, build". Could you restate what the various combinations of tests: True/tests: False, --enable-tests/--disable-tests, and cabal build/cabal test do in your version of the proposal in which --enable-tests/--disable-tests keep their current semantics but tests: True/tests: False don't?

gelisam commented 2 years ago

I realize that I'm asking you for quite a number of combinations, so I should be willing to do that amount of work myself. Here is the behaviour I expect for the proposal I support:

All of the following combinations have the effect "plan libs, exes, tests, and benches; build libs and exes":

All of the following combinations have the effect "plan libs, exes, tests, and benches; build and run tests":

All of the following combinations have the effect "plan libs, exes, and benches; build libs and exes":

All of the following combinations have the effect of failing with an error message explaining that tests are disabled:

Mikolaj commented 2 years ago

I thought we all agreed that tests: True should now mean "plan, don't build", not "plan, build".

I don't oppose that, but in that case the user can't request via a project file to have tests planned, built (type-checked) and not run via any subsequent cabal build invocations. And that's useful and that was expressible previously, so there is no migration path if the user had "tests: True" to that effect in a project file (in CI, say). Is my motivation clear?

gelisam commented 2 years ago

Is my motivation clear?

Yes, thanks for explaining. Given that motivation, I think it's important to list the current behaviour under all the combinations, to make sure all those behaviours can easily be obtained under the new proposal.

All of the following combinations have the effect "plan libs and exes, and try to include tests if possible; build libs and exes":

All of the following combinations have the effect "plan libs and exes; build libs and exes":

All of the following combinations have the effect "plan libs, exes, and tests; build libs, exes, and tests":

All of the following combinations have the effect "plan libs and exes, and try to include tests if possible; build and run tests, failing if the tests were not included in the plan":

All of the following combinations have the effect "plan libs, exes, and tests; build and run tests":

All of the following combinations have the effect "plan libs and exes; fail because tests are disabled":

gelisam commented 2 years ago

Here's the subset of those behaviours we can accomplish with the proposal I support. All the commands assume that tests is either unspecified or set to its new default, True.

  1. ~"plan libs and exes, and try to include tests if possible; build libs and exes"~: no longer supported, too confusing
  2. "plan libs and exes; build libs and exes": cabal build --disable-tests
  3. "plan libs, exes, and tests; build libs, exes, and tests": cabal build all all:tests
  4. ~"plan libs and exes, and try to include tests if possible; build and run tests, failing if the tests were not included in the plan"~: no longer supported, too confusing
  5. "plan libs, exes, and tests; build and run tests": cabal test
  6. "plan libs and exes; fail because tests are disabled": cabal test --disable-tests

I think your only complaint about the above is that cabal build all all:tests is hard to remember, and that you'd rather have a ~/.cabal/config setting you can set and forget?

Mikolaj commented 2 years ago

Actually, I didn't even remember one can set it in ~/.cabal/config, but that's a good point. I meant cabal.project and especiallycabal.project.local (that people set via cabal configure).

You are right, this is my only complaint (if you generalize to project files and ignoring benchmarks, which behave analogously).

To reword: the biggest complaint is that if somebody had tests: True in cabal.project and various reasonable commands in CI, that person can't migrate (that is, obtain the same behaviour as before) by only changing something in cabal.project. Adding arguments to (possibly many) invocations of cabal in the CI is required. Compare with the alternative, where that person doesn't need to change anything anywhere for that use case. But regardless of the alternative, the new setup has a relatively less expressive cabal.project --- you can't permanently force building tests in cabal.project any more (and so you can't force it via cabal configure either, etc.)

The minor and last complaint is that a common idiom "typecheck everything" that previously required only (tests: True and cabal build) or only (cabal build --enable-tests) now requires the scary cabal build all all:tests.

grayjay commented 2 years ago

I wanted to mention that there is one advantage to continuing to use three values for the "tests" configuration option over splitting it into two independent options (other than backwards compatibility). The three value design avoids the possibility of setting the "plan tests" option to false and the "build tests" option to true, which is a combination that doesn't make sense. In the three value design I am imagining, tests: True/--enable-tests and tests: False/--disable-tests would remain unchanged and the third (default) value would change to mean "plan tests, don't build tests". We would also name the third value as in #7829.

gelisam commented 2 years ago

Ooh, making invalid states unrepresentable, I like that!

Let me again exhaustively list the behaviours for this proposal.

All of the following combinations have the effect "plan libs and exes; build libs and exes":

All of the following combinations have the effect "plan libs, exes, and tests; build libs and exes":

All of the following combinations have the effect "plan libs, exes, and tests; build libs, exes, and tests":

All of the following combinations have the effect "plan libs, exes, and tests; build and run tests":

All of the following combinations have the effect "plan libs and exes; fail because tests are disabled":

Mikolaj commented 2 years ago

This is brilliant. Let's do it.

grayjay commented 1 year ago

See #5079 for previous discussion of this issue.