RenateEilers commented 8 months ago

PR #12 introduced withSetup variants to run commands on monadic state machines, so that environments could be isolated between different parallel executions of a state machine.

sm :: m (StateMachine model cmd m resp)
sm = do
  env <- ...
  pure $ StateMachine {
    ...
    semantics = semantics env
    ...
}

I'd like to test properties concerning the state of the environment at the end of every run. To do so, I propose to add a field finalPostcondition :: Property (alternative naming suggestions are welcome) to StateMachine. We can then return this property at the end of executing commands, e.g. the return type of executeNParallelCommands would go from m (History cmd resp, model Concrete, Reason) to m (History cmd resp, model Concrete, Reason, Property). Then it could be propagated via runNParallelCommandsWithSetup and relevant variants to prettyNParallelCommands where a sensible message could be printed upon failure.

Feedback would be highly appreciated!

(Thanks @jasagredo for the idea)

jasagredo commented 8 months ago

Indeed, there is no way to test the final state of the system after the tests are run. One could find a way to do so in sequential properties by the following pseudocode:

prop_sequential = do
  env <- initEnv
  let sm = ... env
  prettyCommands =<< runCommands sm
  checkProperty env

For example one could include a tracer in the environment and check it after the commands are run. However, as made clear by #12 the only way to provide a clean isolated environment for each parallel test case is by initializing the state machine monadically using the WithSetup variants. However this precisely means that such environment cannot escape the scope of the state machine.

For the sake of uniformity, this could be added to the state machine as follows:

sm :: m (StateMachine model cmd m resp)
sm = do
  env <- initEnv
  pure $ StateMachine {
    ...
    semantics = semantics env
      , finalProperty = checkEnv env
    ...
}

Then as @RenateEilers says, it should check the property after the execution of all commands, and give the result to prettyParallelCommands which would then show whether either an interleaving was not explainable or/and the final property holds.

This ticket is again solving a problem specific to parallel properties where the solution could be defined for all StateMachines just for uniformity among parallel and sequential properties.

Any first impressions about this @stevana ? Should we go forward and make a PR?

stevana commented 8 months ago

Can you give a full example? (You'll need to write this for the testsuite eventually anyway).

I don't like how we keep extending the StateMachine record to handle various corner cases, this seems to suggest that the API we got isn't flexible enough. Can you think of ways, perhaps inspired by the APIs of other state machine based testing libraries, which simplify the API (rather than complicate it) while solving your problem?

jasagredo commented 8 months ago

EDIT: after writing this I realized it still might be too verbose. If it is unclear please request further clarification.

A full example would be as follows (taken from here). In that code the relevant chunks is:

  test cmds' = do
    ...
    (tracer, getTrace) <- recordingTracerIORef
    nodeDBs            <- emptyNodeDBs
    ...
    (hist, model, res, trace) <- bracket
        (open args >>= newTVarIO)
        -- Note: we might be closing a different ChainDB than the one we
        -- opened, as we can reopen it the ChainDB, swapping the ChainDB in
        -- the TVar.
        (\varDB -> readTVarIO varDB >>= close)

        $ \varDB -> do
          let env = ChainDBEnv
                { varDB
                , registry = iteratorRegistry
                , varCurSlot
                , varNextId
                , varVolatileDbFs = nodeDBsVol nodeDBs
                , args
                }
              sm' = sm env (genBlk chunkInfo) testCfg testInitExtLedger maxClockSkew
          (hist, model, res) <- QSM.runCommands' (pure sm') cmds'
          trace <- getTrace
          return (hist, model, res, trace)
    ...
    let
          modelChain = Model.currentChain $ dbModel model
          prop =
            counterexample ("Model chain: " <> condense modelChain)      $
            counterexample ("TraceEvents: " <> unlines (map show trace)) $
            tabulate "Chain length" [show (Chain.length modelChain)]     $
            tabulate "TraceEvents" (map traceEventName trace)            $
            res === Ok .&&.
            prop_trace testCfg (dbModel model) trace .&&.
            counterexample "ImmutableDB is leaking file handles"
                           (Mock.numOpenHandles (nodeDBsImm fses) === 0) .&&.
            counterexample "VolatileDB is leaking file handles"
                           (Mock.numOpenHandles (nodeDBsVol fses) === 0) .&&.
            counterexample "LedgerDB is leaking file handles"
                           (Mock.numOpenHandles (nodeDBsLgr fses) === 0) .&&.
            counterexample "There were registered clean-up actions"
                           (remainingCleanups === 0)
      return (hist, prop)

You can see there that we create a tracer from which we extract the final traces and assert stuff on what was recorded in that trace. Also for the different databases. This is currently undoable for parallel testing. The environment will have to be set up inside the state machine creation and this test should use the WithSetup variants, to ensure that each parallel execution has a fresh environment as shown in #12.

We would like to be able to express the following:

sm env genBlock cfg initLedger maxClockSkew = do
  (tracer, getTrace) <- recordingTracerIORef
  nodeDBs            <- emptyNodeDBs
  pure $ StateMachine
    { initModel     = ...
    , transition    = transition
    , precondition  = precondition
    , postcondition = postcondition
    , generator     = ...
    , shrinker      = shrinker
    , semantics     = semantics env
    , mock          = mock
    , invariant     = ...
    , cleanup       = noCleanup
    , finalProperty = \model -> do
        let modelChain = Model.currentChain $ dbModel model
        trace <- getTraces
        fses <- atomically $ traverse readTVar nodeDBs
        pure $ 
            counterexample ("Model chain: " <> condense modelChain)      $
            counterexample ("TraceEvents: " <> unlines (map show trace)) $
            tabulate "Chain length" [show (Chain.length modelChain)]     $
            tabulate "TraceEvents" (map traceEventName trace)            $
            res === Ok .&&.
            prop_trace testCfg (dbModel model) trace .&&.
            counterexample "ImmutableDB is leaking file handles"
                           (Mock.numOpenHandles (nodeDBsImm fses) === 0) .&&.
            counterexample "VolatileDB is leaking file handles"
                           (Mock.numOpenHandles (nodeDBsVol fses) === 0) .&&.
            counterexample "LedgerDB is leaking file handles"
                           (Mock.numOpenHandles (nodeDBsLgr fses) === 0) .&&.
            counterexample "There were registered clean-up actions"
                           (remainingCleanups === 0)
    }

Otherwise this test would be useless in a parallel testing scenario.

This is the best example we have so far. It is simply undoable because the environment lives only inside the state machine. A different approach would be to provide an env -> Property function to the prettyParallelCommands and then return the environment when creating the state machine, which is (I think) more convoluted than what we are proposing.

stevana commented 8 months ago

EDIT: after writing this I realized it still might be too verbose.

Yes, please make the example standalone (not dependent on anything else). What you posted doesn't make sense to include in the testsuite.

jasagredo commented 8 months ago

Ok, will try to elaborate a self contained example in which this problem can be seen 👌 (perhaps not today, will try to do it soon)

jasagredo commented 8 months ago

https://gist.github.com/jasagredo/f0f3fea4b14811fb0df7fdf8d8dac91c

This modified version of the Echo test shows the problem. If I add a tracer and I want to check for example that the number of traces emitted is equal to the number of times I issued a In command, I cannot do that in the parallel case, only on the sequential case.

❯ cabal run ./Echo.hs
Tests
  Sequential: OK
    +++ OK, passed 100 tests.
  Parallel:   FAIL
    *** Failed! Falsified (after 2 tests and 2 shrinks):
    ParallelCommands
      { prefix =
          Commands
            { unCommands =
                [ Command
                    (In
                       "K|\ETX\DC2=\163759p\128577\&7<\39542-I\SO\1083040\US\146733\&82 \992441C^5\\\133004\1043070;t\DLE\166773\RS\1011177\v@!{{hr\ETX\1083918\173352\1091277\SOH4\168379\&6q:>abm\35413w=\f?\10828\ACKT&\DC3\DC2Fs\156907(+\1029686QC^\EMuO\190205")
                    InAck
                    []
                ]
            }
      , suffixes = []
      }
    More traces than in commands: 10 vs 1
    Use --quickcheck-replay=900590 to reproduce.
    Use -p '/Tests.Parallel/' to rerun this test only.
  NParallel:  FAIL
    *** Failed! Falsified (after 3 tests and 2 shrinks):
    ParallelCommands
      { prefix =
          Commands
            { unCommands =
                [ Command (In "\b\1038060VLDK\1109244on?") InAck [] ]
            }
      , suffixes = []
      }
    More traces than in commands: 10 vs 1
    Use --quickcheck-replay=311423 to reproduce.
    Use -p '/NParallel/' to rerun this test only.

2 out of 3 tests failed (0.01s)

See the error message, 10 traces vs 1 In command, because of the 10 repetitions in the parallel case.

I hope this one is more clear, if not don't hesitate on asking for further clarification.

jasagredo commented 8 months ago

We have been discussing alternative ways on doing this and why it seems they wouldn't work.

Special terminal command

A possibility is to have a last special command and provide this information we want to check in the response for the command. This poses two problems:

There is no way to say that said command MUST be generated at the end of any sequence of commands.
Such command should be terminal, and as such won't be runnable in parallel. Which in turn means that no parallel group could have this command, and as the execution of parallel state machines consists of a sequential prefix plus one or more parallel groups, none of those could have this command. Therefore it could only be checked in the sequential prefix, which defeats the whole purpose of parallel checking.

Generalizing this check to all commands

It could be doable to make all responses emit this information, such that we check it in every postcondition. This has two problems again:

Perhaps the property does not make sense to be checked at every post condition. In the Echo example it makes sense that we check every time that the number of In commands is the same as the currently recorded counter, but in the example in our codebase we want the information of the environment just for output on the counterexample.
Sometimes the information we want to provide to a property is just observable for making the counterexample easier to understand, as in the case of just showing which traces have happened during the execution. If we were to compare these observable values with the ones produced in the model, unless they are all atomic there will not be a way to produce "interleaved" traces in the model, as it is advanced via a linearization.

In general I think this option would be doable sometimes, depending only in the nature of what one wants to check/observe.

The two options above vs having a separate field

By having a separate concrete field that exposes a final property, we can know that this will definitely be called at the end of the execution and only once, which decreases the amount of useless command sequences generated and reduces the very fine tailoring one would have to do for threading these intermediate "interleaved" information in the responses.

References (Concrete and Symbolic)

We discussed the possibility of using references to transfer this information from command to command while advancing the model, but we are not that used to working with References and perhaps this is not even the intended way of using them.

Hedgehog comparison

Disclaimer: I have only slightly adapted existing Hedgehog tests, I am not a Hedgehog expert.

The only other library that seems to provide parallel state machine tests is Hedgehog. https://hackage.haskell.org/package/hedgehog-1.4/docs/Hedgehog.html#g:5. There is also quickcheck-dynamic but it doesn't have support for parallel state machines.

Hedgehog has some different features than QSM, in particular I can see:

There is only one parallel group after the sequential prefix
generation of parallel commands checks that a generated sequence is runnable in its own, not that any possible interleaving still keeps all preconditions (parallelSafe in QSM).
There are no multiple invocations for RTS randomization, which do exist in QSM and is in fact the reason why we need all this.

So I'm not sure we can take many inspiration from Hedgehog.

Proposal

The proposal is to have a field finalProperty :: PropertyM m () in the StateMachine such that we can call it after the execution of the state machine, in each repetition of the parallel sequence of commands.

stevana commented 8 months ago

Special terminal command

I haven't had time to look into this in detail yet, but why doesn't the following work: forAllCommands ... $ \cmds -> let cmds' = cmds ++ [Terminal]?

(The same can be done for parallel commands: forAllParallelCommands ... $ \cmds -> let cmds' = cmds { suffixes = suffixes cmds ++ [Pair [] [Terminal]] }.)

jasagredo commented 8 months ago

If we introduce the command that way it ought to work. The Command datatype would then have a constructor that must not be generated and that pollutes a bunch of instances (like shrinking and so) with useless cases, but yeah... I guess that would be doable?

stevana commented 7 months ago

Fixed in #43.

stevana / quickcheck-state-machine

Add option to test properties on isolated environment #40

Special terminal command

Generalizing this check to all commands

The two options above vs having a separate field

References (Concrete and Symbolic)

Hedgehog comparison

Proposal