Print runnable scenario in case of test failure

n-osborne commented 6 months ago

This PR is the continuation of, and is based on, #202. Please consider only the six last commits.

This PR makes the failure message print a runnable scenario with information about returned and expected values.

It uses the same mechanism as introduced in #202 for communicating with Qcheck.fail_reportf.

For printing the commands with their sut argument in the right place, only the generated show_cmd has been modified.

Printing the user-provided value for init_sut is quite straightforward, we just have to pass the information to the runtime.

Printing the expected value is a bit more complicated. First, we look at the Gospel specifications to see if there is enough information to compute the expected returned value (some functions may be specified in a way that their contract doesn't include a computation of the returned value). If there is, we wrap this value in a STM.res type and send it to the runtime through ortac_postond along all the other information. It there is not, we send a dummy value, to inform the printer that the expected returned value is unknown. A warning will be displayed to the user in the second case, but this will not affect whether the function is tested or not.

This PR only deals with normal behaviour, meaning that the excpetional branches in ortac_postond don't send an expected returned value to the runtime for now. I believe that handling exceptional behaviour can be done in another PR (this is me trying to keep PRs short).

The ortac_runtime now handle this extra information to print a runnable scenario to the user in case of test failure.

For now, I've made the choice to print the function calls in an ignore and to print the returned value and the optional expected returned value in comments. We've discussed elsewhere about placing the function calls in an assert, but we don't always have an expected returned value

If we introduce a bug in plugins/qcheck-stm/test/array.ml, we have now:

λ dune build @launchtests
File "plugins/qcheck-stm/test/dune.inc", line 49, characters 0-75:
49 | (rule
50 |  (alias launchtests)
51 |  (action
52 |   (run %{dep:array_stm_tests.exe} -v)))
random seed: 515405686
generated error fail pass / total     time test name
[✗]    1    0    1    0 / 1000     0.0s Array STM tests (generating)

--- Failure --------------------------------------------------------------------

Test Array STM tests failed (14 shrink steps):

   set sut (-2603392982468686691) '\190'
   to_list sut

+++ Messages ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Messages for test Array STM tests:

Gospel specification violation in function to_list

  File "array.mli", line 36, characters 12-26:
    l = t.contents

when executing the following sequence of operations:

  let sut = make 16 'a' in
  ignore (set sut (-2603392982468686691) '\190'); (* returned Error (Invalid_argument("index ou t of bounds")) *)
  ignore (to_list sut); (* returned ['a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; '\190'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a']
                           expected ['a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a' ; 'a'; 'a'; 'a'; 'a'] *)

================================================================================
failure (1 tests failed, 0 tests errored, ran 1 tests)

[1]

jmid commented 6 months ago

Nice work Nicolas :smiley::tada: I particularly liked the (aligned) returned and expected results! :heart:

I suppose we could save a few characters (the shorter output the better IMO) by using

top-level let-forms

let _ = instead of ignore( ... ):

let sut = make 16 'a'
let _ = set sut (-2603392982468686691) '\190' (* returned Error (Invalid_argument("index ou t of bounds")) *)
let _ = to_list sut (* returned ['a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; '\190'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a']
                     expected ['a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a' ; 'a'; 'a'; 'a'; 'a'] *)

As a runnable program, the example seems to place itself between two chairs. It is both informative and runnable, yet requires manual changes:

it will stop with an exception on line 2 before reaching the model-mismatch in line 3.
the mismatch in line 3 needs, e.g., an assert to express the expectation as runnable code, otherwise it will just succeed silently.
as a very minor nit I prefixed it with open Array to directly copy-paste it into a top-level

Short-circuit booleans means the underlying QCheck property always stops at the first model-mismatch. I think this means that generally

we can ignore all results up until the last line (annotating them as comments is however very informative!)
exception raising calls underway need some kind of try ... with or protect

only the last line needs an assert, e.g., by outputting a two-line thing:

let r = to_list sut
let _ = assert(r = ['a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a' ; 'a'; 'a'; 'a'; 'a']

These are just my immediate thoughts, YMMV. :shrug: We can also leave further improvements for the summer... (hint hint :smiley:)

jmid commented 6 months ago

Sorry if this didn't come across clearly above: The output is already a nice improvement over the default qcheck-stm one! :+1:

n-osborne commented 6 months ago

Thank you for your comments :-)

I've modified the PR so that it includes an open of the tested module at the beginning of the ouptuted scenario.

Regarding the use of an assert for the failing function call, it is worth noting that we don't always have an expected value to show: specifications can be written in a way that we can know when something goes wrong but we can't compute the expected value.

That simply means that we can't guarantee to the user that we output a failing scenario for a failing test.

n-osborne commented 5 months ago

Green light to merge has been given in another chat.

ocaml-gospel / ortac

Print runnable scenario in case of test failure #204