ocaml-multicore / multicoretests

PBT testsuite and libraries for testing multicore OCaml
https://ocaml-multicore.github.io/multicoretests/
BSD 2-Clause "Simplified" License

Add Out_channel STM tests #431

Closed jmid closed 7 months ago

jmid commented 8 months ago

The Lin Out_channel test is a sore point of the test suite:

Out_channels buffer internally, meaning the result of length can vary with flushing. This is a bad fit for Lin's sequential-consistency test, because we may end up finding counterexamples that merely reflect different buffering (between a parallel run and any interleaved sequential run) - or just spend a long time searching for one. As such, a model-based STM test is a better fit - and this PR offers one.

The new model-based test

Many things thus speak for switching to it.
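
To make the switch concrete, here is a minimal hypothetical sketch of the kind of model such an STM test can build on - not the PR's actual code, and the cmd constructors and field names are made up. The point is that the model only tracks what stays predictable under buffering:

(* Hypothetical model sketch: track only whether the channel is
   open and how many bytes have been written, flushed or not. *)
type cmd = Output_byte of int | Length | Flush | Close
type state = { open_ : bool; written : int }

let init_state = { open_ = true; written = 0 }

let next_state c s = match c with
  | Output_byte _ -> { s with written = s.written + 1 }
  | Length | Flush -> s
  | Close -> { s with open_ = false }

With such a model, a postcond could, e.g., accept any Length result between 0 and written, side-stepping the buffering issue that trips up the Lin test.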

On the other hand, despite its downsides, the old Lin test has also managed to stress the runtime into triggering defects, cf. #412

Technically, there are a few nuggets in the test code:

Closes #378 and #401

jmid commented 8 months ago

CI summary for 2150652:

Out of 59 workflows, 7 failed: 1 genuine failure and 6 CI setup issues.

jmid commented 8 months ago

I've rebased the PR on main after the merge of #429.

jmid commented 8 months ago

CI summary for 024dd85:

Out of 44 workflows, 5 failed, all 5 being genuine failures.

shym commented 8 months ago

> On the other hand, despite its downsides, the old Lin test has also managed to stress the runtime into triggering defects, cf. #412

Do you think the STM tests are less likely to trigger such defects?

jmid commented 8 months ago

Thanks for the review!

Discussing this PR with you prompted me to look up previous counterexamples reported by the Lin Out_channel test. Indeed, the last merge to main, a571972, triggered, e.g., the following in the Linux 5.2 debug workflow:

Messages for test Lin Out_channel test with Domain:

  Results incompatible with sequential execution

                                           |                  
                         Out_channel.output_byte t 2 : Ok (())
                                           |                  
                       .--------------------------------------.
                       |                                      |                  
         Out_channel.length t : Ok (1)         Out_channel.close_noerr t : ()   

This should have a possible sequentialization (below) which we seem to miss:

Out_channel.output_byte t 2;;
Out_channel.length t;;
Out_channel.close_noerr t;;

Looking at previous counterexamples also made me realize that…

Our discussion also made me check my recollection regarding locking: https://github.com/ocaml/ocaml/blob/cedff5854ac91dccf5847b527192223ef506b1e2/runtime/io.c#L60

> All operations on channels first take the channel lock.

As such, the Out_channel operations should act atomically when used in parallel and be sequentially consistent.

> On the other hand, despite its downsides, the old Lin test has also managed to stress the runtime into triggering defects, cf. #412
>
> Do you think the STM tests are less likely to trigger such defects?

Possibly, but I'm unsure.

Since the number of interleavings can be quite high, a corner-case bug is more likely to trigger the more the test is run. On the other hand, when considering this behaviour across a count of 1000, the negative Lin test will finish sooner (and report a "counterexample") whereas the current STM test runs to completion (1000 inputs * 25 repetitions).

jmid commented 8 months ago

In 8236876 I extended the STM tests to be able to generate close and close_noerr in state Closed too.
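
Continuing the hypothetical sketch from earlier (with an added Close_noerr constructor and a show_cmd printer assumed), a state-dependent generator permitting this could look roughly as follows:

(* Sketch: also generate the close cmds in state Closed - the spec
   says close and flush do nothing on an already closed channel. *)
let arb_cmd s =
  let open QCheck in
  let gens =
    if s.open_
    then [ Gen.map (fun i -> Output_byte i) Gen.small_nat;
           Gen.return Length;
           Gen.return Flush;
           Gen.return Close;
           Gen.return Close_noerr ]
    else [ Gen.return Close;        (* now reachable when closed *)
           Gen.return Close_noerr;
           Gen.return Flush ]
  in
  make ~print:show_cmd (Gen.oneof gens)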

shym commented 8 months ago

> This should have a possible sequentialization (below) which we seem to miss:

In the sequentialized version, the length will most probably be 0, as the buffer is not flushed; the output we see is due to the close_noerr being half-run, having flushed but not closed yet (as those two steps are explicitly not run atomically), isn’t it?

This is a gray zone specification-wise, which does not offer an immediate benefit AFAICS.

To make sure that I understand correctly what this is doing (because I think I missed it in my first review): due to the postcond function, we’ll never test close or close_noerr in one of the parallel branches except when the other branch is only doing open and close calls - or am I mistaken in how postcond is used in all_interleavings_ok? This would mean those tests would never run into issues such as ocaml/ocaml#11878, I suppose?

jmid commented 8 months ago

> This should have a possible sequentialization (below) which we seem to miss:
>
> In the sequentialized version, the length will most probably be 0, as the buffer is not flushed; the output we see is due to the close_noerr being half-run, having flushed but not closed yet (as those two steps are explicitly not run atomically), isn’t it?

This is a probable explanation indeed! :+1: However, it seems a bit unsatisfying that this stops testing and is reported as a counterexample of parallelism-safety.

> This is a gray zone specification-wise, which does not offer an immediate benefit AFAICS.
>
> To make sure that I understand correctly what this is doing (because I think I missed it in my first review): due to the postcond function, we’ll never test close or close_noerr in one of the parallel branches except when the other branch is only doing open and close calls - or am I mistaken in how postcond is used in all_interleavings_ok? This would mean those tests would never run into issues such as ocaml/ocaml#11878, I suppose?

Point well taken - and yes, you are right :+1: Following the Erlang version described in the ICFP'09 paper, STM generates candidate "Y"-triples and only keeps those which satisfy all preconditions in all interleavings (which is what all_interleavings_ok should express). This indeed leaves out two parallel close* cmds.

I've therefore taken a pass over the STM test and extended the cmds so that all of them can also occur in state Closed. The only generator and precond limitation now is that Open_text is not allowed in state Open (this was also a limitation of the Lin test).
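
For concreteness, and assuming a nullary Open_text constructor along the lines of the test's, that remaining restriction might be expressed as in this sketch. Since all_interleavings_ok requires every cmd's precond to hold in every interleaving, such a precond prunes entire candidate triples, not just individual runs:

(* Sketch: the single remaining precondition. *)
let precond c s = match c with
  | Open_text -> not s.open_   (* no re-opening of an open channel *)
  | _ -> true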

This revealed a divergence from the spec as I shared offline:

val close : t -> unit
(** Close the given channel, flushing all buffered write operations.  Output
    functions raise a [Sys_error] exception when they are applied to a closed
    output channel, except {!close} and {!flush}, which do nothing when applied
    to an already closed channel.  Note that {!close} may raise [Sys_error] if
    the operating system signals an error when flushing or closing. *)

Yet output_string, output_bytes, output, and output_substring of length 0 on a Closed channel do not fail, e.g.:

--- Failure --------------------------------------------------------------------

Test STM Out_channel test sequential failed (22 shrink steps):

   Close
   Output_string ""

+++ Messages ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Messages for test STM Out_channel test sequential:

  Results incompatible with model

   Close : Ok (())
   Output_string "" : Ok (())

For now, I've adjusted postcond to accept these corner cases. We should consider submitting a PR to adjust the specification, though...
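
In sketch form, the adjusted acceptance could look like the following - again hypothetical: it assumes an Output_string cmd whose run wraps the call in STM's protect, so the result matches qcheck-stm's (unit, exn) result patterns:

open STM   (* Res/Result/Unit/Exn patterns from qcheck-stm *)

(* Sketch: accept both behaviours of zero-length writes on a
   closed channel, matching the observed spec divergence. *)
let postcond c (s : state) res = match c, res with
  | Output_string str, Res ((Result (Unit, Exn), _), r) ->
    (match r, s.open_ with
     | Ok (), true -> true                  (* normal write *)
     | Error (Sys_error _), false -> true   (* closed channel: expected *)
     | Ok (), false -> str = ""             (* corner case: "" succeeds *)
     | _, _ -> false)
  | _ -> true   (* other cmds elided in this sketch *)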

jmid commented 8 months ago

> This would mean those tests would never run into issues such as ocaml/ocaml#11878, I suppose?

We can now experimentally confirm that the latest changes can indeed find issues like that. All 5.2 and 5.3/trunk workflows show a regression on outputting to a closed Out_channel, where only the first cmd now triggers an exception :open_mouth:

+++ Messages ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Messages for test STM Out_channel test sequential:

  Results incompatible with model

   Close_noerr : Ok (())
   Output_string "}\169)gN\255\017" : Error (Sys_error("Bad file descriptor"))
   Output_byte 48 : Ok (())

jmid commented 8 months ago

Some overdue CI summaries...

For 9c63209:

Out of 44 workflows, 5 failed and 1 was cancelled, all 6 due to genuine issues.

For 8236876:

Out of 44 workflows, 5 failed, all 5 being genuine issues.

For 0c8fdcb:

Out of 44 workflows, 24 failed with a total of 28 failures (4 workflows failed two tests), all of them genuine.

jmid commented 8 months ago

The PR's latest test revision is so effective at triggering the #432 regression that we should consider temporarily relaxing the test to accept the current behaviour, to avoid having to check ~20 failing workflows on each CI run... :grimacing:

jmid commented 7 months ago

I rebased on main to avoid more false alarms and added 86f2ab2 to temporarily disable the #432 regression reports.

jmid commented 7 months ago

CI summary:

Out of 44 workflows, 2 failed, both being genuine issues.

jmid commented 7 months ago

CI summary for merge to main:

Out of 45 workflows, 1 failed with a genuine issue.