Closed jmid closed 7 months ago
CI summary for 2150652:
trunk
workflows failed with a version mismatch, since trunk
is now 5.3
domain_spawntree
with Fatal error: Failed to create domain
#428Out of 59 workflows 7 failed with 1 genuine failure and 6 ci setup issues
I've rebased the PR on main
after the merge of #429
CI summary for 024dd85
domain_spawntree
with Fatal error: Failed to create domain
#428
threadomain
#203Out of 44 workflows 5 failed with all 5 being genuine failures.
On the other hand, despite its downsides the old Lin test has also managed to stress the runtime into triggering defects #412
Do you think the STM tests are less likely to trigger such defects?
Thanks for the review!
Discussing this PR with you triggered me to look up previous counterexamples reported by the Lin Out_channel
test.
Indeed, the last merge to main
a571972 triggered, e.g., the following for the Linux 5.2 debug workflow:
Messages for test Lin Out_channel test with Domain:
Results incompatible with sequential execution
|
Out_channel.output_byte t 2 : Ok (())
|
.--------------------------------------.
| |
Out_channel.length t : Ok (1) Out_channel.close_noerr t : ()
This should have a possible sequentialization (below) which we seem to miss:
Out_channel.output_byte t 2;;
Out_channel.length t;;
Out_channel.close_noerr t;;
Looking at previous counterexamples also made me realize that
close
/close_noerr
in parallel. However we won't generate such inputs in this STM
test version ATM unfortunately, because they don't have a well-balanced open
-close
serialization. I want to experiment with lifting this limitation.Lin Out_channel
counterexamples are triggered by weird output unlinearizable behaviours arising from running other cmd
s on already closed channels. This is a gray-zone specification-wise, which does not offer an immediate benefit AFAICS. As such, I'm more reluctant to invest in lifting it.
|
Out_channel.flush t : Ok (())
Out_channel.output_bytes t "7)\016)\170\245A{(2\255\000mr\185\243\206\2475-\192i\135\2216}w" : Ok (())
Out_channel.set_binary_mode t false : Ok (())
|
.---------------------------------------------------------------------------------------------------------------.
| |
Out_channel.length t : Ok (27) Out_channel.close_noerr t : ()
Out_channel.is_buffered t : Ok (true) Out_channel.close t : Ok (())
Out_channel.output_byte t 6 : Ok (()) Out_channel.pos t : Ok (65536)
Out_channel.is_buffered t : Ok (true)
Out_channel.output_bytes t "K\226" : Error (Sys_error("Bad file descriptor"))
Out_channel.is_buffered t : Ok (true)
Out_channel.output_substring t "\197\180\2484g\180\230\193" 4 7 : Error (Invalid_argument("output_substring"))
Out_channel.flush t : Ok (())
Our discussion also made me check my recollection regarding locking: https://github.com/ocaml/ocaml/blob/cedff5854ac91dccf5847b527192223ef506b1e2/runtime/io.c#L60
All operations on channels first take the channel lock.
As such, they Out_channel
operations should act atomically when used in parallel and be sequential consistent.
On the other hand, despite its downsides the old Lin test has also managed to stress the runtime into triggering defects #412
Do you think the STM tests are less likely to trigger such defects?
Possibly, but I'm unsure.
Lin
input will call the Out_channel
interface (up to) 50 times 1 parallel run #interleavings times.STM
input will call the interface (up to) 25 times * 1 parallel run.Since #interleavings can be quite high, a corner case bug is more likely to trigger more often when it is run more.
On the other hand, when considering this behaviour across a count
of 1000, the negative Lin
test will finish sooner (and report a "counterexample") whereas the current STM
test runs to completion (1000 inputs * 25 repetitions).
In 8236876 I extend the STM
tests to be able to generate close
and close_noexn
in state Closed
too.
This should have a possible sequentialization (below) which we seem to miss:
In the sequentialized version, the length
will most probably be 0, as the buffer is not flushed; the output we see is due to the close_noerr
being half-run, having flushed but not closed yet (as those two steps are explicitly not run atomically), isn’t it?
This is a gray-zone specification-wise, which does not offer an immediate benefit AFAICS.
To make sure that I understand correctly what this is doing, because I think I missed it in my first review: due to the postcond
function, we’ll never test close
or close_noerr
in one of the parallel branches except when the other branch is only doing open
and close
calls, or am I mistaken in how postcond
is used in all_interleavings_ok
? This would mean those tests would never run into issues such as ocaml/ocaml#11878 I suppose?
This should have a possible sequentialization (below) which we seem to miss:
In the sequentialized version, the
length
will most probably be 0, as the buffer is not flushed; the output we see is due to theclose_noerr
being half-run, having flushed but not closed yet (as those two steps are explicitly not run atomically), isn’t it?
This is a probable explanation indeed! :+1: However it seems a bit unsatisfying that this stops testing and is reported as a counterexample of parallelism-safety.
This is a gray-zone specification-wise, which does not offer an immediate benefit AFAICS.
To make sure that I understand correctly what this is doing, because I think I missed it in my first review: due to the
postcond
function, we’ll never testclose
orclose_noerr
in one of the parallel branches except when the other branch is only doingopen
andclose
calls, or am I mistaken in howpostcond
is used inall_interleavings_ok
? This would mean those tests would never run into issues such as ocaml/ocaml#11878 I suppose?
Point well taken - and yes, you are right :+1:
Following the Erlang version described in the ICFP'09 paper, STM
will generate candidate "Y"-triples and only keep those which satisfy all precond
itions in all interleavings (which is what all_interleavings_ok
should express). This indeed leaves out two parallel close*
cmd
s.
I've therefore taken a pass over the STM
test and extended the cmd
s so that all of them can also occur in state Closed
. The only generator and precond
limitation now is that Open_text
is not allowed in state Open
(this was also a limitation of the Lin
test).
This revealed a divergence from the spec as I shared offline:
val close : t -> unit
(** Close the given channel, flushing all buffered write operations. Output
functions raise a [Sys_error] exception when they are applied to a closed
output channel, except {!close} and {!flush}, which do nothing when applied
to an already closed channel. Note that {!close} may raise [Sys_error] if
the operating system signals an error when flushing or closing. *)
Yet output_string
, output_bytes
, output
, and output_substring
of length 0 on a Closed
channel does not fail, e.g.:
--- Failure --------------------------------------------------------------------
Test STM Out_channel test sequential failed (22 shrink steps):
Close
Output_string ""
+++ Messages ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Messages for test STM Out_channel test sequential:
Results incompatible with model
Close : Ok (())
Output_string "" : Ok (())
For now, I've adjust postcond
to accept these cornercases.
We should consider submitting a PR to adjust the specification though...
This would mean those tests would never run into issues such as ocaml/ocaml#11878 I suppose?
We can now experimentally confirm that the latest changes can indeed find issues like that.
All 5.2 and 5.3/trunk workflows show a regression on outputting on a closed Out_channel
, where only the first cmd
now triggers an exception: :open_mouth:
+++ Messages ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Messages for test STM Out_channel test sequential:
Results incompatible with model
Close_noerr : Ok (())
Output_string "}\169)gN\255\017" : Error (Sys_error("Bad file descriptor"))
Output_byte 48 : Ok (())
Some overdue CI summaries...
For 9c63209:
domain_spawntree
#428threadomain
#203domain_spawntree
#428domain_spawntree
#428dune
#433Out of 44 workflows 5 failed and 1 was cancelled, all 6 due to genuine issues
For 8236876:
domain_spawntree
#428domain_spawntree
#428domain_spawntree
#428domain_spawntree
#428dune
#433Out of 44 workflows 5 failed with all 5 being genuine issues
For 0c8fdcb:
threadomain
#203domain_spawntree
#428
STM
Out_channel
test with flush
raising Sys_error("Bad file descriptor")
when used in parallelSTM
Out_channel
due to the exception regression #432
Out of 44 workflows 24 failed with a total of 28 failures (4 workflows failed two tests) with all of them being genuine
The PR's latest test revision is so effective at triggering the #432 regression that we should consider temporarily relaxing the test to accept the current behaviour, to avoid having to check ~20 failing workflows on each CI run... :grimacing:
I rebased on main
to avoid more false alarms and added 86f2ab2 to temporarily disable the #432 regression reports.
CI summary:
Lin Dynlink
test #307dune
#433Out of 44 workflows 2 failed, with both being genuine issues
CI summary for merge to main
:
dune
#433Out of 45 workflows 1 failed with a genuine issue
The
Lin
Out_channel
test is a sore point of the test suite:Out_channel
s buffer internally, meaninglength
's result can vary. This is a bad fit forLin
's sequential consistency test, because we may end up finding counterexamples of different buffering (between a parallel and any interleaved sequential run) - or just spend a long time searching for one. As such, a model-basedSTM
test is a better fit - and this PR offers one.The new model-based test
Out_channel
module as the existingLin
testMany things thus speak for switching to it.
On the other hand, despite its downsides the old Lin test has also managed to stress the runtime into triggering defects #412
Technically, there are a few nuggets in the test code:
Out_channel
s with suitable preconditions. As such the test can generate open-close-open-closecmd
sequences.length
's output is handled by comparing less-or-equal-to the model's lengthlength
increases by 2 for every'\n'
output (since'\n'
maps to"\r\n"
)position
does not reflect this translation thoughset_binary_mode
: it may not be the mode enabled at output-call-time that takes effect (example below). This is solved by always callingflush
beforeset_binary_mode
on MinGW/Cygwin where this matters:Closes #378 and #401