hedgehogqa / haskell-hedgehog

Release with confidence, state-of-the-art property testing for Haskell.
677 stars 107 forks source link

Is linearize correct? #104

Open Danten opened 7 years ago

Danten commented 7 years ago

Hello so I had a look at Hedgehog.Internal.State.linearize and I'm not entirely sure this checks if the history is linearisable. Consider the following histories (assume we have read and write commands:

1: [ write 0 ] [ write 1]
2:     [ read -> 1]

In this case the we see that the second process overlaps with the first and we get the following interleavings

1) write 0, write 1, read -> 1 2) write 0, read -> 1, write 1 3) read -> 1, write 0, write 1

All consistent with what is implemented in Hedgehog, but suppose the history is like this:

1: [write 0]              [write 1]
2:        [read -> 1] 

There is only two possible interleaving (since there is only one overlap) which is: 1) write 0, read -> 0, write 1 2) read -> 1, write 0, write 1

The write 1 starts after read -> 1 ends and can therefore not be commuted before. This is the reason why we in the quickcheck-state-machine library have the two branches write to a common log with when each command starts and when it ends. Which might sound a bit abstract, but basically the sequential history can't have a command c0 before command c1 if c1 finished before c0 started.

Now it is entirely possible that we have misunderstood something, but this makes some sense to me: basically a history is linearisable if we can place it in sequential order such that the time of every command happens after their real start time and before the end time of the commands.

jacobstanley commented 7 years ago

I implemented the linearization algorithm as I understood it from reading Finding Race Conditions in Erlang with QuickCheck and PULSE and there was no mention of checking the actual time that things happen.

Having said that, I think it's a great idea! It appears to have two positive effects, cutting down the search space so you can run larger tests and also rejecting programs which would have succeeded before.

Thanks for describing your approach, I might look at implementing it at some point. Did you read about this in the literature anywhere?

Do you really need a common log? It sounds like you could achieve the same thing by using an IORef Int or an AtomicCounter which you increment at the start and end of every common execution to obtain the current "time". You can then use that number to reconcile the ordering later. The downside of this is that you might introduce additional synchronization between the two parallel branches which makes bugs stop happening as the threads aren't free to run independently, although I'm not convinced that isn't also happening with your channel implementation.

Danten commented 7 years ago

So it was mostly @stevana that looked at the literature, but I think this is Linearizability: A Correctness Condition for Concurrent Objects is where we got the definition from.

An Int that keeps incrementing should be enough (if it increments in a race-condition free way ;) ) and yes we will probably also have some additional synchronization happening.

stevana commented 7 years ago

Given that the threads run on the same machine, perhaps simply using a timestamp could suffice?