observe produces wrong results and is not referentially transparent with ghc-8.6.1 and -O1 or -O2

I was just reading your paper and when reading about entangle and observe I thought that there was a good chance, that memorization would cause observe to use the same entangled value twice, thus giving false results.

Indeed I found the following program, where that seems to be the case:

{-# LANGUAGE GADTs #-}

import Test.StrictCheck
import Data.Foldable (traverse_)
import Data.Bifunctor (bimap)

main :: IO ()
main =
  (print . bimap prettyDemand (\(x :* Nil) -> prettyDemand x)) `traverse_` [
    observe (\() -> ()) (\() -> ()) (),
    observe (\() -> ()) (\_  -> ()) ()
  ]

I would expect the following output:

("()","()")
("()","_")

This matches the actual output for ghc versions 8.2.2 and 8.4.3 with any optimisation level as well as ghc-8.6.1 with -O0. However when compiling with ghc-8.6.1 and -O1 or -O2 the actual output is this:

("()","()")
("()","()")

Even worse, when we change the order of the two observe-calls in the list we get this:

("()","_")
("()","_")

I think the best way to fix this would be to change entangle to this safe (exportable) version:

entangle :: a -> IO (a, IORef (Thunk a))
entangle =   do
    ref <- newIORef Thunk
    return ( unsafePerformIO $ do
               writeIORef ref (Eval a)
               return a
           , ref )

This version of entangle should be referentially transparent and allows the user to observe the evaluation status of the entangled value at different points in time. Furthermore, and most importantly for this issue, leaving out the outer unsafePerformIO makes sure that two calls to entangle will create two different entangled values with two different IORefs, thus making bugs like this impossible. However this of cause would probably require a lot of code to be rewritten to accommodate for the new type and maybe there is a quick and dirty solution that somehow prevents memorization or even not exporting observe and making sure it is only used in safe ways.

NOINLINEing observe fixes the above test case. This would probably have to be done to all variations of observe.

Also I now think the “correct” version of entangle would be the following, as it prevents users from writing to the IORef and allows a definition of entangleShape of type Shaped a => a -> IO (a, IO (Demand a)). entangleShape :: Shaped a => a -> IO (a, IORef (Demand a)) on the other hand does not really make sense and would be much more complicate to implement at the least.

entangle :: a -> IO (a, IO (Thunk a))
entangle =   do
    ref <- newIORef Thunk
    return ( unsafePerformIO $ do
               writeIORef ref (Eval a)
               return a
           , readIORef ref )

Thank you very much for this error report! It seems like the CSE pass in GHC 8.6 has gotten more aggressive. You're certainly right that all the variants of observe ought to have been NOINLINE-ed, and doing this should fix the issue permanently. I'll get on that shortly.

In the meantime, here's a more minimal example (which I'll add to the test suite shortly) and some details:

import System.Exit
import Test.StrictCheck

main :: IO ()
main =
  let strict = prettyDemand (snd (observe1 id (\() -> ()) ())) in
  let lazy   = prettyDemand (snd (observe1 id (\_  -> ()) ())) in

  if strict /= lazy
  then return ()
  else die "observe1 is not referentially transparent"

When we compile this with -O0, it returns without error. With -O1, it errors out, as you would expect. However, with -O1 -fno-case, it is once more successfully. Here's how to replicate this:

$ ghc -fforce-recomp -no-keep-hi-files -no-keep-o-files -O0 -ddump-simpl Main.hs -o Unoptimized > Unoptimized.core
$ ghc -fforce-recomp -no-keep-hi-files -no-keep-o-files -O1 -ddump-simpl Main.hs -o Optimized > Optimized.core
$ ghc -fforce-recomp -no-keep-hi-files -no-keep-o-files -O1 -fno-cse -ddump-simpl Main.hs -o NoCSE > NoCSE.core

Looking at the generated Optimized.core, we can see the issue: there is only one instance of entangleShape in the entire core file—there should be 4 (or none, if there was no optimization at all).

Optimized.core (excerpt):

...
Main.main5 :: () -> ((), Demand ())
[GblId,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False,
         WorkFree=False, Expandable=True, Guidance=IF_ARGS [] 20 0}]
Main.main5
  = Test.StrictCheck.Observe.Unsafe.entangleShape
      @ () Test.StrictCheck.Shaped.$fShaped()
...

We then see this single value Main.main5 get re-used in three places, which causes the exact issue you observed. Whereas, without CSE, we see four separate instances of entangleShape.

Summary: Allowing observe to be inlined exposes entangleShape to CSE optimization, which can destroy referential transparency. Fix: add {-# NOINLINE #-} pragmas to all observe-family functions, and add regression tests to the test suite. I'm on it.

This bug fix has been published as version 0.1.1 on Hackage.

plaidfinch / StrictCheck

observe produces wrong results and is not referentially transparent with ghc-8.6.1 and -O1 or -O2 #2