tweag / pirouette

Language-generic workbench for building static analysis
MIT License
47 stars 2 forks source link

How to deal with strict evaluation when arguments evaluate to bottom? #137

Open carlhammann opened 2 years ago

carlhammann commented 2 years ago

Here's a problem @VictorCMiraldo and I discovered while discussing improvements to Priouette's symbolic evaluation.

Consider calling f x y z = x == z with arguments like f 42 (error "blah") 10. Under strict evaluation (read: in Plutus), that function call would fail, but Pirouette currently inlines everything, which means that such a failure won't be detected.

In the incorrectness logic setting that means false positive counterexamples: Pirouette might think that the validator f returnsTrue in situations in which it actually fails, even before it is called.

This issue is related to symbolic evaluation in the following way: We are thinking about symbolic execution into two phases, one function

ana :: Term -> Tree a

that takes a term and builds a (potentially infinite) tree corresponding to the possible unfoldings of the given term, and a function

cata :: Tree a -> [a]

to traverse that three and return the desired results. The tree data type should be essentially of the form

data Tree a
  = Branch [Tree a]
  | Leaf a
  | Learn Constraint (Tree a)
   ...

so that we can write, for example (in pseudo-Haskell)

ana [| case x of A -> u; B -> v |] = do
  res <- ana x
  Branch [ Learn (res := A) (ana u), Learn (res := b) (ana v) ]

The important observation here is that ana has made progress, but how do we make progress for something like ana [| f x y z |] without either enforcing lazy evaluation or descending in a potentially infinite recursion? Which additional constructor(s) of Tree should we have?

VictorCMiraldo commented 2 years ago

As discussed earlier today with @carlhammann, we have a hunch that maybe the Tree type should capture the application structure, and it is up to the consumer to decide how to "combine" things. For instance, add the following constructor to Tree a (in pseudo-haskell)

data Tree a = ...
   | Call (Vec n (Tree b) -> Tree a) (Vec n (Tree b))

When traversing this tree and with the intention of producing a list of results, we're met with two options:

cataA :: Tree a -> [a]
cataA (Call f args) = cataA (f args)

or

cataB :: Tree a -> [a]
cataB (Call f args) = cataB $ do
  args' <- distr args -- distr :: [Tree a] -> Tree [a]
  f $ map Leaf args'

The implementation of cataA above would correspond to call-by-name whereas cataB would do call-by-value. Importantly though is that we get an option to fine tune how to prune each argument to prevent it from looping, all we have to do is distr $ map prune args instead.

In ana, the definition of a call would look something like:

ana :: Term lang -> Tree (Term lang)
ana (SystF.Free hd `SystF.App` args) = Call go (map ana args)
  where
    go :: [Tree (Term lang)] -> Tree (Term lang)
    go targs = do
      body <- defOf hd
      -- Here's the trick! Rely on metavariables to apply the body of hd to a bunch of trees;
      -- then do some distributive magic to get a tree of possibilities.
      res <- join $ body `appN` (map SystF.Meta targs)
      ana res

    join :: TermMeta (Tree (Term lang)) lang -> Tree (Term lang)
VictorCMiraldo commented 2 years ago

That idea seems to be a reasonable way forward. Here's a prototype: https://gist.github.com/VictorCMiraldo/1dd50d7cde68dcdb8f37fc440a91ee29

VictorCMiraldo commented 2 years ago

While continuing to think about this, it seems like we'll need to be smart for destructors too and my prototype wasn't complex enough to witness the issue. Here's the culprit on the gist from [this comment]():

    go If [c,t,e] = do
      res <- c
      Branch
        [ Learn ("isTrue: " ++ show res) $ t,
          Learn ("isFalse:" ++ show res) $ e
        ]

Now, let's pretend that instead of evaluating an if-statement (which is the destructor for Bool), we're evaluating a case statement over lists. The res <- c part will loop since lists can be infinite! What we really want is something that says evaluate 'c' until its WHNF then move on. What is WHNF for us? A constructor at the head of an SystF.App. Maybe something like:

data Tree a
   = ...
   | forall b . Dest (Tree b) (WHNFConstructor (Tree b) -> Tree a)

Now it is up to the cata to explore the motive enough until it finds a WHNF constructor, bring it into evidence, then move on.

EDIT:

Actually, the whole part on expanding a motive that is a metavariable into all of its possibilities should be handled by the ana. For it to be a truly infinite list, if it sees it is returning a metavariable, it should expand it into its constituent cases instead

carlhammann commented 2 years ago

Is evaluating to WHNF enough? That is, don't we have a similar lazy vs. strict problem like we had in the case for function application (the destructor of the function type) with, say, a list like [error "boom"]?

VictorCMiraldo commented 2 years ago

Is evaluating to WHNF enough? That is, don't we have a similar lazy vs. strict problem like we had in the case for function application (the destructor of the function type) with, say, a list like [error "boom"]?

Not really, because the cata get's to decide up until when to evaluate the arguments. We could do something like:

cata (Dest motive cases) =
  let motive' = cataToWHNF motive
   in case treeFindUntilDepth 3 motive' isError of
        Nothing -> cases motive'
        Just err -> throwError

That would be some sort of hybrid strategy, that looks just a little deeper down the arguments to look for errors, then moves on with call-by-name

VictorCMiraldo commented 2 years ago

As it turns out, this issue is much more involved than it looked. There's a difficult balance to be achieved by having destructors and application. I came up with a more involved prototype that is looking a little more promising, it can be found here: https://gist.github.com/VictorCMiraldo/ee142cf46efb1aeeb4fd22be54810fac