tweag / cooked-validators

MIT License
39 stars 11 forks source link

More flexible and general attacks #112

Closed carlhammann closed 2 years ago

carlhammann commented 2 years ago

We have the beginnings of a domain specific language for attacks, and I'm thinking about how to extend it. These are mostly my ramblings, which I nonetheless want to share here; maybe you have comments and ideas.

At the moment, the entry point are the modalities somewhere and everywhere. While they are already a very good interface to turn "harmless" traces into "malicious" ones, they are not enough, for example, to check if a minting policy actually enforces NFTs: Look at the test for the token duplication attack, and you will see that the "careful" policy there only ensures that at most a certain amount of tokens is minted in one transaction. In order to write a test that tries to mint again in a second transaction, we presently need to manually write a trace with at least two transactions.

Another nice-to-have would be the ability to combine attacks conjunctively or disjunctively; think of a combinator somewhereA :: (TxSkel -> [TxSkel]) -> m a -> m a, such that:

somewhereA (\tx -> [f tx, g tx]) (validateTx a >> validateTx b) =~= 
  [ validateTx (f a) >> validateTx b
  , validateTx (g a) >> validateTx b
  , validateTx a >> valiateTx (f b)
  , validateTx a >> validateTx (g b) ]

(Thanks to @VictorCMiraldo for that idea!)

At the moment, there are three ideas floating around in my head:

  1. "Simply" write a few combinators like somewhereA. Note that the functionality of somewhereA could also be accomplished by writing two somewhere-test cases on the same trace.
  2. Change the types of somewhere and everywhere to take something more general than Maybe.Again somewhereA kind of suggests a semantics for any Foldable...
  3. (my preferred one) Try to add more "modal logic" operators to MonadModal, so that we can write somewhere f `or` somewhere g to get the functionality of somewhereA, or even somewhere attackThatStealsAToken `followedBy` somewhere attackThatPaysTheToken, where followedBy is some sort of "implication that also knows about the order of transactions" in our logic.

What do you think?

serras commented 2 years ago

I love option (3)! In that direction, I think we need to decide how to represent those modalities; let's me explain what I mean by that. Right now the type of both somewhere and everywhere is:

(Action m -> Maybe (Action m)) -> (m a -> m a)

So if we want to have or (somewhere f) (anywhere g), my question is: is there a way to embody m a -> m a with the right structure to make that thing happen? For example, I could imagine that if m is an Alternative, then we could say things like:

or p q = \a -> (p a) <|> (q a)
and p q = \a -> q (p a)

If we go this path we could even introduce a new element in MonadModal,

next :: m a -> m a

and then define

followedBy p q = \a -> and (p a) (q (next a))

Ok, end of the first rant.

The second proposal comes from the fact that as I see in Staged, we are somehow embedding this modalities and then interpreting them. Why not then go the full route and introduce a sublanguage for these formulae?

data Prop a = Somewhere ... | Everywhere ... | Next ... | And (Modality a) (Modality a) | ...

and then ask implementors of MonadLogic to implement a single

modal :: Prop a -> m a -> m a
VictorCMiraldo commented 2 years ago

@serras that is a lovely idea and, indeed, it is the direction that @carlhammann is exploring. The difficulty comes from writing the interpretation function.

Intuitively, a value of type StagedMockChain represents a set of traces, where each trace can be thought of as a list of transactions. But the story is a little more complex semantically: a trace is actually a "telescope" of transactions in the sense that transaction n + 1 can depend on the state (or result of) after transaction n. This already brings a few questions:

  1. Are causal stream functions, seen as the behavior of the validateTx :: Tx -> MockChainSt -> (TxId, MockChainSt) mealy machine [thm 2 of Coalgebraic Logic and Synthesis of Mealy Machines, or exercise 2.3.2 (pdf page 34) of B. Jacobs]) enough to provide a rich denotation? I have a hunch that yes but I'd like to spend some more time thinking about this to conclude anything. Nevertheless, answering this question is important to have something to base ourselves off to answer the next question:
  2. What is the meaning of modal f (modal g m) ?
serras commented 2 years ago

... a trace is actually a "telescope" of transactions in the sense that transaction n + 1 can depend on the state (or result of) after transaction n.

This is expected, right? I assume that the goal of the library is to be able to explore situations in which the transactions are not independent, and thus some kind of buggy behavior can be triggered.

serras commented 2 years ago

On a different note, I've been having a look at this old (30-year-old!) Conjunction as composition paper, and there's an interesting idea there: their sequences include both actions and markers. The reason I'm bringing this up is that it seems to be that here we are having a bit of a problem defining what next is, or in general how the transactions really become a sequence, since we just combine them with >> or >>=.

What if we had a specific way to mark a block as being an entire transaction (that's what they do in the paper)? Or a way to state: the transaction is now done, everything from this point on is a new transaction in the sequence? More concretely, add a new constructor to the Free construction,

Next :: Free m a -> Free m a

which just "tags" the passing of sequence. Our interpretation of "finishing the transaction" would then introduce that Next element.

carlhammann commented 2 years ago

I had a few thoughts about Modify over the weekend, which also probably go towards the problem @VictorCMiraldo alluded to when he asked

What is the meaning of modal f (modal g m) ?

My mental model at the moment is that at each time step, we have a modification to be applied right now and a formula that the next time step must satisfy (i.e. that is used to modify the remainder of the computation). This makes it difficult to think about Modify:

  1. Assume that we already are in the middle of a Modifyed computation, and encounter another Modify. How should we combine the current formula with one from the new Modify?
  2. Likewise, when we're exiting from a Modifyed "block", maybe there is still a non-trivial "next step"-formula. (How) Should we apply it to the computation after the Modify? (How) Should the formula before the Modify have changed by now?

I suggest not trying to answer these questions just yet, but to either change the type of Modify to be

Modify :: (Ltl (TxSkel -> Maybe TxSkel) -> Ltl (TxSkel -> Maybe TxSkel)) ->
          (Ltl (TxSkel -> Maybe TxSkel) -> Ltl (TxSkel -> Maybe TxSkel)) ->
          StagedMockChain a ->
          MockChainOp a

(Ltl a is the type of LTL formulae with "propositional variables" of type a), where the first Ltl (TxSkel -> Maybe TxSkel) -> Ltl (TxSkel -> Maybe TxSkel) function answers question (1.) above and the second one answers question (2.). Another option would be to drop Modify in favour of

ChangeFormula :: (Ltl (TxSkel -> Maybe TxSkel) -> Ltl (TxSkel -> Maybe TxSkel)) -> MockChainOp ()

which has the intended meaning that it comes between time steps and changes the formula that the next time step has to satisfy in the prescribed way. I think the second option is better, because it is simpler while not loosing much generality and also acknowledges the forest structure of the computations described by StagedMockChain: This is a bit wishy-washy, but having a changed formula be "active from now on" feels more tree-like, while modifying blocks feels more like were talking about directed acyclic graphs (which we are not).

It should probably be noted that both suggestions allow the implementation of more specialised idioms that do give an answer to questions (1.) and (2.), and would probably be what people would use most of the time.

serras commented 2 years ago

I haven't put more than 15 minutes on this, but I feel like this is way more complicated than what you showed me the other day. If I remember correctly, your semantics of Modify p (Modify q t) were equivalent to Modify (p && q) t, which seem quite sensible to me. I could also imagine the semantics of modal f (modal g m) to be equivalent to modal (f && g) m in a similar fashion. This is all about point (1), I don't fully understand (2) yet.

In any case, I think that this is approach is maybe bringing the cannons too early, having so much generality. I would prefer some kind of DSL which says what's permissible, instead of a fully-generic thing.

By the way, another though I had these past days is that maybe Modify is "too exposed" in the current API, and that might be where the problem is coming from. Are we even allowing users to write their Modifys? I think we shouldn't, and it be just a mechanism used internally by modal to apply the modifications by the Ltl formula. This might also be a way to work around those hard questions, since we never would come to a series of Modifys in first instance.

carlhammann commented 2 years ago

I guess a special case of problem (2.) might help. The most obvious translation (and the one I showed you on Friday) of the Modify we have now would be

Modify :: (Ltl (TxSkel -> Maybe TxSkel)) -> StagedMockChain a -> MockChainOp a

What modification should be applied to f here:

do ...
  Modify LtlFalsity ( ... -- some operations that do not correspond to time steps,
                          -- and should therefore not affected by 'Modify'
                    )
  f
  ...

In this case, it is "clear" that we do not want LtlFalsity to be applied to f (which would lead to the computation failing), so upon exiting the Modifyed computation, we want (some version of) the formula that was active before Modify to be restored. How do we keep track of which formula to restore in nested Modifys?

serras commented 2 years ago

How do we keep track of which formula to restore in nested Modifys?

My initial answer to that would be that this is a task for modal, the interpretation function, not for MockChainOp. At the end of the day what we want to it to have a function such to do:

modal ltlFormula trace

and get back a modified trace of some sort. My fear is that we are diverging from that end-goal by overloading too much what "trace"s can do. Could we get that function running in other ways? Note that I might be looking at this from this very narrow point of view, and there's something I'm missing.

carlhammann commented 2 years ago

Of course, my code above should have used modal instead of Modify, but if we look here, we see that, somewhere and everywhere (which are what modal would generalise), are implemented in terms of Modify. (That's the whole point of the Modify operation: To be the implementation of modal)

But I agree that it's a valid question to ask if we need nested modals at all; the useful part is to have modal after a few initial "information gathering" transactions.

VictorCMiraldo commented 2 years ago

I think we need a semantics before we even attempt to answer anything. If the semantics we agree on is that a value tr :: StagedMockChain a represents a set of traces for MockChain; then modify f tr represents a set of traces modified by f; hence, modify g (modify f tr) is equivalent to modify (g `after` f) tr for some after; that is, first we modify the traces in tr to apply f; then we modify the resulting set of traces to apply g.

modify (somewhere f) (modify (somewhere g) (txA >> txB))
  == modify (somewhere f) { g txA >> txB , txA >> g txB }
  == { f (g txA) >> txB , g txA >> f txB , f txA >> g txB , txA >> f (g txB) }

The difficulty here comes from the fact that we cannot compute modify f tr, then concatMap (interpModify g) over that: we need to modify as we interpret. Hence, maybe we can define the after combinator and combine the formulas. My intuition tells me that after should distribute over propositional operators; compose on atoms and needs to do some black magic with LtlUntil and LtlRelease... I need to think this through a bit more. Maybe a few trains and my flight today help.

VictorCMiraldo commented 2 years ago

To answer whether we need nested modals; well... it depends. As it currently stand, we have one monad: StagedMockChain, and attacks can depend on previous values:

attackTrace = do
  someHash <- setupTrace
  someOut <- scriptUtxosSuchThat ...
  somewhere (attack someHash someOut) $ do
    traceWhereAttackMightWork

Hence, the AST for StagedMockChain needs a Modify constructor. We have the option of crashing, at runtime, if we ever find a second one, which might be a pragmatic compromise

carlhammann commented 2 years ago

I think we need a semantics before we even attempt to answer anything.

I agree that we have to talk semantics in a more principled way (and that's definitely what I'll be working on this week). Still, I'd like to have something to play around with concretely, if only to build my own intuition.

If the semantics we agree on is that a value tr :: StagedMockChain a represents a set of traces for MockChain [...]

I agree again, and I think it'll be important to take into consideration the (forest) structure of that set. For example, I would expect that this holds:

modify (somewhere f) (txA >> modify (somewhere g) (txB >> txC))
  == modify (somewhere f) (txA >> {g txB >> txC, txB >> g txC})
  == { f txA >> g txB >> txC,
       txA >> f (g txB) >> txC,
       txA >> g txB >> f txC,

       f txA >> txB >> g txC,
       txA >> f txB >> g txC,
       txA >> txB >> f (g txC) }

and that this approach should yield the same result:

modify (somewhere f) (txA >> modify (somewhere g) (txB >> txC))
  == modify (f `or` next (somewhere f)) (txA >> modify (somewhere g) (txB >> txC))
  == f txA >> modify (somewhere g) (txB >> txC) <|>
     txA >> modify (somewhere f) (modify (somewhere g) (txB >> txC)
  == ...

This'll probably the case with the right after, so many thanks for that idea, @VictorCMiraldo!

Maybe this is a good place to illustrate my problem (2.) with another example: Should we have

modify (somewhere f) (modify (somewhere $ g `and` next h) (txA >> txB) >> txC)
  == modify (somewhere f) (g txA >> h txB >> txC)

or

modify (somewhere f) (modify (somewhere $ g `and` next h) (txA >> txB) >> txC)
  == modify (somewhere f) { g txA >> h txB >> txC,
                            txA >> g txB >> h txC }

I think it should be the second one, but I could understand if someone thinks otherwise. That's why I proposed to abandon modify in favour of something like changeFormula, where such disagreements are impossible. Maybe the next step for me is to try and see which obstacles I find when I try to implement a changeFormula that uses a suitable after.

carlhammann commented 2 years ago

I think I can prove that, if we're only interested in finite computations, after should just be conjunction (under some very mild conditions on after).