runtimeverification / haskell-backend

The symbolic execution engine powering the K Framework
BSD 3-Clause "New" or "Revised" License
208 stars 41 forks source link

Proposal for representing sort injections #3798

Open ana-pantilie opened 1 year ago

ana-pantilie commented 1 year ago

@goodlyrottenapple says:

we could perhaps get rid of them at the Term level and instead push them into the Sort type? I think this would make code like the matcher and unifier easier to read,

This is an intriguing idea! :+1: I also believe it will be easier to write the unifier/matcher if we get it right (see below) Might be worth its own issue, as we need to think this through a bit more. Writing down some ideas here for now.

we would only be matching on actual structure, such as constructors or values and we could handle the matching/unification of the types of both terms in a separate function. In theory, it could be as simple as having [Sort] in Terms, where e.g.

DomainValue [s1,..sn] x ~ inj{s2, s1}(inj{s3,s2}(...inj{sn, sn-1}(\dv{sn}(x)) ...))

Originally posted by @jberthold in https://github.com/runtimeverification/hs-backend-booster/issues/4#issuecomment-1301485695

ana-pantilie commented 1 year ago

@ehildenb and @dwightguth , thoughts?

ehildenb commented 1 year ago

@ana-pantilie I really recommend you do not try to deal with sort injections yet. For 2 reasons:

So I would propose this approach:

This allows our new unification algorithm to be "safe", while making a minimal attempt to deal with sort injections. If we find that this is not enough, then we will be able to collect data about how often we are falling back to the old backend due to sort injection issues.

ana-pantilie commented 1 year ago

then treat it like a free constructor

I guess that means we should have a separate node in the AST for it, at least for now, right @goodlyrottenapple @jberthold ?

goodlyrottenapple commented 1 year ago

maybe not even necessary for now if we treat them as free constructors?

jberthold commented 1 year ago

To re-iterate the in-person discussion from yesterday: What we should think about at some point is what it means to "short-cut" the injections. E.g.., if you have Int < IOInt < KItem and a term inj{IOInt, KItem}(inj{Int, IOInt}(42)), IIUC the term construction would be cutting out the inj(Int, IOInt) and we have inj{Int, KItem}(42). Now if you have rules that match on things containing IOInt, for the rule to apply it would have to accept the Int as an IOInt How would the subsort relation Int < IOInt be recovered?

ehildenb commented 1 year ago

I'm pretty sure injections can be treated as total function symbols basically. They have the properties: single-valued, injective, and total. In particular, total functions satisfy weaker properties than that: single-valued and total. Understanding that they satisfy the same axioms as total functions, lets us know immediately that:

User supplied simplification rules may not be performant enough long-term, but we don't know yet. For a subsort chain A < B < C, we could have the frontend (or user) supply the needed simplification rules for expanding/contracting the subsort chains.

This is why I'd lean towards not having special representation or treatment of them initially, similar to how we are not treating Maps specially.

I think the biggest problem we'll run into is that the unifier will not be returning the correct results, or the best results, in some cases. I think this means we need to adjust our rewriter to not only check that the returned unification problem is a matching substitution, but also needs to return a well sorted substitution. That is, for each unificand { Vi #Equals Ti }, we should have that Sort(Ti) <= Sort(Vi), where we maintain the subsort relationship table elsewhere (and compute it quickly once at load time). We should probably also update each unificand with the necessary sort injections that we discover. This seems like a very low-overhead way to work with injections in our current design.

That modification should allow us to handle the most common case, which is where you have a single variable in a rule which is a broader sort than the subject it's matching (so it catches lots of sub-cases), and the actual term is a narrower sort.

ehildenb commented 1 year ago

The other common case, of course, is the complete opposite. You have a variable of a more specific sort than the term it's matching, but it's still a matching substitution. The well sorted check will correctly filter out that possibility.

ehildenb commented 1 year ago

The more I think about it, the more I think we just should not send terms with injections to the new backend, and it should work in the order-sorted setting instead of the many-sorted setting.

Maude definitions must satisfy the "preregular" property (see Section 3.8 of the Maude Manual), which is basically the most simple sanity check on the signature supplied by the user which ensures that the order-sorted-ness behaves well. Preregularity is only really a problem if you have operator overloading.

So the new rewriter can check for overloading first, and reject any definitions that have overloading. Overloading is not a commonly used feature, and indeed it's tricky to use and get correct in K definitions. It may be that some K definitions get nicer if we have better overloading heirarchy, but it has not been needed yet. KWasm is one example of a semantics that uses overloading, I think it can be worked around there with a little rule duplication.

Order-sorted is not much more complex than many-sorted, and as long as you validate that all terms that come in (and rules/definitions) are well sorted, you can largely ignore the order sorted component. When doing unification, you need to check that the substitutions are well sorted, and fast unification algorithms that handle order-sorted definitions exist (eg. Maude, which can be used as a library over FFI to do unification problems for us if we get to the point where we decide that AC unification is needed for the fast rewriter).

The 2 things about K definitions is that:

So basically what I'm proposing is that when we feed Kore into this backend, we don't insert sort injection nodes, instead just interpreting the Kore term as an order-sorted term. I think because we're only trying to handle the 90% case, we'll find that this does not limit us severly and makes us not have to worry about injections at all. It is just a few lines of code to change the pyk library to emit this order-sorted Kore instead of many-sorted Kore (and to have it able to emit both, so it can still speak to the old backend), and this backend can strip all the injections from the incoming definition when it loads it.

Adding support for overloading later is also not difficult I don't think, but it does incur more sort and subsort checks during unification I believe.