namin / clpsmt-miniKanren

CLP(SMT) on top of miniKanren
MIT License
35 stars 8 forks source link

The current state of `clpsmt-miniKanren` #9

Open chansey97 opened 3 years ago

chansey97 commented 3 years ago

There are two main repos of miniKanren's SMT hook: clpsmt-miniKanren and smt-assumptions

These two repos seemly have their own advantages and restrictions.

  1. smt-assumptions uses check-sat-assuming to improve performance and supports constraints promotion.

    However, its the variable' domain seems only supports number? currently?

    BTW, I noticed that the variables' domain (FD) in faster-miniKanren is really tricky.

    As @michaelballantyne said in his comment:

    Each type must have infinitely many possible values to avoid ; incorrectness in combination with disequality constraints, ; like: (fresh (x) (booleano x) (=/= x #t) (=/= x #f))

    I hope the SMT hook can work around this restriction.

  2. clpsmt-miniKanren is slower than smt-assumptions and does not support constraints promotion, but it can support more variable's domains (e.g. Boolean, bit-vector, Set, even ADT).

    PS. smt is basically the same as clpsmt-miniKanren, except that using faster-miniKanren to improve performance.

My question is that

  1. Does this repo still under development or has done?

    As you said that

    So I think you'd want to migrate to faster-mk eventually. There's no point in porting smt-assumptions back to canonical mk.

    So ideally any patches/examples of clpsmt-miniKanren now should be committed to faster-miniKanren repo (e.g. smt or smt-assumptions) instead of clpsmt-miniKanren. Is that right?

    Another way is replacing the canonical miniKanren implementation of clpsmt-miniKanren with faster-miniKanren.

  2. Why smt-assumptions/smt use a remote link to clpsmt-miniKanren?

    They must be hardcoded by path name explicitly, e.g.

    (load "../clpsmt-miniKanren/z3-server.scm") or (require "../faster-miniKanren/mk.rkt").

    Is it intentional?

    IMO it's very inconvenient. Shall we move all the SMT-server code to faster-mk version?

  3. What do you think about the above restrictions of clpsmt-miniKanren and smt-assumption ?

    Is it necessary to fix these restrictions?

    For example, merging constraints promotion to smt or make the smt-assumption support more variable's domain?

  4. Are the clpsmt-miniKanren/smt and smt-assumption both useful?

    I do not quite understand

    The changes in smt-assumptions are not so important.

  5. Is there any next plan about this project?

    I can help.

Thanks.

namin commented 3 years ago

Thanks for your thoughtful issue.

  1. constraint promotion: I think it's worth exploring. Are there any difficulty for extending it to types other than numbers?

did you find it educational to read the canonical implementation before diving into faster-miniKanren? if that's the case, maybe we should keep it even though we would always use faster-miniKanren? I think it's a bit premature to pick one implementation, so it's good to have them in separate branches in a separate repo? Eventually for benchmarks, we might want to try a few implementations. I guess we could all have them in one repo under different names, but that seems unwieldy too.

  1. Yes, that's annoying to have to load files from different project. I wouldn't move server to faster-mk. I just want to keep the core stuff specific to an implementation strategy in faster-mk.

  2. At this point, you could create a new branch on faster-mk, starting from smt to explore constraint promotion.

  3. I explained 'not so important' in the other thread. I meant, it's not clear that doing the assumption strategy is an improvement.

  4. Thanks for your enthusiasm and willingness to help! It's so great to you have on the project and a real chance to move forward. I'll write another comment about the next plan.

namin commented 3 years ago

What do you think? Any ideas on your side?

Thanks!

namin commented 3 years ago

It's worth also looking at https://github.com/michaelballantyne/faster-miniKanren/commits/smt-numsonly by @michaelballantyne

michaelballantyne commented 3 years ago

Hi! Glad you find this interesting. :) Let me start with the things I was thinking about when we left off work, and then I'll get to your specific questions.

Interacting with the solver

We were exploring the best way of interacting with the solver. smt-assumptions was Nada's version of using check-sat-assuming, but its code was different enough than the basic implementation that I was worried we weren't comparing correctly. To attempt an apples-to-apples comparison of different approaches I created the smt-numsoly branch (https://github.com/michaelballantyne/faster-miniKanren/tree/smt-numsonly), which supports parameters that configure several modes:

I don't have recorded benchmark runs, but the notes I wrote suggest that reset-as-push-pop provided a significant gain vs naive, but push-pop and assumptions didn't provide much further benefit. From our work on the smt-assumptions branch we had previously thought that using check-sat-assuming was useful, but I think we had fooled ourselves. The real effects were:

  1. (reset) is more expensive than (pop) (push)
  2. At least at the time, Z3 solved the queries we were experimenting with faster when forced into incremental mode, which avoids certain preprocessing steps that were not helpful in our queries. Using (push) and (pop) forces incremental mode.

However, our collaborator Aaron Bembenek found that the check-sat-assuming strategy was useful in his datalog system, as documented here: https://people.seas.harvard.edu/~bembenek/papers/datalog-incr-smt-iclp2020.pdf His queries involved a variety of different SMT theories; for many check-sat-assuming worked best. A cluster of queries using LIA and arrays worked better with (pop) (push) though. I think our queries were LIA, so our data lines up with his. So I think if we expand our support for additional theories we may in the future get some benefit from check-sat-assuming.

It would be useful to benchmark these options more thoroughly as we explore extensions.

Integrating equalities, disequalities, and types

As you point out, giving the solver information about equalities and disequalities improves its ability to refute certain queries.

The integration is a little tricky because miniKanren treats variables as dynamically typed, whereas SMT-lib is statically typed. Types for SMT variables must be provided at variable declaration time. If at

(fresh (a b)
  (== a b))

we want to provide the equality to the solver, we need to declare the variables. But if we support multiple theories (say, bitvectors in addition to numbers), we haven't yet seen a constraint that determines the type of the variable, so we don't know the right type to declare the variable with! I believe this is why smt-assumptions and smt-numsonly only support integers.

I imagine solving this problem by accumulating equality and disequality constraint information and only sending it to the solver after a constraint that determines a particular type has been applied. Things like <= would need type-specific variants to ensure we know the type from the domain: int<= and real<=. We'd also want integero and realo rather than numbero. (Or support all three and implement subtyping...).

I think it would work to rely on the solver to handle the interaction between disequality and finite domains. A query asserting that three variables are pairwise disequal would succeed; adding additional constraints indicating that all three variables are of an algebraic data type with only two constructors would lead to failure when the constraints get checked by the solver.

As an alternative we could create a statically or gradually typed miniKanren so that we have type information to correctly declare types to the solver even before reaching theory-specific constraints.

My angle on your questions

  1. Does this repo still under development or has done?

We stalled because we hadn't yet found applications that we thought would be compelling to people beyond the miniKanren community: where might this approach be more effective than a classic constraint logic programming system, or a constrained horn clause solver?

As one such application, we had hoped that the combination of miniKanren's unique search and SMT constraints would allow us to solve example-based program synthesis queries for programs that manipulate things like numbers and arrays. The Barliman repository has branches corresponding to the faster-minikanren ones where we ported over the various implementations to Barliman's version of miniKanren. From my notes it looks like we weren't able to get Barliman with SMT to synthesize the full implementation of factorial. The tests we were exploring are here: https://github.com/webyrd/Barliman/blob/smt-numsonly/cocoa/Barliman/mk-and-rel-interp/test-interp.scm Barliman is a bit finicky though, so I'm not sure that the problem was the performance of SMT solving. This could use some more digging. Or, you might have other applications in mind?

Of course, a clean and complete implementation would definitely be of value to folks already interested in miniKanren regardless of whether we find uses of interest to a broader audience.

  1. Why smt-assumptions/smt use a remote link to clpsmt-miniKanren?

I think this is an artifact of the development process; Nada started in clpsmt-miniKanren and then wanted to try a faster-minikanren version. The clpsmt-miniKanren repository has all of the examples that we were trying across the original version based on canonical minikanren, the faster-minikanren branches, and the barliman branches.

  1. What do you think about the above restrictions of clpsmt-miniKanren and smt-assumption ?
  2. Are the clpsmt-miniKanren/smt and smt-assumption both useful?
  3. Is there any next plan about this project?

Starting further work from a copy of smt-numsonly would let us continue to explore the different modes of interacting with the solver as we use new theories. It would be great to extend that implementation to support all the SMT-LIB datatypes and constraints we can. Then it could subsume the other implementations.

It's nice to have a separate repository for this project, but it is a bit of a pain to keep everything in sync. @namin, would it make sense to set up a branch of faster-minikanren as a submodule of this repository?

I haven't looked into it too deeply, but it might be possible to have a single smt.scm used by both the faster-minikanren and the canonical miniKanren versions, if we want to have both.

chansey97 commented 3 years ago

Thanks for your great answers ! @namin @michaelballantyne .

I have a lot of things like to write, but they are not well organized yet.

I will add it later, sorry.

chansey97 commented 3 years ago

About "Integrating equalities, disequalities, and types"

This section gave me a lot of inspiration, thanks.

I was thinking that it seems not a big problem if minikanren with SMT does not support constraints promotion?

Take diseq-3 as an example, suppose the users know how the clpsmt-miniKanren implements, he may replace (=/= a b) with (z/assert (not (= a b))) manually:

(run 1 (q)
    (fresh (x y)
      (z/ `(declare-const ,x Int))
      (z/ `(declare-const ,y Int))
      (z/assert `(not (= ,x 0)))
      (z/assert `(not (= ,y 0)))
      (z/assert `(= 0 (* ,x ,y)))
      (== q `(,x ,y))))

Then no problem to refute this query.

So users seemly need to learn some technique for working around this restriction...

In the example (diseq-3),

(run 1 (q)
  (fresh (x y)
    (=/= x 0)
    (=/= y 0)
    (z/assert `(= 0 (* ,x ,y)))))

Since the user has used * in z/assert, he must know x and y be an Int in advance (Just ignore subtyping temporarily).

So he must declare (z/ `(declare-const ,x Int)) and (z/ `(declare-const ,y Int)) in some place.

Then since x and y are Int and 0 is a Int, he must replace (=/= x 0) and (=/= y 0) with (z/assert `(not (= ,x 0))) and (z/assert `(not (= ,y 0))).

The result is the same as the above.

Note that, for =/=, the type of x is irrelevant to the type of another argument, e.g. 0.

For example, suppose the original program is

(run 1 (q)
    (fresh (x y)
      (=/= x 'a)
      (=/= y 'b)
      (z/assert `(= 0 (* ,x ,y)))))

The only way to rewrite is

(run 1 (q)
    (fresh (x y)
      (z/ `(declare-const ,x Int))
      (z/ `(declare-const ,y Int))
      (=/= x 'a)   ; no effect, can be removed
      (=/= y 'b)   ; no effect, can be removed
      (z/assert `(= 0 (* ,x ,y)))))

They look a bit like type inference.

For a simple program, user can manually rewrite it, but for a larger program, it will be more complicated (Imagine that here 'a is another logic variable A, can we safely rewrite (=/= x A) to (z/assert `(not (= ,A 0))) ?).

Therefore, I think, although the clpsmt-miniKanren / smt has no the "constraint promotion", but it does not lose the ability to refute certain queries (It just leaves the complexity to users, e.g. may need using numbero, even tags for some generalized relations.)

In contrast, although smt-assumptions supports "constraint promotion", it is more complicated for implementer.

As @michaelballantyne said,

The integration is a little tricky because miniKanren treats variables as dynamically typed, whereas SMT-lib is statically typed. Types for SMT variables must be provided at variable declaration time.

So is the "constraints promotion" really necessary?

What do you think about it?

PS. The "constraint promotion" is the term that I invented myself. I don’t know whether there is a specific term for that purpose (i.e. converting miniKanren's constraints to SMT assertions). Perhaps using "constraint conversion" or "constraint reification" is more precise?

michaelballantyne commented 3 years ago

Hmm. Maybe it would be useful to work through how you would add arithmetic operations to a relational interpreter without constraint promotion, but while ensuring refutation?

I think you'd at least be forced to tag the numbers, but I'm not certain that would be enough.

Because the relational interpreter supports many kinds of values, we don't know statically whether a given unification or disequality relates things that correspond to solver variables. For example, in the interpreter equal? is applicable to many types of argument: strings (which do not use the solver), numbers (where we'd like to use the solver), and lists (which may contain both strings and numbers).

michaelballantyne commented 3 years ago

(I'm also not sure on the best terminology; "constraint promotion" is a good shorthand for our communication, but I think I would avoid it in formal writing.)

chansey97 commented 3 years ago

For example, in the interpreter equal? is applicable to many types of argument: strings (which do not use the solver), numbers (where we'd like to use the solver), and lists (which may contain both strings and numbers).

@michaelballantyne I haven't tried the relational interpreter, but for a simple equalo, we can rewrite it by tags.

naive version

(define equalo
  (lambda (x y out)
    (conde
     ((== x y) (== #t out))
     ((=/= x y) (== #f out)))))

revised version

(define equalo
  (lambda (x y out)
    (fresh (t1 t2 v1 v2)
           (symbolo t1)
           (symbolo t2)
           (== `(,t1 ,v1) x)
           (== `(,t2 ,v2) y)
           (conde
            ((== t1 t2)  ; two tags are equal
             (conde
              ((== t1 'int)  ; tags are int
               (z/ `(declare-const ,v1 Int))
               (z/ `(declare-const ,v2 Int))
               (conde
                ((z/assert `(= ,v1 ,v2))
                 (== #t out))
                ((z/assert `(not (= ,v1 ,v2)))
                 (== #f out))) )
              ((=/= t1 'int)  ; tags are not int
               (conde
                ((== v1 v2) (== #t out))
                ((=/= v1 v2) (== #f out))))))
            ((=/= t1 t2)  ; two tags are not equal
             (== #f out))  
            ))))

;; We must use some tags to pack value
(run 5 (q)
     (fresh (out)
            (equalo '(int 1) '(int 1) out)
            (== q out)))
;; (#t)

(run 5 (q)
     (fresh (out)
            (equalo '(int 1) '(int 2) out)
            (== q out)))
'(#f)

(run 5 (q)
     (fresh (out)
            (equalo '(int 1) '(symb apple) out)
            (== q out)))
'(#f)

However, equalo is too easy to represent refutation.

I tried another example which can represent refutation:

all-diffo which ensures that all variables and values found within the flat list are different from each other.

naive version

(define all-diffo
  (lambda (l)
    (conde
     ((== l '()))
     ((fresh (x)
             (== l `(,x))))
     ((fresh (x1 x2 xs)
             (== l `(,x1 ,x2 . ,xs))
             (=/= x1 x2)
             (all-diffo `(,x1 . ,xs))
             (all-diffo `(,x2 . ,xs)))))))

;; refutation example:
(run* (q)
      (fresh (x y)
             (z/ `(declare-const ,x Int))
             (z/ `(declare-const ,y Int))
             (all-diffo `(apple ,x ,y))
             (z/assert `(= 0 (- ,x ,y)))))
;; divergence
;; because `==` and `=/=` can not be promoted to SMT assertions.

revised version

(define all-diffo
  (lambda (l)
    (conde
     ((== l '()))
     ((fresh (t v)
             (symbolo t)
             (== l `((,t ,v)))))
     ((fresh (t1 t2 v1 v2 xs)
             (symbolo t1)
             (symbolo t2)
             (== l `((,t1 ,v1) (,t2 ,v2) . ,xs))
             (conde
              ((== t1 t2)  ; two tags are equal
               (conde
                ((== t1 'int) ; tags are int
                 (z/ `(declare-const ,v1 Int))
                 (z/ `(declare-const ,v2 Int))
                 (z/assert `(not (= ,v1 ,v2))))
                ((=/= t1 'int) ; tags are not int
                 (=/= v1 v2))))
              ((=/= t1 t2)))  ; two tags are not equal
             (all-diffo `((,t1 ,v1) . ,xs))
             (all-diffo `((,t2 ,v2) . ,xs)))))))

;; refutation example:
(run* (q)
      (fresh (x y)
             (z/ `(declare-const ,x Int))
             (z/ `(declare-const ,y Int))
             (all-diffo `((symb apple) (int ,x) (int ,y)))
             (z/assert `(= 0 (- ,x ,y)))))
;; '()
;; We now have refutation!

I will try the relational interpreter later.

Note that these programs crashes in the current clpsmt-miniKanren, because there is a bug I have not fixed. If you'd like to try, you can temporarily uncomment this line.

namin commented 3 years ago

In the all-diffo example, could you reuse the definition of equalo?

I think it's good to experiment with different approaches. The tag approach seems a bit annoying -- think how you'd define factorial and do the packaging and re-packing of ints. Couldn't you use numbero instead of a tag?

chansey97 commented 3 years ago

In the all-diffo example, could you reuse the definition of equalo?

Yes.

(define all-diffo
  (lambda (l)
    (conde
     ((== l '()))
     ((fresh (t v)
             (symbolo t)
             (== l `((,t . ,v)))))
     ((fresh (x1 x2 xs)
             (== l `(,x1 ,x2 . ,xs))
             (equalo x1 x2 #f)  ; <--- just modify this line
             (all-diffo `(,x1 . ,xs))
             (all-diffo `(,x2 . ,xs)))))))

The tag approach seems a bit annoying -- think how you'd define factorial and do the packaging and re-packing of ints.

Yes, it is annoying.

For example,

(run 20 (q)
     (all-diffo `((cons (int 1) (cons (bool #t) (nil))) (bv3 bitvec-010) ,q))) ; cons, nil, bool, bv3 are tags.

(run* (q)
      (fresh (x y)
             (all-diffo `((symb apple) (int ,x) (int ,y)))
             (z/assert `(= 0 (- ,x ,y)))))

However, if we integrate the equalo (new version supports cons, nil, etc) into a relation interpreter, this annoying can be relieved. I mean that any constant literals (numbers, bools, ...) and constructors (e.g. cons, '()) now be interpreted as a value with a tag.

I have hacked the full-interp-extended.

;; factorial
(run* (q)
    (evalo `(letrec ((fac
                      (lambda (n)
                        (if (< n 0) #f
                            (if (= n 0) 1
                                (* n (fac (- n 1))))))))
              (fac 4))
           q))
;; '((int 24))

;; originally return '(24)
;; now return '((int 24))

;; refutation example:
(run 1 (q)
     (fresh (x y)
            (evalo `(and (not (equal? (cons ',x (cons 111 null))
                                      (cons 0 (cons 111 null)))) 
                         (not (equal? (cons ',y (cons 222 null))
                                      (cons 0 (cons 222 null))))
                         (= 0 (* ',x ',y)))
                   '(bool #t))
            (== q `(,x ,y))))

;; originally diverge
;; now return '()

;; no need to modify original program (e.g. manually tag the integers).

Currently I only tagged int and bool in evalo, but I think it should be able to extend to other types as well.

PS. I found @webyrd used a similar method for abstract-interp-tagged.

Couldn't you use numbero instead of a tag?

I don't know how to do.

As you see that the equalo uses a lot of types which only exists in SMT, while the miniKanren's type constraints only correspond to scheme types.

Another reason is that I don’t know how to construct negation in miniKanren, which make the code very ugly.

For example, how to represent not-numbero if no negation? May something like this:

(define not-numbero
  (lambda (x)
    (conde
     ((symbolo x))
     ((booleano x))
     ((pairo x))
     ((nullo x)))))

But using tags, we just write:

(x.tag =/= num)

PS. We can use conda to encode negation as failure, but that is impure operation.

chansey97 commented 3 years ago

did you find it educational to read the canonical implementation before diving into faster-miniKanren?

If it's just for hooking SMT, it doesn't seem no much difference (I now even think faster-miniKanren is much easier).

I explained 'not so important' in the other thread. I meant, it's not clear that doing the assumption strategy is an improvement.

Feeling that the smt-assumption relies on the incremental query implementation of the SMT solver.

Perhaps we need record and replay those SMT commands to check whether the incremental query is really effective? (Ignore the impact of miniKanren)

As another (not so related) experiment, we used only mk for search and z3 for solving and the performance is much worse than just mk: https://github.com/namin/mk-on-smt

I will look at it. Thanks.

BTW is it possible that the process communication is a bottleneck?

Making the repo more usable. (You're already doing this, and thinking of ways of improving it.)

The current implementation of call-z3 encodes/decodes SMT commands by creating files and replace text. It is very cumbersome and affects performance.

I'm thinking about how to improve it. For Racket, we can modify readtable, but for Scheme I don't know how to do. It seems that there is no readtable feature in Scheme? (Maybe implementation dependency?)

Are the test failures legitimate?

I am reading all the tests. I will report if I find some illegitimate case.

In fact, I have found some. I will create issues later.

chansey97 commented 3 years ago

What examples have been tried? Which are interesting?

I am reading the test cases these days. IMO, all the examples are interesting :)

Here are some examples that I think are most interesting:

  1. 24 puzzle

    This is the first example I think is very enlightening. Imagine that someone wants to create an indie game (24 puzzle for example). If he/she just generates some random 24 points cards or copy-pastes some ready-made examples from the Internet, the game will be very boring, because (1) there is no difficulty design (2) players can easily find the solution.

    However, if we use miniKanren to generates these game levels, the game will be interesting, because (1) we can control difficulty, e.g. assigning the frequency of the four arithmetic operations (2) it is hard for players to find solutions on the Internet.

    This is very useful for game design. BTW, I used to be a game programmer.

    PS. I also found another example about game design. It is more appealing than 24 puzzle, because it can automatically discover strategies. Maybe we can also translate it to miniKanren?

  2. Abstract interpretation and Tabling

    There are 2 kinds of abstract interpretations in the current repo.

    • abstract-interp.scm, abstract-interp-tagged.scm, full-abstract-interp-extended.scm
    • radi.scm, radif.scm, radiw.scm

    The 1st one is a partial abstract interpreter, the 2nd is a total abstract interpreter. That's great because now we get both an abstract interpreter and an abstract-based synthesizer.

    Unfortunately, neither of them uses tabling directly.

    I once heard @webyrd said that abstract interpretation has a deep connection with tabling. Could we run the partial abstract interpreter in the tabling miniKanren to get a total abstract interpreter out of the box (as long as program state is finite)? It seems that the caching and fixed-point finding algorithms in Abstracting Definitional Interpreters are very similar to the tabling algorithms in Relational Programming in miniKanren. Alas, the current tabling miniKanren cannot handle symbolic constraints.

    What do you think about it?

    PS. There is another method full-interp-extended-memo-lambda.scm, which seemly can also implement abstract interpreter (e.g. cut-off-recursion-1), but I am not sure.

  3. Types inference and inhabitants generation

    The current repo has some examples of type system (e.g. tapl.scm, recordsub.scm, rcd.scm), but they are very simple.

    I'm thinking about how difficult to add more advanced type systems, e.g. Hindley–Milner, System F, System Fsub, even Dependent Type, etc. Do we need more infrastructure support? (e.g. nominal logic)

    If we can do this, I think we will have achieved type-driven and test-driven synthesis in miniKanren. That will be very competitive. For example, Idris has a feature to generate terms from types, but it cannot work with programming by example. That makes the feature useless.

    It would be nice if we can just write types and test cases, miniKanren can automatically generate terms for us.

    ;; Take Pie as an example:
    (claim vec-append
     (Pi ((E U)
          (l Nat)
          (j Nat))
       (-> (Vec E l) (Vec E j)
           (Vec E (+ l j)))))
    (define vec-append
     ?) ; <---- automatically generate the term
    
    (test
      (vec-append Atom
                  3
                  2
                  (vec:: 'kanelbullar (vec:: 'plattar (vec:: 'prinsesstarta vecnil)))
                  (vec:: 'coffee (vec:: 'cocoa vecnil)))      
      (vec:: 'kanelbullar (vec:: 'plattar (vec:: 'prinsesstarta (vec:: 'coffee (vec:: 'cocoa vecnil))))))
  4. Music Theory and Reactive Programming

    I don't know the music theory, so I can't understand music.scm. But I believe it must be very interesting. It would be nice if you can recommend some materials that explain music.scm. BTW, in order to understand music.scm, I once tried to read some articles about music synthesis (e.g. this), but they are different things.

    I also can't understand reactive.rkt (maybe collision detection)? Could you explain the motivation of these examples?

    Thanks.