rlepigre / ocaml-bindlib

Efficient binder representation in OCaml
https://rlepigre.github.io/ocaml-bindlib/
GNU Lesser General Public License v3.0
39 stars 6 forks source link

Using Bindlib and Conchon & Filliâtre's Hashcons lib #37

Open grayswandyr opened 1 week ago

grayswandyr commented 1 week ago

Hi, thanks for this nice library. I am currently doing some testing, using a simple representation of first-order logic. My aim would be to combine the use of Bindlib with Conchon & Filliâtre's Hashcons library (article PDF), in particular using the Make functor. Alas I'm a but stuck, even after trying to get inspiration from your comments in #33 as well as your example. Would you have any hint to share? Thanks.

type term_node = Var of term Bindlib.var | App of Symbol.t * term list
and term = term_node H.hash_consed
and op = And
and t_node = Atom of Symbol.t * term list | Not of t | Bin of t * op * t
and t = t_node H.hash_consed
craff commented 1 week ago

David Chemouil @.***> writes:

Hello David,

I think you will have to use the functorial interface of hashcons library. The hashcons library does not know what a binder or a variable is and you have to build a code that can compute a hash which will not depend on the internal representation of a binder.

The example hashcons.ml provide an example computing a hash and the related equality, that should allow you produce modules of type HashedType and the use the functor Hashcons.Make.

Remark: there is no binder in your example, and it should work as you wrote it, but will fail whan you introduce binder.

Cheers, Christophe

Hi, thanks for this nice library. I am currently doing some testing, using a simple representation of first-order logic. My aim would be to combine the use of Bindlib with Conchon & Filliâtre's Hashcons library (article PDF), in particular using the Make functor. Alas I'm a but stuck, even after trying to get inspiration from your comments in #33 as well as your example. Would you have any hint to share? Thanks.

type term_node = Var of term Bindlib.var | App of Symbol.t term list and term = term_node H.hash_consed and op = And and t_node = Atom of Symbol.t term list | Not of t | Bin of t op t and t = t_node H.hash_consed

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you are subscribed to this thread.

-- Christophe Raffalli web: https://raffalli.eu

craff commented 1 week ago

Hello again,

Please find attached the code porting the example of bindlib to the hascons library.

It uses recursive module and is a bit subtil. Especially to initialise the tbl as I want only one table otherwise the code is essentially wrong and all constructor must be given the table in argument. The subtil point it that we need hashconsing to compute the hash of binders.

Rodolphe: I guess you will read this, shall I push it ? It introduces a dependency on hashcons for one test and I don't know how modern opam handle this, but in the past it was an extra dependency for the library.

Cheers, Christophe

David Chemouil @.***> writes:

Hi, thanks for this nice library. I am currently doing some testing, using a simple representation of first-order logic. My aim would be to combine the use of Bindlib with Conchon & Filliâtre's Hashcons library (article PDF), in particular using the Make functor. Alas I'm a but stuck, even after trying to get inspiration from your comments in #33 as well as your example. Would you have any hint to share? Thanks.

type term_node = Var of term Bindlib.var | App of Symbol.t term list and term = term_node H.hash_consed and op = And and t_node = Atom of Symbol.t term list | Not of t | Bin of t op t and t = t_node H.hash_consed

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you are subscribed to this thread.

-- Christophe Raffalli web: https://raffalli.eu

grayswandyr commented 1 week ago

Thank you so much, Christophe, for your prompt answer. Sorry for the missing binder, I had edited the code to make it simpler and removed the "for all" quantifier, here it is for the sake of completeness:

type term_node = Var of term Bindlib.var | App of Symbol.t * term list
and term = term_node H.hash_consed
and op = And
and t_node = Atom of Symbol.t * term list | Not of t | Bin of t * op * t | Forall of (term, t) binder
and t = t_node H.hash_consed

The code you submitted is indeed a bit tricky, I'll have to study it a little bit.

In the example, you're also using a constructor for hashed variables. Do you think it would be feasible to keep the original type as is, as Rodolphe had sketched in his answer to #33 ?

Do you see any advantage or drawback to one or the other approaches?

rlepigre commented 1 week ago

Rodolphe: I guess you will read this, shall I push it ? It introduces a dependency on hashcons for one test and I don't know how modern opam handle this, but in the past it was an extra dependency for the library.

Sure, I approved your PR. It should be fine for the dependency: it should only be installed if tests are requested.

rlepigre commented 1 week ago

In the example, you're also using a constructor for hashed variables. Do you think it would be feasible to keep the original type as is, as Rodolphe had sketched in his answer to #33 ?

I think it should be possible to use that approach indeed.

Do you see any advantage or drawback to one or the other approaches?

To me, the sole advantage of the approach I suggested (use a hash-table to store the hashes of bound variables) is that it avoids the need to add an extra and rather ad hoc constructor. A potential drawback is that additional insertions and lookups in a hash-table are required, so this might be less efficient (we should probably try both and benchmark).

craff commented 1 week ago

I see an avantage of the ad-hoc constructor other than speed: it is used only when computing hash and you will detect a bug easily if it leaks.

May be there is a way to use phantom type/gadt/polymorphic variant to capture this at typing... I will see.

Cheers Christine

Le 5 juillet 2024 13:37:21 GMT-10:00, Rodolphe Lepigre @.***> a écrit :

In the example, you're also using a constructor for hashed variables. Do you think it would be feasible to keep the original type as is, as Rodolphe had sketched in his answer to #33 ?

I think it should be possible to use that approach indeed.

Do you see any advantage or drawback to one or the other approaches?

To me, the sole advantage of the approach I suggested (use a hash-table to store the hashes of bound variables) is that it avoids the need to add an extra and rather ad hoc constructor. A potential drawback is that additional insertions and lookups in a hash-table are required, so this might be less efficient (we should probably try both and benchmark).

-- Reply to this email directly or view it on GitHub: https://github.com/rlepigre/ocaml-bindlib/issues/37#issuecomment-2211505826 You are receiving this because you commented.

Message ID: @.***>

grayswandyr commented 1 week ago

Hi, adding a special constructor is perhaps a bit harder to explain to other developers on the same project who don't deal with this part... so I'm trying to implement the "second" approach (i.e. without a supplemental constructor for hashed variables) from #33 to FOL, I hit a few issues, not completely solved. I'd be glad to get your feedback (and possibly error spotting), it might be useful to others too.

open Containers (* Simon Cruanes' extension to the Stdlib *)

module Fol = struct
  module B = Bindlib
  module HC = Hashcons

  (* a [Symbol.t] is a string with an efficient [equal] operation (implemented as a hash-consed string under the hood) *)

  (* terms *)
  type term_node = Var of term Bindlib.var | App of Symbol.t * term list
  and term = term_node HC.hash_consed
  and op = And | Or

  (* formulas *)
  and t_node =
    | Atom of Symbol.t * term list
    | Not of t
    | Bin of t * op * t
    | Forall of (term, t) B.binder

  and t = t_node HC.hash_consed

  module rec Hashed_t_node : (sig
    include Hashtbl.HashedType

    val hash_var : term B.var -> int
  end
  with type t = t_node) = struct
    type t = t_node

    module BoundVars = Hashtbl.Make (struct
      type t = term B.var

      let hash = B.hash_var
      let equal = B.eq_vars
    end)

    let bound_vars = BoundVars.create 251

    let next_var : unit -> int =
      let counter = ref (-1) in
      fun _ ->
        incr counter;
        !counter

    let hash_var x =
      (* Only bound variables have a stored hash. *)
      try BoundVars.find bound_vars x with Not_found -> Bindlib.hash_var x

    let hash = function
      | Atom (s, ts) ->
          (* the Container.Hash module implements combinators for hash computation; [poly] is [Hashtbl.hash] *)
          Hash.(triple poly Symbol.hash (list int))
            (`Atom, s, List.map (fun t -> t.HC.hkey) ts)
      | Not f -> Hash.(pair poly int) (`Not, f.HC.hkey)
      | Bin (f1, op, f2) ->
          Hash.(quad poly int poly int) (`Bin, f1.HC.hkey, op, f2.HC.hkey)
      | Forall f ->
          let x, subf = B.unbind f in
          BoundVars.add bound_vars x (next_var ());
          Hash.(pair poly int) (`Forall, subf.HC.hkey)

    (* ??? *)
    let equal_binders : (term, _) B.binder -> (term, _) B.binder -> bool =
      assert false

    let equal f1 f2 =
      match (f1, f2) with
      | Atom (s1, ts1), Atom (s2, ts2) ->
          (* Equal.physical is ( == ) *)
          Symbol.equal s1 s2 && Equal.(list physical) ts1 ts2
      | Not f1, Not f2 -> Equal.physical f1 f2
      | Bin (f11, op1, f12), Bin (f21, op2, f22) ->
          Equal.physical op1 op2 && Equal.physical f11 f21
          && Equal.physical f12 f22
      | Forall f1, Forall f2 -> equal_binders f1 f2
      | _ -> false
  end

  and Hashed_t_node_table : (HC.S with type key = t_node) =
    HC.Make (Hashed_t_node)

  and Hashed_term_node : (Hashtbl.HashedType with type t = term_node) = struct
    type t = term_node

    let hash = function
      | Var x -> Hash.(pair poly Hashed_t_node.hash_var) (`Var, x)
      | App (f, ts) ->
          Hash.(triple poly Symbol.hash (list int))
            (`App, f, List.map (fun t -> t.HC.hkey) ts)

    let equal t1 t2 =
      match (t1, t2) with
      | Var x1, Var x2 -> B.eq_vars x1 x2
      | App (f1, ts1), App (f2, ts2) ->
          Symbol.equal f1 f2 && Equal.(list physical) ts1 ts2
      | _ -> false
  end

  and Hashed_term_node_table : (HC.S with type key = term_node) =
    HC.Make (Hashed_term_node)
end
craff commented 1 week ago

Hello David,

Let me first explain a few things to make sure we understand each other. Also some of this view are a bit personnal and I think Rodolphe might give you a different viewpoint, which is nice.

The main interest of bindlib is fast substitution and removing the need to use a context if the programmer wants to. The substitution is O(S), where S is the size of the term using the binder with a good constant. Simple implementation like debruijn or names will be O(N ln(N)) or worse O(N^2) where N is the size of the whole term, the ln(N) of extra N factor being the time to search for variables in the context/environment. In practice, N is much bigger that S and most implementations try to get near to O(S), but this often introduce complexity everywhere in the code, while it is hidden with bindlib.

To really do this, my favourite solution is that each recursive function uses it own constructor for variables carrying just the needed information. For some function still using bindlib variables type might be enough (printing is one such case as Bindlib provides help for this). For instance for a theorem prover, I would introduce distinct constructors for existential variables and universal variables and maybe have a context that is only used for printing.

Using extensible variant is now a nice way to do this! Phantom type could also work ... but I failed to do that.

Regarding hashconsing, you won't benefit from hashconsing everywhere under binder. To be more precise, in a term (\lambda x. (x t)), t will be hashconsed only it does not use the variable x. As a consequence the complexity of hashconsing for binder is O(S) too (where S is still the size of the part of the term that uses the bound variables).

Still one may ask the question; is bindlib a good idea if one wants hashconsing. I think a general answer is yes, because in general, open terms need a context with some information about the variables and comparing terms in a context requires doing the substitution or traversing the term and more or less loosing hashconsing. But clearly, you need to do benchmark of several solutions or at least check that the complexity is reasonnable for your application, which can be done theoretically with what I just said.

I attached the adapted example with extensible variants.

adding a special constructor is perhaps a bit harder to explain to other developers on the same project who don't deal with this part... so I'm trying to implement the "second" approach (i.e. without a supplemental constructor for hashed variables) from #33 to FOL, I hit a few issues, not completely solved. I'd be glad to get your feedback (and possibly error spotting), it might be useful to others too.

Extensible variant solves this partially ... Still one need a default case in all function and the warning for missing cases become useless, which is annoying.

  • as far as I can see, 2 recursive modules must be used per "inner" type (i.e. t_node and term_node). An added difficulty is that bound variables (yielded by unbind) are added to the bound-var hashtable in the module for formulas (in the t_node hash function on case Forall), but the said table is accessed in the hash function on terms (that is, in another module). My solution was to make the hash_var function (which holds the said hashtable in its environment) accessible from outside.

You need a fresh environment holding the hash information on variables each time you compute the hash of a variable. I would pass this variable as parameter, and because you need recursive module, this is not very good. This is why I like to avoid environment.

  • For most equal cases, we want to rely on physical equality, as in C&F's article, but I suppose we can't use this for the Var nor for equal_binders (that I don't see how to write, actually).

I hope I answered this

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.

-- Christophe Raffalli web: https://raffalli.eu

grayswandyr commented 1 week ago

I think I get your point about additional constructors and environments, now, thanks.

I attached the adapted example with extensible variants.

I wasn't able to find it...?

craff commented 1 week ago

David Chemouil @.***> writes:

I think I get your point about additional constructors and environments, now, thanks.

I attached the adapted example with extensible variants.

I wasn't able to find it...?

Hello... forgot it probably ...

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.

-- Christophe Raffalli web: https://raffalli.eu